Column design
A Kudu table consists of one or more columns, each with a defined type. Columns that are not part of the primary key may be nullable. Supported column types include:
- boolean
- 8-bit signed integer
- 16-bit signed integer
- 32-bit signed integer
- 64-bit signed integer
- date (32-bit days since the Unix epoch)
- unixtime_micros (64-bit microseconds since the Unix epoch)
- single-precision (32-bit) IEEE-754 floating-point number
- double-precision (64-bit) IEEE-754 floating-point number
- decimal
- varchar (configurable maximum length, up to 64KB uncompressed)
- UTF-8 encoded string (up to 64KB uncompressed)
- binary (up to 64KB uncompressed)
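The rules and types above can be sketched in plain Python. This is an illustrative model only, not the Kudu client API: the `Column` class, the type labels in `SIGNED_BITS`, and the helper names are all hypothetical. It shows the nullability rule for primary key columns, the value range of the signed integer types, and how the date and unixtime_micros types map calendar values onto integers counted from the Unix epoch.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


# Hypothetical model of a column definition; not the Kudu client API.
@dataclass(frozen=True)
class Column:
    name: str
    type: str              # illustrative label such as "int64" or "string"
    key: bool = False      # part of the primary key?
    nullable: bool = False


# Bit widths of the signed integer types listed above (labels are assumptions).
SIGNED_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def validate(columns):
    """Reject schemas that break the rules stated in the text."""
    if not columns:
        raise ValueError("a table needs at least one column")
    for col in columns:
        if col.key and col.nullable:
            raise ValueError(f"primary key column {col.name!r} cannot be nullable")


def fits_signed(value, bits):
    """True if value fits a signed two's-complement integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def to_date_days(d):
    """Days since the Unix epoch, the representation used by the date type."""
    return (d - date(1970, 1, 1)).days


def to_unixtime_micros(dt):
    """Microseconds since the Unix epoch; dt must be timezone-aware."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

For example, `to_date_days(date(1970, 1, 2))` returns `1`, and `fits_signed(128, SIGNED_BITS["int8"])` returns `False` because the 8-bit signed range ends at 127.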
Kudu takes advantage of strongly-typed columns and a columnar on-disk storage format to provide efficient encoding and serialization. To make the most of these features, columns should be specified as the appropriate type, rather than simulating a 'schemaless' table using string or binary columns for data which could otherwise be structured. In addition to encoding, Kudu allows compression to be specified on a per-column basis.
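To see why a typed column can be stored more compactly than the same data held as strings, the following sketch compares a few representations of one integer column. It makes no claim about Kudu's actual on-disk format: the sample data is invented, and the delta transform and `zlib` stand in for whatever encoding and compression a real column would be configured with. The point is only that a type-aware transform is possible when the storage layer knows the column holds integers.

```python
import struct
import zlib

# Hypothetical sample: 10,000 consecutive microsecond timestamps.
values = [1_700_000_000_000_000 + i for i in range(10_000)]

# Typed representation: every value is a fixed-width 64-bit little-endian integer.
typed = struct.pack(f"<{len(values)}q", *values)

# 'Schemaless' representation: every value is a length-prefixed UTF-8 decimal string.
stringly = b"".join(
    struct.pack("<H", len(s)) + s
    for s in (str(v).encode("utf-8") for v in values)
)

# A simple delta transform (first value, then successive differences).
# It only makes sense because the column is known to hold integers.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
delta_encoded = struct.pack(f"<{len(deltas)}q", *deltas)

sizes = {
    "typed": len(typed),
    "string": len(stringly),
    "typed+zlib": len(zlib.compress(typed)),
    "string+zlib": len(zlib.compress(stringly)),
    "delta+zlib": len(zlib.compress(delta_encoded)),
}

for name, size in sizes.items():
    print(f"{name:12s} {size:>8d} bytes")
```

Before compression, the typed form takes 8 bytes per value and the string form 18 bytes per value (16 digits plus a 2-byte length prefix). The delta-transformed column compresses best in this sample because almost every difference is the same small number.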