Column design

A Kudu table consists of one or more columns, each with a defined type. Columns that are not part of the primary key may be nullable.

Supported column types include:
  • boolean

  • 8-bit signed integer

  • 16-bit signed integer

  • 32-bit signed integer

  • 64-bit signed integer

  • date (32-bit days since the Unix epoch)

  • unixtime_micros (64-bit microseconds since the Unix epoch)

  • single-precision (32-bit) IEEE-754 floating-point number

  • double-precision (64-bit) IEEE-754 floating-point number

  • decimal

  • varchar (configurable maximum length, up to 64KB uncompressed)

  • UTF-8 encoded string (up to 64KB uncompressed)

  • binary (up to 64KB uncompressed)
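
The rules stated so far (primary key columns are not nullable, integer columns have fixed widths, and string and binary values are limited to 64KB) can be sketched in plain Python. The `Column` class and the `fits` function are hypothetical helpers written for illustration; they are not part of any Kudu client API, and the sketch assumes that 64KB means 65,536 bytes.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; this is not the Kudu client API.
INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}
MAX_CELL_BYTES = 64 * 1024  # assumption: "64KB" is read as 65,536 bytes


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    primary_key: bool = False
    nullable: bool = False

    def __post_init__(self):
        # Only columns outside the primary key may be nullable.
        if self.primary_key and self.nullable:
            raise ValueError(f"primary key column {self.name!r} cannot be nullable")


def fits(column: Column, value) -> bool:
    """Return True if value is acceptable for column under the rules modeled here."""
    if value is None:
        return column.nullable
    if column.type in INT_BITS:
        bits = INT_BITS[column.type]
        return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1
    if column.type == "string":
        # The limit applies to the UTF-8 encoded size, not the character count.
        return len(value.encode("utf-8")) <= MAX_CELL_BYTES
    if column.type == "binary":
        return len(value) <= MAX_CELL_BYTES
    raise ValueError(f"type {column.type!r} is not modeled in this sketch")
```

For example, `fits(Column("id", "int64", primary_key=True), None)` is rejected because key columns cannot hold nulls, while a nullable string column accepts `None`.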

Kudu takes advantage of strongly typed columns and a columnar on-disk storage format to provide efficient encoding and serialization. To make the most of these features, declare each column with the appropriate type rather than simulating a 'schemaless' table by using string or binary columns for data that could otherwise be structured. In addition to encoding, Kudu allows compression to be specified on a per-column basis.
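
To give a rough sense of why typed columns matter, the following Python sketch compares a fixed-width 64-bit layout with a text layout for the same timestamps, then compresses the typed column as a single unit. It is an illustration under stated assumptions and does not reproduce Kudu's actual on-disk encodings: the function names and sample data are hypothetical, and `zlib` from the Python standard library stands in for a per-column compression codec.

```python
import struct
import zlib


def encode_typed(values):
    """Pack integers as fixed-width little-endian 64-bit signed values."""
    return struct.pack(f"<{len(values)}q", *values)


def encode_text(values):
    """Store the same integers as newline-separated decimal text."""
    return "\n".join(str(v) for v in values).encode("utf-8")


# Hypothetical sample: 10,000 timestamps in microseconds, one second apart.
timestamps = [1_700_000_000_000_000 + i * 1_000_000 for i in range(10_000)]

typed = encode_typed(timestamps)   # exactly 8 bytes per value
as_text = encode_text(timestamps)  # 16 digits per value plus separators

# Compress the typed column as one unit, as a per-column codec would.
compressed = zlib.compress(typed)

print(len(typed), len(as_text), len(compressed))
```

Because every value in the typed column has the same width and similar high-order bytes, the column is smaller than the text form before compression and compresses further afterward.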