Create a new Kudu table from Impala

Creating a new table in Kudu from Impala is similar to mapping an existing Kudu table to an Impala table, except that you need to specify the schema and partitioning information yourself. Use the examples in this section as a guideline. Impala first creates the table, then creates the mapping.

In the CREATE TABLE statement, the columns that comprise the primary key must be listed first. Additionally, primary key columns are implicitly considered NOT NULL.

When creating a new table in Kudu, you must define a partition schema to pre-split your table. The best partition schema to use depends upon the structure of your data and your data access patterns. The goal is to maximize parallelism and use all your tablet servers evenly.

The following CREATE TABLE example distributes the table into 16 partitions by hashing the id column, for simplicity.

CREATE TABLE my_first_table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU;
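
As a rough sketch of how the primary key ordering rule combines with a more selective partition schema, the statement below declares a two-column primary key and partitions by both hash and range. The table name, column names, bucket count, and range boundaries are hypothetical and chosen only for illustration. Appropriate values depend on your data and your cluster.

```sql
-- Illustrative only: table name, columns, and split points are assumptions.
CREATE TABLE my_orders_table
(
  -- Primary key columns are listed first and are implicitly NOT NULL.
  id BIGINT,
  sku STRING,
  quantity INT,
  PRIMARY KEY(id, sku)
)
-- Hash id into 4 buckets, then split each bucket into 4 sku ranges
-- (16 tablets in total).
PARTITION BY HASH (id) PARTITIONS 4,
RANGE (sku)
(
  PARTITION VALUES < 'g',
  PARTITION 'g' <= VALUES < 'o',
  PARTITION 'o' <= VALUES < 'u',
  PARTITION 'u' <= VALUES
)
STORED AS KUDU;
```

Combining hash and range partitioning in this way is one option for spreading writes across tablet servers while still letting range predicates on sku skip tablets. Whether it suits a given workload depends on the access pattern.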

By default, Kudu tables created through Impala use a tablet replication factor of 3. To specify a different replication factor for a Kudu table, add a TBLPROPERTIES clause to the CREATE TABLE statement, as shown below, where n is the replication factor you want to use:

TBLPROPERTIES ('kudu.num_tablet_replicas' = 'n')

A replication factor must be an odd number.
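
For context, a complete statement with the clause in place might look like the following sketch. The table name is hypothetical, and the value '1' is only an assumption suited to a small test cluster. Choose an odd value that your cluster has enough tablet servers to host.

```sql
-- Illustrative only: table name and replication factor are assumptions.
CREATE TABLE my_replicated_table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH (id) PARTITIONS 16
STORED AS KUDU
-- The value is passed as a quoted string and must be an odd number.
TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');
```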

Changing the kudu.num_tablet_replicas table property with ALTER TABLE currently has no effect.

The Impala SQL Reference CREATE TABLE topic has more details and examples.

The mapping between Kudu and Impala data types is summarized in the following table:

| Kudu type | Impala type |
|---|---|
| BOOL (boolean) | BOOLEAN |
| INT8 (8-bit signed integer) | TINYINT |
| INT16 (16-bit signed integer) | SMALLINT |
| INT32 (32-bit signed integer) | INT |
| INT64 (64-bit signed integer) | BIGINT |
| DATE (32-bit days since the Unix epoch) | NA (Not supported by Impala) |
| UNIXTIME_MICROS (64-bit microseconds since the Unix epoch) | TIMESTAMP (For details, see Notes.) |
| FLOAT (single-precision 32-bit IEEE-754 floating-point) | FLOAT |
| DOUBLE (double-precision 64-bit IEEE-754 floating-point) | DOUBLE |
| DECIMAL | DECIMAL |
| VARCHAR | VARCHAR |
| STRING | STRING |
| BINARY | BINARY (Not yet supported. For details, see IMPALA-5323.) |
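
To make the mapping concrete, the sketch below declares one column for several of the supported Impala types, with the corresponding Kudu type noted in a comment. The table and column names are hypothetical, and the DECIMAL precision and scale are arbitrary. Support for some types, such as DECIMAL, may depend on the Kudu and Impala versions in use.

```sql
-- Illustrative only: table name, column names, and DECIMAL(9, 2) are assumptions.
CREATE TABLE my_typed_table
(
  id BIGINT,              -- Kudu INT64; primary key column listed first
  is_active BOOLEAN,      -- Kudu BOOL
  tiny_value TINYINT,     -- Kudu INT8
  small_value SMALLINT,   -- Kudu INT16
  int_value INT,          -- Kudu INT32
  ratio FLOAT,            -- Kudu FLOAT
  measurement DOUBLE,     -- Kudu DOUBLE
  price DECIMAL(9, 2),    -- Kudu DECIMAL
  created_at TIMESTAMP,   -- Kudu UNIXTIME_MICROS
  label STRING,           -- Kudu STRING
  PRIMARY KEY(id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;
```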