Create table feature

You use CREATE TABLE from Impala to create an external table in Iceberg. You also learn about partitioning.

From Impala, by default, Iceberg tables you create are v1. To create an Iceberg v2 table from Hive or Impala, you need to set a table property as follows:'format-version' = '2'

Iceberg table creation from Impala

From Impala, CREATE TABLE is recommended to create an Iceberg table in CDP. Impala creates the Iceberg table metadata in the metastore and also initializes the actual Iceberg table data in the object store.

For more information, see the Apache documentation, "Using Impala with Iceberg Tables".

Metadata storage of Iceberg tables

When you create an Iceberg table using CREATE TABLE in Impala, HiveCatalog creates an HMS table and also stores some metadata about the table on your object store, such as S3. Creating an Iceberg table generates a metadata.json file, but not a snapshot. In the metadata.json, the snapshot-id of a new table is -1. Inserting, deleting, or updating table data generates a snapshot. The Iceberg metadata files and data files are stored in the table directory under the warehouse folder. Any optional partition data is converted into Iceberg partitions instead of creating partitions in the Hive Metastore, thereby removing the bottleneck.

To create an Iceberg table from Impala, you associate the Iceberg storage handler with the table using STORED AS ICEBERG or STORED BY ICEBERG.

Supported file formats

Impala supports writing Iceberg tables in only Parquet format. Impala does not support defining both file format and storage engine. You can read Iceberg tables in the following formats: Parquet, Avro, ORC

Impala syntax

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name	  
  [(col_name data_type, ... )]
  [PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]]
   STORED {AS | BY} ICEBERG
  [TBLPROPERTIES (property_name=property_value, ...)]

Impala examples

CREATE TABLE ice_7 (i INT, t TIMESTAMP, j BIGINT) STORED BY ICEBERG; //creates only the schema
CREATE TABLE ice_8 (i INT, t TIMESTAMP) PARTITIONED BY (j BIGINT) STORED BY ICEBERG; //creates schema and initializes data
CREATE TABLE ice_v2 (i INT, t TIMESTAMP) PARTITIONED BY (j BIGINT) STORED BY ICEBERG TBLPROPERTIES ('format-version' = '2'); //creates a v2 table