Create table feature

You use CREATE TABLE or CREATE EXTERNAL TABLE to create an Iceberg table. You learn about the subtle differences between Hive and Impala in how these statements create Iceberg tables. You also learn about partitioning.

Hive and Impala handle external table creation a little differently, and that extends to creating tables in Iceberg.

Iceberg table creation from Hive

From Hive, the recommended way to create an Iceberg table is CREATE EXTERNAL TABLE.

When you use the EXTERNAL keyword to create the Iceberg table, by default only the schema is dropped when you drop the table; the actual data is not purged. Conversely, if you do not use EXTERNAL, by default both the schema and the actual data are purged. You can override the default behavior. For more information, see the Drop table feature.
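As a minimal sketch of overriding the default, the following Hive statements assume that the external.table.purge table property controls whether DROP TABLE also deletes the data files of an external Iceberg table. The table name ice_purge is hypothetical; confirm the exact behavior for your release in the Drop table feature.

```sql
-- Hypothetical table. Assumes 'external.table.purge'='true' overrides the
-- default for an EXTERNAL table so that DROP TABLE also purges the data files.
CREATE EXTERNAL TABLE ice_purge (i INT, s STRING)
STORED BY ICEBERG
TBLPROPERTIES ('external.table.purge'='true');

-- With the property set, dropping the table removes both the schema and the data.
DROP TABLE ice_purge;
```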

From Hive, you can create a table that reuses existing metadata by setting the metadata_location table property to the object store path of the existing metadata file. The operation skips generating new metadata and re-registers the existing metadata.
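One way to use this is to re-register an external table after dropping it, since the files of an EXTERNAL table are kept by default. The sketch below is a hedged illustration: the table name ice_src and the metadata file path are hypothetical placeholders, and the column list is assumed to match the schema recorded in that metadata file.

```sql
-- Hypothetical table ice_src, created earlier with CREATE EXTERNAL TABLE ... STORED BY ICEBERG.
-- 1. Note the object store path of its current metadata file.
SHOW TBLPROPERTIES ice_src('metadata_location');

-- 2. Drop the table; for an EXTERNAL table, data and metadata files are kept by default.
DROP TABLE ice_src;

-- 3. Re-register the existing metadata; no new metadata is generated.
--    Placeholder path: replace it with the value returned in step 1.
CREATE EXTERNAL TABLE ice_src (i INT)
STORED BY ICEBERG
TBLPROPERTIES ('metadata_location'='s3a://bucketName/ice_src/metadata/00001-example.metadata.json');
```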

Iceberg table creation from Impala

From Impala, the recommended way to create an Iceberg table is CREATE TABLE.

When you do not use the EXTERNAL keyword, Impala creates the Iceberg table metadata in the Hive Metastore and also initializes the Iceberg table data in the object store.

When you use the EXTERNAL keyword to create the Iceberg table, Impala does not initialize the Iceberg table in the object store; it creates only the metadata in the Hive Metastore. This difference between Hive and Impala is related to Impala's compatibility with Kudu, HBase, and other table types. For more information, see the Apache documentation, "Using Impala with Iceberg Tables".
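The following Impala statements contrast the two forms. The second statement is a sketch that assumes, as described in the Apache Impala documentation on Iceberg tables, that an external table can reference an Iceberg table that already exists in the Hive catalog through the iceberg.catalog and iceberg.table_identifier table properties. All database and table names are hypothetical.

```sql
-- Creates the metastore entry and initializes a new Iceberg table in the object store.
CREATE TABLE ice_new (i INT, s STRING) STORED AS ICEBERG;

-- Creates only a metastore entry that points at an existing Iceberg table;
-- Impala does not initialize a new table. Names are hypothetical.
CREATE EXTERNAL TABLE ice_ext
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hive.catalog',
               'iceberg.table_identifier'='ice_db.ice_existing');
```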

Metadata storage of Iceberg tables

When you create an Iceberg table using CREATE EXTERNAL TABLE in Hive or CREATE TABLE in Impala, HiveCatalog creates an HMS table and also stores some metadata about the table on your object store, such as S3. For example, you can find the Iceberg snapshot files as part of that metadata. The Iceberg metadata files and data files are stored in the table directory under the warehouse folder. Any optional partition data is converted into Iceberg partitions instead of creating partitions in the Hive Metastore, which removes the Hive Metastore as a bottleneck for tracking partitions.
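To see where a table's metadata lives, you can inspect the table from either engine. This sketch uses the ice_1 example table from this section. In the DESCRIBE FORMATTED output, Location is expected to show the table directory under the warehouse folder, and the table parameters are expected to include metadata_location, the path of the current Iceberg metadata file that HiveCatalog records in the HMS table. DESCRIBE HISTORY is an Impala statement that lists the table's snapshots.

```sql
-- From Hive or Impala: shows the table location and table parameters,
-- including metadata_location.
DESCRIBE FORMATTED ice_1;

-- From Impala only: lists the Iceberg snapshots recorded in the table metadata.
DESCRIBE HISTORY ice_1;
```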

To create an Iceberg table, you associate the Iceberg storage handler with the table by using the clause for your engine:
  • Hive: STORED BY ICEBERG
  • Impala: STORED AS ICEBERG

Supported file formats

You can create Iceberg tables in the following formats:
  • Hive: Parquet (default), Avro, ORC
  • Impala: Parquet

Hive can read Iceberg tables in Parquet, Avro, and ORC. Impala can read Iceberg tables in ORC or Parquet.
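In Hive, you choose a non-default file format with the optional STORED AS clause after STORED BY ICEBERG. The sketch below also assumes that Hive honors the standard Iceberg table property write.format.default (values such as parquet, avro, or orc) as an alternative way to set the default write format. The table names are hypothetical.

```sql
-- Avro-backed Iceberg table, using the STORED AS clause.
CREATE EXTERNAL TABLE ice_avro (i INT, s STRING) STORED BY ICEBERG STORED AS AVRO;

-- Assumed equivalent for ORC, setting the Iceberg property directly.
CREATE EXTERNAL TABLE ice_orc (i INT, s STRING)
STORED BY ICEBERG
TBLPROPERTIES ('write.format.default'='orc');
```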

Hive syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type, ... )]
  [PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]
  STORED BY ICEBERG
  [STORED AS file_format]
  [TBLPROPERTIES (property_name=property_value, ...)]

Impala syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type, ... )]
  [PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]
  STORED AS ICEBERG
  [TBLPROPERTIES (property_name=property_value, ...)]

Hive examples

CREATE EXTERNAL TABLE ice_1 (i INT, t TIMESTAMP, j BIGINT) STORED BY ICEBERG;
CREATE EXTERNAL TABLE ice_2 (i INT, t TIMESTAMP) PARTITIONED BY (j BIGINT) STORED BY ICEBERG;
CREATE EXTERNAL TABLE ice_4 (i int) STORED BY ICEBERG STORED AS ORC;
CREATE EXTERNAL TABLE ice_5 (i int) STORED BY ICEBERG TBLPROPERTIES ('metadata_location'='s3a://bucketName/ice_table/metadata/v1.metadata.json');
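The ice_2 example uses identity partitioning on a column declared in the PARTITIONED BY clause. The SPEC form instead applies partition transforms to columns that are already in the column list. A sketch in Hive, assuming your Hive version supports the standard Iceberg transforms (identity, year, month, day, hour, bucket, truncate); the table and column names are hypothetical.

```sql
-- Partition by the month of ts, 16 hash buckets on id, and the first 2 characters of s.
-- With SPEC, the partition columns must already appear in the column list.
CREATE EXTERNAL TABLE ice_spec (id BIGINT, s STRING, ts TIMESTAMP)
PARTITIONED BY SPEC (month(ts), bucket(16, id), truncate(2, s))
STORED BY ICEBERG;
```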

Impala examples

CREATE EXTERNAL TABLE ice_1 (i INT, t TIMESTAMP, j BIGINT) STORED AS ICEBERG; -- creates only the schema
CREATE TABLE ice_2 (i INT, t TIMESTAMP) PARTITIONED BY (j BIGINT) STORED AS ICEBERG; -- creates the schema and initializes the data
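Impala also accepts the SPEC form with partition transforms. A sketch, assuming your Impala version supports the standard Iceberg transforms such as DAY and BUCKET; the table and column names are hypothetical. As with Hive, the resulting partitions are tracked in Iceberg metadata rather than as Hive Metastore partitions.

```sql
-- Partition by the day of ts and 16 hash buckets on id.
CREATE TABLE ice_spec (id BIGINT, s STRING, ts TIMESTAMP)
PARTITIONED BY SPEC (DAY(ts), BUCKET(16, id))
STORED AS ICEBERG;
```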