Schema inference feature

From Hive or Impala, you can base a new Iceberg table on a schema in a Parquet file. You see a difference in the Hive and Impala syntax and examples.

From Hive, you must use FILE in the CREATE TABLE LIKE ... statement. From Impala, you must omit FILE in the CREATE TABLE LIKE … statement. The column definitions in the Iceberg table are inferred from the Parquet data file when you create a table like Parquet from Hive or Impala. Set the following table property for creating the table:

hive.parquet.infer.binary.as = <value>

Where <value> is binary (the default) or string.

This property determines the interpretation of the unannotated Parquet binary type. Some systems expect binary to be interpreted as string.

Hive syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE FILE PARQUET 'object_storage_path_of_parquet_file' 	  
[PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]]
[STORED AS file_format]
STORED BY ICEBERG
[TBLPROPERTIES (property_name=property_value, ...)]

Impala syntax

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name LIKE PARQUET 'object_storage_path_of_parquet_file' 	  
[PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]]
STORED (AS | BY) ICEBERG
[TBLPROPERTIES (property_name=property_value, ...)]

Hive example

CREATE TABLE ctlf_table LIKE FILE PARQUET 's3a://testbucket/files/schema.parq'
STORED BY ICEBERG;

Impala example

CREATE TABLE ctlf_table LIKE PARQUET 's3a://testbucket/files/schema.parq'
STORED BY ICEBERG;