Schema inference feature
From Hive or Impala, you can base a new Iceberg table on the schema of a Parquet file. The syntax differs between the two engines.
From Hive, you must include FILE in the CREATE TABLE LIKE ... statement. From Impala, you
must omit FILE in the CREATE TABLE LIKE ... statement. In both cases, the column definitions of the
Iceberg table are inferred from the Parquet data file. To control how binary columns are
inferred, set the following table property when you create the table:
hive.parquet.infer.binary.as = <value>
Where <value> is binary (the default) or string. This property determines how the unannotated Parquet binary type is interpreted. Some systems expect binary to be interpreted as string.
Hive syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE FILE PARQUET 'object_storage_path_of_parquet_file'
[PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]
[STORED AS file_format]
STORED BY ICEBERG
[TBLPROPERTIES (property_name=property_value, ...)]
Impala syntax
CREATE TABLE [IF NOT EXISTS] [db_name.]table_name LIKE PARQUET 'object_storage_path_of_parquet_file'
[PARTITIONED BY [SPEC]([col_name][, spec(value)][, spec(value)]...)]
STORED (AS | BY) ICEBERG
[TBLPROPERTIES (property_name=property_value, ...)]
Hive example
CREATE TABLE ctlf_table LIKE FILE PARQUET 'hdfs://files/schema.parq'
STORED BY ICEBERG;
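As a sketch of how the inference property might be combined with the Hive statement above, the following variant asks Hive to interpret unannotated Parquet binary columns as string. The table name ctlf_table_str is hypothetical, and passing the property through TBLPROPERTIES is an assumption based on its description as a table property; some deployments may expect it as a session-level setting instead.

```sql
-- Hypothetical table name; same Parquet file as in the Hive example.
-- Assumption: the inference property is accepted in TBLPROPERTIES.
CREATE TABLE ctlf_table_str LIKE FILE PARQUET 'hdfs://files/schema.parq'
STORED BY ICEBERG
TBLPROPERTIES ('hive.parquet.infer.binary.as'='string');

-- Inspect the column definitions that were inferred from the file.
DESCRIBE ctlf_table_str;
```

Comparing the DESCRIBE output with that of a table created using the default value is one way to confirm which columns were affected.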
Impala example
CREATE TABLE ctlf_table LIKE PARQUET 'hdfs://files/schema.parq'
STORED BY ICEBERG;
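The optional clauses in the Impala syntax can be combined in the same way. The sketch below assumes a database named ice_db already exists; that name and the table name ctlf_table_copy are hypothetical. It uses the STORED AS form, which the Impala syntax lists as an alternative to STORED BY.

```sql
-- Hypothetical database and table names; ice_db is assumed to exist.
CREATE TABLE IF NOT EXISTS ice_db.ctlf_table_copy LIKE PARQUET 'hdfs://files/schema.parq'
STORED AS ICEBERG;

-- Inspect the column definitions that were inferred from the file.
DESCRIBE ice_db.ctlf_table_copy;
```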