Migrate Hive table to Iceberg using Impala
CDP supports migrating Hive tables to Iceberg tables from Impala using the ALTER TABLE statement.
Requirements
- The original table is an EXTERNAL table from the Impala perspective, that is, its ‘EXTERNAL’ table property is set to true.
- The original table is a non-ACID table.
- You have “ALL” privileges on the database containing the table.
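One way to check the first two requirements is to inspect the table metadata from Impala. The following is a minimal sketch, assuming a hypothetical table named sales_db.orders; the exact layout of the output depends on your Impala version.

```sql
-- Hypothetical table name, used for illustration only.
-- In the output, the table type should be reported as external and the
-- table parameters should include EXTERNAL set to TRUE.
-- A 'transactional' parameter set to true indicates an ACID table,
-- which does not meet the requirements.
DESCRIBE FORMATTED sales_db.orders;
```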
Limitations
- Due to a bug in Apache Iceberg, it is not possible to migrate tables that are partitioned by a string column if any partition value contains the forward slash ‘/’ character. For more information, see https://github.com/apache/iceberg/issues/7612
- Some column types supported by Hive tables are not supported by Iceberg tables, such as tinyint or smallint. Tables with such columns are not migrated to Iceberg.
- The original Hive table must be in AVRO, ORC, or Parquet format.
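Before migrating, you can check a table against these limitations from Impala. The following is a sketch only; the table name sales_db.orders and the string partition column region are hypothetical names chosen for illustration.

```sql
-- Hypothetical table and column names, for illustration only.

-- List column names and types; look for tinyint or smallint columns.
DESCRIBE sales_db.orders;

-- Show the table definition; the STORED AS clause reports the file format,
-- which should be AVRO, ORC, or PARQUET.
SHOW CREATE TABLE sales_db.orders;

-- Find partition values that contain a forward slash.
-- An empty result means this limitation does not apply to the column.
SELECT DISTINCT region
FROM sales_db.orders
WHERE region LIKE '%/%';
```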
In-place table migration process
In-place table migration saves time when generating Iceberg tables because the data files do not need to be regenerated. Only the metadata, which points to the source data files, is regenerated. As a result of an in-place table migration, a new Iceberg table is created using the name and the location of the old Hive table. The old Hive table is dropped during the process.
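Because the new Iceberg table reuses the name and location of the old Hive table, a simple sanity check is to compare the table before and after the conversion. The following is a sketch under the assumption that no other process writes to the table in between; sales_db.orders is a hypothetical table name.

```sql
-- Hypothetical table name, for illustration only.

-- Before migration: note the row count and the Location value.
SELECT COUNT(*) FROM sales_db.orders;
DESCRIBE FORMATTED sales_db.orders;

-- Convert the table in place.
ALTER TABLE sales_db.orders CONVERT TO ICEBERG;

-- After migration: the table keeps the same name, and the row count and
-- Location value are expected to match the values noted earlier.
SELECT COUNT(*) FROM sales_db.orders;
DESCRIBE FORMATTED sales_db.orders;
```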
Impala syntax example
To convert a Hive table to an Iceberg V1 table from Impala, use the following syntax:
ALTER TABLE table_name CONVERT TO ICEBERG;
To convert a Hive table to an Iceberg V2 table from Impala, you must run two queries. Use the following syntax:
ALTER TABLE table_name CONVERT TO ICEBERG;
ALTER TABLE table_name SET TBLPROPERTIES ('format-version' = '2'
...);