Migrate Hive table to Iceberg using Impala

CDP supports in-place migration of Hive tables to Iceberg tables from Impala by using the ALTER TABLE statement.

Requirements

  • The original table is an EXTERNAL table from the Impala perspective (its ‘EXTERNAL’ table property value is true).
  • The original table is a non-ACID table.
  • You have “ALL” privileges on the database containing the table.
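Before migrating, you can check the first two requirements by inspecting the table metadata from Impala. The following is a minimal sketch that assumes a hypothetical table named sales_db.orders. In the DESCRIBE FORMATTED output, look for Table Type EXTERNAL_TABLE and an EXTERNAL parameter of TRUE, and confirm that the transactional parameter is absent or not true.

```sql
-- Hypothetical table name; replace sales_db.orders with your own table.
DESCRIBE FORMATTED sales_db.orders;
-- In the output, check that:
--   Table Type is EXTERNAL_TABLE and the EXTERNAL parameter is TRUE
--   the transactional parameter is absent or not set to true (non-ACID)
```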

Limitations

  • Due to a bug in Apache Iceberg, you cannot migrate tables that are partitioned by a string column whose partition values contain the forward slash (‘/’) character. For more information, see https://github.com/apache/iceberg/issues/7612
  • Some column types supported by Hive tables, such as tinyint and smallint, are not supported by Iceberg tables. Tables that contain such columns are not migrated to Iceberg.
  • The original Hive table must be in AVRO, ORC, or Parquet format.
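The limitations above can be checked with a few read-only statements before you attempt the conversion. This is a sketch under stated assumptions: the table sales_db.orders and its STRING partition column region are hypothetical, and you still need to read the DESCRIBE and SHOW TABLE STATS output yourself.

```sql
-- Hypothetical table sales_db.orders, partitioned by a STRING column named region.

-- Column types: look for TINYINT or SMALLINT columns, which block migration.
DESCRIBE sales_db.orders;

-- File format: the Format column should show AVRO, ORC, or PARQUET.
SHOW TABLE STATS sales_db.orders;

-- Partition values: any row returned here contains a '/' character
-- and is affected by the Apache Iceberg issue linked above.
SELECT DISTINCT region
FROM sales_db.orders
WHERE region LIKE '%/%';
```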

In-place table migration process

In-place table migration saves time when generating Iceberg tables because the data files do not need to be regenerated; only the metadata, which points to the source data files, is regenerated. The migration creates a new Iceberg table that has the same name and location as the original Hive table, and the original Hive table is dropped during the process.
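Because only metadata is regenerated, the table location and the data it returns should be the same before and after the migration. A minimal sketch for comparing them, assuming a hypothetical table sales_db.orders that meets the requirements above:

```sql
-- Hypothetical table name sales_db.orders.
-- Before migration: note the Location value and the row count.
DESCRIBE FORMATTED sales_db.orders;
SELECT COUNT(*) FROM sales_db.orders;

-- Migrate in place.
ALTER TABLE sales_db.orders CONVERT TO ICEBERG;

-- After migration: the Location value is expected to be unchanged
-- and the row count is expected to match the earlier result.
DESCRIBE FORMATTED sales_db.orders;
SELECT COUNT(*) FROM sales_db.orders;
```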

Impala syntax example

To convert a Hive table to an Iceberg V1 table from Impala, use the following syntax:

ALTER TABLE table_name CONVERT TO ICEBERG;
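To try the V1 conversion on a throwaway table first, a sketch along these lines may help. The database demo_db (assumed to exist already), the table events, and the LOCATION path are all hypothetical placeholders.

```sql
-- Hypothetical scratch table; demo_db, events, and the path are placeholders.
CREATE EXTERNAL TABLE demo_db.events (
  id BIGINT,
  name STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/tmp/demo_db/events';

-- Write one partition so that the table has data files to migrate.
-- The partition value contains no '/' character.
INSERT INTO demo_db.events PARTITION (dt = '2023-01-01') VALUES (1, 'a');

-- Convert the table in place to an Iceberg V1 table.
ALTER TABLE demo_db.events CONVERT TO ICEBERG;

-- The same row should be readable through the Iceberg table.
SELECT * FROM demo_db.events;
```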

To convert a Hive table to an Iceberg V2 table from Impala, you must run two statements. Use the following syntax:

ALTER TABLE table_name CONVERT TO ICEBERG;

ALTER TABLE table_name SET TBLPROPERTIES ('format-version' = '2'
...)
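As a concrete illustration of the two-step V2 conversion, the following sketch uses a hypothetical table sales_db.orders and sets only the format-version property. Checking the table parameters afterwards is an assumption about where Impala reports the property, not a required step.

```sql
-- Hypothetical table name sales_db.orders.
-- Step 1: convert the Hive table in place to an Iceberg V1 table.
ALTER TABLE sales_db.orders CONVERT TO ICEBERG;

-- Step 2: upgrade the Iceberg table from format version 1 to 2.
ALTER TABLE sales_db.orders SET TBLPROPERTIES ('format-version' = '2');

-- Optional check: format-version is expected to appear as 2
-- among the table parameters.
DESCRIBE FORMATTED sales_db.orders;
```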