Migrating a Hive table to Iceberg

You see how to use a simple ALTER TABLE statement to migrate an external Hive table to an Iceberg table. You see how to specify Iceberg classes in table properties to configure table input and output.

When you migrate an external Hive table to Iceberg, Hive makes the following changes:

  • Converts the storage_handler, serde, inputformat and outputformat properties of the table in HMS to use the Iceberg specific classes.

  • Reads the footers of the existing data files and generates the necessary Iceberg metadata files based on the footers.

  • Commits all the data files to the Iceberg table in a single commit.

You must meet the prerequisites for using Iceberg mentioned earlier.

  1. Log in to the CDP web interface and navigate to the Data Warehouse service.
  2. In the Data Warehouse service, navigate to the Overview page, locate your Hive Virtual Warehouse, and click Hue.
  3. Enter a query to use a database.
    For example:
    USE mydb;
  4. Enter a Hive query to migrate an existing external Hive table to an Iceberg v2 table.
    For example:
    ALTER TABLE tbl 
    SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler',
     'format-version' = '2');
    Do not drop the table as explained above unless you set the 'external.table.purge' table property to false.
  5. Click to run the query.