In-place migration

If you are looking for an efficient way to migrate Hive tables to Iceberg, you have come to the right place. This overview of using Cloudera Data Warehouse prepares you to convert Apache Hive external tables to Apache Iceberg with no downtime. You learn the advantages of moving Hive tables to Iceberg for implementing an open lakehouse.

You can accelerate data ingestion, curation, and consumption at petabyte scale by migrating Hive tables to Iceberg.

Hive-to-Iceberg table migration is fast because the migration does not regenerate data files; only the table metadata is rewritten. Cloudera Data Warehouse provides a simple API for migrating Hive tables to Iceberg to simplify adoption of Iceberg. A single ALTER TABLE command sets the Iceberg storage handler on the table, converting it in place without regenerating or moving the data.
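For example, the following is a minimal sketch of the in-place migration command. The table name sample_table is a hypothetical placeholder; the storage handler class shown is the one provided by the Iceberg Hive runtime:

  -- Migrate a Hive external table to Iceberg in place;
  -- only metastore metadata changes, data files stay put.
  ALTER TABLE sample_table
  SET TBLPROPERTIES ('storage_handler' =
    'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');

After the command completes, the existing Avro, ORC, or Parquet data files remain in their original location and are served through Iceberg metadata.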

From Hive, you can read and write data files of the following types (see the query example after this list):

  • Avro
  • Optimized Row Columnar (ORC)
  • Parquet
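
After migration, the table remains readable and writable from Hive with standard SQL. A minimal sketch, again using hypothetical table names (sample_table and staging_table are placeholders):

  -- Read from the migrated Iceberg table
  SELECT count(*) FROM sample_table;

  -- Write to it with an ordinary INSERT
  INSERT INTO sample_table
  SELECT * FROM staging_table;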

Limitations

  • Due to a bug in Apache Iceberg, it is not possible to migrate tables that are partitioned by string columns if any partition value contains the forward slash (/) character. For more information, see https://github.com/apache/iceberg/issues/7612.
  • Some column types supported by Hive, such as TINYINT and SMALLINT, are not supported by Iceberg. Tables that contain columns of these types cannot be migrated to Iceberg.
  • The original Hive table must be stored in Avro, ORC, or Parquet format. You can check a table's format before migrating (see the sketch after this list).
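
To confirm that a table meets the format requirement before migrating, you can use the standard Hive DESCRIBE FORMATTED command. A minimal sketch; sample_table is a placeholder:

  -- Inspect the table's storage details
  DESCRIBE FORMATTED sample_table;

The SerDe Library and InputFormat fields in the output indicate whether the table is stored as Avro, ORC, or Parquet.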