In-place migration from Hive

If you are looking for an efficient way to Hive migrate Hive tables to Iceberg, you came to the right place. An overview of using Cloudera Data Warehouse (CDW) prepares you to convert Apache Hive external tables to Apache Iceberg with no downtime. You learn the advantages of moving Hive tables to Iceberg for implementing an open lakehouse.

You can accelerate data ingestion, curation, and consumption at petabyte scale by migrating from Hive to Iceberg.

Hive-to-Iceberg table migration is fast because the migration does not regenerate data files. Just metadata is rewritten. CDW provides a simple API for migrating Hive tables to Iceberg to simplify adoption of Iceberg. An ALTER TABLE command sets the storage handler to convert the data without regeneration or remigration.

Using CDW, you can take advantage of schema evolution and partition evolution features. For more information about using Iceberg in CDW, see "Using Apache Iceberg". Newly generated metadata is built on top of existing Hive external tables and points to source data files as shown in the following diagram:

Migration using ALTER TABLE (recommended)

You can also run CREATE TABLE AS SELECT to migrate a Hive table to Iceberg; however, this approach reads data from the source and regenerates data in Iceberg:

You can read and write files in following types of files from Hive:

  • Avro
  • Optimized Row Columnar (ORC)
  • Parquet