In-place migration from Spark

In Cloudera Data Engineering (CDE), you can use Spark SQL to migrate Hive tables to Iceberg. You can convert Apache Hive external tables to Apache Iceberg with no downtime. Cloudera recommends moving Hive tables to Iceberg for implementing an open lakehouse.

Use one of the following similar procedures to import and migrate Hive tables to Iceberg:
  • Importing and migrating Iceberg table in Spark 3

    The migration creates a backup table, but does not incur the overhead of moving the physical location of the table on the object store.

  • Importing and migrating Iceberg table format v2

    The Iceberg merge on read operation is used.
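As a rough illustration of the Spark 3 procedure, the following minimal sketch builds the Spark SQL statements for Iceberg's `snapshot` and `migrate` stored procedures. The catalog name `spark_catalog`, the table names `db.sales` and `db.sales_iceberg_test`, and the helper functions are hypothetical placeholders; check the linked procedures for the exact steps that apply to your CDE environment.

```python
# Hypothetical helpers that only build SQL strings; nothing here contacts Spark.

def snapshot_statement(catalog: str, source: str, dest: str) -> str:
    """Build a CALL that creates an Iceberg snapshot table from a Hive table.

    The source table is left unchanged, so this is a way to try out
    Iceberg before committing to an in-place migration.
    """
    return f"CALL {catalog}.system.snapshot('{source}', '{dest}')"


def migrate_statement(catalog: str, table: str) -> str:
    """Build a CALL that converts a Hive table to Iceberg in place."""
    return f"CALL {catalog}.system.migrate('{table}')"


# Assumed names for illustration only.
statements = [
    snapshot_statement("spark_catalog", "db.sales", "db.sales_iceberg_test"),
    migrate_statement("spark_catalog", "db.sales"),
]

for statement in statements:
    # In a Spark job you would pass each string to spark.sql(statement).
    print(statement)
```

Building the statements as plain strings keeps the sketch runnable without a Spark session; in a real job, each string would be passed to `spark.sql(...)`.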

By default, when you migrate a Hive external table to an Iceberg v2 table, the existing data files are not rewritten. Instead, each write goes to a new file, and each read merges those changes with the original files. The default merge on read tends to speed up writes and slow down reads. You can configure several other types of migration behavior if the default merge on read does not suit your use case.
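The write behavior described above is controlled through Iceberg table properties. The following minimal sketch builds an `ALTER TABLE` statement that sets the table to format version 2 and selects either merge on read or copy on write for delete, update, and merge operations. The property names (`format-version`, `write.delete.mode`, `write.update.mode`, `write.merge.mode`) are standard Iceberg table properties; the table name `db.sales` and the helper function are hypothetical, and whether you need to set all three modes depends on your workload.

```python
# Hypothetical helper that only builds a SQL string; nothing here contacts Spark.

VALID_MODES = ("merge-on-read", "copy-on-write")


def v2_mode_statement(table: str, mode: str = "merge-on-read") -> str:
    """Build an ALTER TABLE that sets Iceberg v2 and the row-level write mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    props = {
        "format-version": "2",
        "write.delete.mode": mode,
        "write.update.mode": mode,
        "write.merge.mode": mode,
    }
    rendered = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({rendered})"


# Assumed table name; copy-on-write trades slower writes for faster reads.
statement = v2_mode_statement("db.sales", mode="copy-on-write")
print(statement)  # in a Spark job: spark.sql(statement)
```

Copy on write rewrites affected data files at write time, which is the usual alternative when reads dominate and the merge on read default slows queries too much.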

For more information about using Iceberg in CDE, see Using Apache Iceberg in Cloudera Data Engineering.