Importing and migrating Iceberg table format v2

Importing or migrating Hive tables to Iceberg table format v2 is supported only for existing external Hive tables. When you import a table to Iceberg, the source and destination tables remain intact and independent. When you migrate a table, the existing Hive table is converted into an Iceberg table. You can use Spark SQL to import or migrate a Hive table to Iceberg.

Importing

Call the snapshot procedure to import a Hive table into Iceberg table format v2 using a Spark 3 application.
spark.sql("CALL <catalog>.system.snapshot(source_table => '<src>', table => '<dest>', properties => map('format-version', '2', 'write.delete.mode', '<delete-mode>', 'write.update.mode', '<update-mode>', 'write.merge.mode', '<merge-mode>'))")
Definitions:
  • <src> is the qualified name of the Hive table

  • <dest> is the qualified name of the Iceberg table to be created

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.

  • <delete-mode>, <update-mode>, and <merge-mode> are the modes used to perform the respective operations. If unspecified, they default to 'merge-on-read'.

For example:

spark.sql("CALL spark_catalog.system.snapshot('hive_db.hive_tbl',
          'iceberg_db.iceberg_tbl')")
For information on compiling a Spark 3 application with Iceberg libraries, see Iceberg library dependencies for Spark applications linked below.
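
As a minimal sketch of the import call, the following Python helper assembles the snapshot statement with its properties map. The helper name snapshot_call and its keyword arguments are hypothetical and serve only as illustration; the procedure name, argument names, and property keys are the ones shown above.

```python
def snapshot_call(catalog, src, dest,
                  delete_mode="merge-on-read",
                  update_mode="merge-on-read",
                  merge_mode="merge-on-read"):
    """Build the CALL statement for the snapshot procedure (hypothetical helper)."""
    props = {
        "format-version": "2",
        "write.delete.mode": delete_mode,
        "write.update.mode": update_mode,
        "write.merge.mode": merge_mode,
    }
    # Spark SQL map() takes alternating key and value arguments.
    prop_args = ", ".join(f"'{k}', '{v}'" for k, v in props.items())
    return (
        f"CALL {catalog}.system.snapshot("
        f"source_table => '{src}', table => '{dest}', "
        f"properties => map({prop_args}))"
    )


stmt = snapshot_call("spark_catalog", "hive_db.hive_tbl", "iceberg_db.iceberg_tbl")
print(stmt)

# In a Spark 3 application, assuming an existing SparkSession named `spark`:
# spark.sql(stmt)
```

Building the statement separately from running it makes it easy to log or review the exact SQL before the Iceberg table is created.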

Migrating

Call the migrate procedure to migrate a Hive table to Iceberg.
spark.sql("CALL <catalog>.system.migrate('<src>', map('format-version', '2', 'write.delete.mode', '<delete-mode>', 'write.update.mode', '<update-mode>', 'write.merge.mode', '<merge-mode>'))")
Definitions:
  • <src> is the qualified name of the Hive table

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.

  • <delete-mode>, <update-mode>, and <merge-mode> are the modes used to perform the respective operations. If unspecified, they default to 'merge-on-read'.

For example:

spark.sql("CALL spark_catalog.system.migrate('hive_db.hive_tbl', map('format-version', '2', 'write.delete.mode', 'merge-on-read', 'write.update.mode', 'merge-on-read', 'write.merge.mode', 'merge-on-read'))")
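
Because migration converts the existing Hive table in place, it can help to check the mode values before issuing the call. The sketch below is one way to do that in Python. The helper migrate_call and the VALID_MODES set are hypothetical; the set assumes the two mode values defined by Apache Iceberg ('copy-on-write' and 'merge-on-read'), of which this page names only 'merge-on-read'.

```python
# Assumption: these are the two write modes Apache Iceberg defines.
VALID_MODES = {"copy-on-write", "merge-on-read"}


def migrate_call(catalog, src,
                 delete_mode="merge-on-read",
                 update_mode="merge-on-read",
                 merge_mode="merge-on-read"):
    """Build the CALL statement for the migrate procedure (hypothetical helper)."""
    modes = {
        "write.delete.mode": delete_mode,
        "write.update.mode": update_mode,
        "write.merge.mode": merge_mode,
    }
    # Reject unknown modes before the statement ever reaches Spark.
    for key, mode in modes.items():
        if mode not in VALID_MODES:
            raise ValueError(f"{key}: unsupported mode {mode!r}")
    props = {"format-version": "2", **modes}
    prop_args = ", ".join(f"'{k}', '{v}'" for k, v in props.items())
    return f"CALL {catalog}.system.migrate('{src}', map({prop_args}))"


stmt = migrate_call("spark_catalog", "hive_db.hive_tbl")
print(stmt)

# In a Spark 3 application, assuming an existing SparkSession named `spark`:
# spark.sql(stmt)
```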

Upgrading Iceberg table format v1 to v2

To upgrade an Iceberg table format from v1 to v2, run an ALTER TABLE command as follows:

spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES('format-version' = '2')")

You can also set 'write.delete.mode', 'write.update.mode', and 'write.merge.mode' in the same TBLPROPERTIES clause to <delete-mode>, <update-mode>, and <merge-mode>, the modes used to perform the respective operations. If unspecified, they default to 'merge-on-read'.
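
The upgrade statement can be sketched in Python as follows. This assumes the property key is 'format-version', as in the snapshot and migrate calls above, and that TBLPROPERTIES entries use the Spark SQL 'key' = 'value' form. The helper name upgrade_statement and the table name are hypothetical.

```python
def upgrade_statement(table_name, modes=None):
    """Build an ALTER TABLE statement that sets format v2 (hypothetical helper).

    `modes` optionally maps write.*.mode property keys to mode values.
    """
    props = {"format-version": "2"}
    props.update(modes or {})
    # TBLPROPERTIES entries are written as 'key' = 'value' pairs.
    assignments = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES({assignments})"


basic = upgrade_statement("iceberg_db.iceberg_tbl")
with_modes = upgrade_statement(
    "iceberg_db.iceberg_tbl",
    {"write.delete.mode": "merge-on-read"},
)
print(basic)
print(with_modes)

# In a Spark 3 application, assuming an existing SparkSession named `spark`:
# spark.sql(basic)
```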