Importing and migrating Iceberg table format v2

Importing or migrating Hive tables Iceberg table formats v2 are supported only on existing external Hive tables. When you import a table to Iceberg, the source and destination remain intact and independent. When you migrate a table, the existing Hive table is converted into an Iceberg table. You can use Spark SQL to import or migrate a Hive table to Iceberg.

Importing🔗

Call the snapshot procedure to import a Hive table into Iceberg table format v2 using a Spark 3 application.

spark.sql("CALL 
<catalog>.system.snapshot(source_table => '<src>',
table => '<dest>',
properties => map('format-version', '2', 'write.delete.mode', '<delete-mode>',
'write.update.mode', '<update-mode>',
'write.merge.mode', '<merge-mode>'))")

Definitions:

<src> is the qualified name of the Hive table
<dest> is the qualified name of the Iceberg table to be created
<catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.
<delete-mode> <update-mode> and <merge-mode> are the modes that shall be used to perform the respective operation. If unspecified, they default to 'merge-on-read'

For example:

spark.sql("CALL 
spark_catalog.system.snapshot('hive_db.hive_tbl',
'iceberg_db.iceberg_tbl')")

For information on compiling Spark 3 application with Iceberg libraries, see Iceberg library dependencies for Spark applications linked below.

Migrating🔗

Call the migrate procedure to migrate a Hive table to Iceberg.

spark.sql("CALL 
<catalog>.system.migrate('<src>', 
map('format-version', '2', 
'write.delete.mode', '<delete-mode>', 
'write.update.mode', '<update-mode>', 
'write.merge.mode', '<merge-mode>'))")

Definitions:

<src> is the qualified name of the Hive table
<catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.
<delete-mode> <update-mode> and <merge-mode> are the modes that shall be used to perform the respective operation. If unspecified, they default to 'merge-on-read'

For example:

spark.sql("CALL 
spark_catalog.system.migrate('hive_db.hive_tbl', 
map('format-version', '2', 
'write.delete.mode', 'merge-on-read', 
'write.update.mode', 'merge-on-read', 
'write.merge.mode', 'merge-on-read'))")

Upgrading Iceberg table format v1 to v2 🔗

To upgrade an Iceberg table format from v1 to v2, run an ALTER TABLE command as follows:

spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES('merge-on-read', '2')")

<delete-mode>,<update-mode>, and <merge-mode> can be specified as the modes that shall be used to perform the respective operation. If unspecified, they default to ‘merge-on-read'