Creating and managing Cloudera Data Engineering jobsPDF version

Importing and migrating Iceberg table format v2

Importing or migrating Hive tables Iceberg table formats v2 are supported only on existing external Hive tables. When you import a table to Iceberg, the source and destination remain intact and independent. When you migrate a table, the existing Hive table is converted into an Iceberg table. You can use Spark SQL to import or migrate a Hive table to Iceberg.

Call the snapshot procedure to import a Hive table into Iceberg table format v2 using a Spark 3 application.
spark.sql("CALL 
<catalog>.system.snapshot(source_table => '<src>',
table => '<dest>',
properties => map('format-version', '2', 'write.delete.mode', '<delete-mode>',
'write.update.mode', '<update-mode>',
'write.merge.mode', '<merge-mode>'))")
Definitions:
  • <src> is the qualified name of the Hive table

  • <dest> is the qualified name of the Iceberg table to be created

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.

  • <delete-mode> <update-mode> and <merge-mode> are the modes that shall be used to perform the respective operation. If unspecified, they default to 'merge-on-read'

For example:

spark.sql("CALL 
spark_catalog.system.snapshot('hive_db.hive_tbl',
'iceberg_db.iceberg_tbl')")
For information on compiling Spark 3 application with Iceberg libraries, see Iceberg library dependencies for Spark applications linked below.
Call the migrate procedure to migrate a Hive table to Iceberg.
spark.sql("CALL 
<catalog>.system.migrate('<src>', 
map('format-version', '2', 
'write.delete.mode', '<delete-mode>', 
'write.update.mode', '<update-mode>', 
'write.merge.mode', '<merge-mode>'))")
Definitions:
  • <src> is the qualified name of the Hive table

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.

  • <delete-mode> <update-mode> and <merge-mode> are the modes that shall be used to perform the respective operation. If unspecified, they default to 'merge-on-read'

For example:

spark.sql("CALL 
spark_catalog.system.migrate('hive_db.hive_tbl', 
map('format-version', '2', 
'write.delete.mode', 'merge-on-read', 
'write.update.mode', 'merge-on-read', 
'write.merge.mode', 'merge-on-read'))")

To upgrade an Iceberg table format from v1 to v2, run an ALTER TABLE command as follows:

spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES('merge-on-read', '2')")

<delete-mode>,<update-mode>, and <merge-mode> can be specified as the modes that shall be used to perform the respective operation. If unspecified, they default to ‘merge-on-read'