Importing and migrating Iceberg table format v2
Importing or migrating Hive tables to Iceberg table format v2 is supported only on existing external Hive tables. When you import a table to Iceberg, the source and destination remain intact and independent. When you migrate a table, the existing Hive table is converted into an Iceberg table. You can use Spark SQL to import or migrate a Hive table to Iceberg.
Importing
spark.sql("CALL
<catalog>.system.snapshot(source_table => '<src>',
table => '<dest>',
properties => map('format-version', '2', 'write.delete.mode', '<delete-mode>',
'write.update.mode', '<update-mode>',
'write.merge.mode', '<merge-mode>'))")
- <src> is the qualified name of the Hive table.
- <dest> is the qualified name of the Iceberg table to be created.
- <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.
- <delete-mode>, <update-mode>, and <merge-mode> are the write modes used for the respective operations. If unspecified, they default to 'merge-on-read'.
For example:
spark.sql("CALL
spark_catalog.system.snapshot('hive_db.hive_tbl',
'iceberg_db.iceberg_tbl')")
Migrating
spark.sql("CALL
<catalog>.system.migrate('<src>',
map('format-version', '2',
'write.delete.mode', '<delete-mode>',
'write.update.mode', '<update-mode>',
'write.merge.mode', '<merge-mode>'))")
- <src> is the qualified name of the Hive table.
- <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Configuring Catalog linked below.
- <delete-mode>, <update-mode>, and <merge-mode> are the write modes used for the respective operations. If unspecified, they default to 'merge-on-read'.
For example:
spark.sql("CALL
spark_catalog.system.migrate('hive_db.hive_tbl',
map('format-version', '2',
'write.delete.mode', 'merge-on-read',
'write.update.mode', 'merge-on-read',
'write.merge.mode', 'merge-on-read'))")
Upgrading Iceberg table format v1 to v2
To upgrade an Iceberg table from format v1 to v2, run an ALTER TABLE command as follows:
spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES('merge-on-read', '2')")
<delete-mode>,<update-mode>,
and
<merge-mode>
can be specified as the modes that shall be used to
perform the respective operation. If unspecified, they default to
‘merge-on-read'
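For instance, a single ALTER TABLE statement can raise the format version and pin all three write modes at once. This is a minimal sketch; iceberg_db.iceberg_tbl is a placeholder table name, and it assumes a Spark session configured for Iceberg as in the import sketch above.

# Sketch: upgrade a v1 Iceberg table to v2 and set the row-level write
# modes in one statement. iceberg_db.iceberg_tbl is a placeholder name.
spark.sql("""
    ALTER TABLE iceberg_db.iceberg_tbl SET TBLPROPERTIES(
        'format-version' = '2',
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode' = 'merge-on-read')
""")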