Importing and migrating Iceberg tables in Spark 3

Importing or migrating a table is supported only for existing external Hive tables. When you import a table to Iceberg, the source and destination remain intact and independent. When you migrate a table, the existing Hive table is converted into an Iceberg table.

Prerequisite

Add the following configuration to your Spark job:
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.serializer=org.apache.spark.serializer.JavaSerializer
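One way to supply these settings is on the spark-submit command line. A minimal sketch, assuming a PySpark application file (the script name is a placeholder, not from the original):

```shell
# Pass the required Iceberg settings as --conf flags at submit time
spark-submit \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.serializer=org.apache.spark.serializer.JavaSerializer \
  my_iceberg_app.py
```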

Importing

Call the snapshot procedure to import a Hive table into Iceberg using a Spark 3 application.
spark.sql("CALL <catalog>.system.snapshot('<src>', '<dest>')")
where:
  • <src> is the qualified name of the Hive table

  • <dest> is the qualified name of the Iceberg table to be created

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Catalog configuration.

For example:

spark.sql("CALL spark_catalog.system.snapshot('hive_db.hive_tbl', 'iceberg_db.iceberg_tbl')")
For information on compiling a Spark 3 application with Iceberg libraries, see Iceberg library dependencies for Spark applications.
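When importing many tables from a script, it can help to build the CALL statement from parameters. A minimal sketch with a hypothetical helper (not part of Iceberg or Spark), assuming table names contain no quote characters:

```python
# Hypothetical helper: render the Iceberg snapshot procedure call
# from the catalog, source, and destination names.
def snapshot_sql(catalog: str, src: str, dest: str) -> str:
    """Build the statement to pass to spark.sql()."""
    return f"CALL {catalog}.system.snapshot('{src}', '{dest}')"

# Reproduces the example above:
print(snapshot_sql("spark_catalog", "hive_db.hive_tbl", "iceberg_db.iceberg_tbl"))
# → CALL spark_catalog.system.snapshot('hive_db.hive_tbl', 'iceberg_db.iceberg_tbl')
```

The resulting string would then be passed to spark.sql() inside a session configured as described in the Prerequisite section.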

Migrating

Call the migrate procedure to migrate a Hive table to Iceberg.
spark.sql("CALL <catalog>.system.migrate('<src>')")
where:
  • <src> is the qualified name of the Hive table

  • <catalog> is the name of the catalog, which you pass in a configuration file. For more information, see Catalog configuration.

For example:

spark.sql("CALL spark_catalog.system.migrate('hive_db.hive_tbl')")
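Putting the pieces together, a minimal end-to-end PySpark sketch of a migrate call follows. It assumes spark_catalog is configured as Iceberg's SparkSessionCatalog backed by the Hive metastore; the application name and table name are illustrative, and it requires a running Spark environment with the Iceberg runtime on the classpath:

```python
from pyspark.sql import SparkSession

# Session configured per the Prerequisite section, plus a catalog
# definition (see Catalog configuration for the full set of options).
spark = (
    SparkSession.builder
    .appName("migrate-to-iceberg")  # illustrative name
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.serializer",
            "org.apache.spark.serializer.JavaSerializer")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .getOrCreate()
)

# In-place conversion: the existing Hive table becomes an Iceberg table.
spark.sql("CALL spark_catalog.system.migrate('hive_db.hive_tbl')")
```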