Table migration overview
Learn what happens under the covers when a Hive table is migrated to an Iceberg table.
Migration from Hive
When you migrate an external Hive table to Iceberg, Hive makes the following changes:
- Converts the storage_handler, serde, inputformat, and outputformat properties of the table in HMS to use the Iceberg-specific classes.
- Reads the footers of the existing data files and generates the necessary Iceberg metadata files based on the footers.
- Commits all the data files to the Iceberg table in a single commit.
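In Hive, an in-place migration of this kind is typically triggered by setting the Iceberg storage handler on the table. The following is a minimal sketch, assuming a hypothetical external table named sales_db.orders; confirm that your Hive version supports the statement before relying on it.

```sql
-- sales_db.orders is a hypothetical external Hive table used for illustration.
ALTER TABLE sales_db.orders
SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');

-- Inspect the table afterwards to see the updated storage handler and SerDe.
DESCRIBE FORMATTED sales_db.orders;
```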
Migration from Impala
Impala’s table migration process consists of multiple steps under the hood:
1. Sets table properties on the Hive table, such as 'external.table.purge'=false.
2. Renames the original table to a temporary name in the format "<original_name>_tmp_<random_ID>".
3. Refreshes the renamed Hive table.
4. Creates an Iceberg table using the original table name and the same location as the Hive table.
5. Populates the Iceberg table with the data files, using the metadata for the data files of the Hive table. Reads the footers of the data files and saves statistics to the Iceberg metadata.
6. Sets the 'external.table.purge' property of the Iceberg table to true.
7. Drops the renamed Hive table. Because the 'external.table.purge' property of the Hive table is set to false, the drop deletes only the metadata of the Hive table and leaves the data files in place.
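From the user's side, the steps above run as part of a single Impala statement. The following is a minimal sketch, assuming a hypothetical Hive table named sales_db.orders and an Impala version that supports table conversion.

```sql
-- sales_db.orders is a hypothetical Hive table used for illustration.
-- Impala performs the rename, create, populate, and drop steps internally.
ALTER TABLE sales_db.orders CONVERT TO ICEBERG;

-- Inspect the result; the table keeps its original name and location.
DESCRIBE FORMATTED sales_db.orders;
```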
If for any reason the table migration fails, Impala cleans up the Iceberg metadata files created during step 6.
If the migration fails after the original Hive table has already been renamed, the error message explains how to rename the Hive table back to its original name. For example:
ALTER TABLE <tbl_name>_tmp_<random_ID> RENAME TO <tbl_name>;
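As an illustration of that recovery in Impala, the sketch below first looks for the leftover temporary table and then renames it back. The database sales_db, the table orders, and the suffix 1a2b3c4d are hypothetical placeholders; use the exact name reported in your error message.

```sql
-- Hypothetical names: database sales_db, original table orders.
-- List leftover temporary tables; Impala patterns use * as the wildcard.
SHOW TABLES IN sales_db LIKE 'orders_tmp_*';

-- Rename the table reported by the error message back to the original name.
-- The suffix 1a2b3c4d stands in for the random ID that Impala generated.
ALTER TABLE sales_db.orders_tmp_1a2b3c4d RENAME TO sales_db.orders;
```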