How Iceberg replication policies work
An Iceberg replication policy replicates Iceberg tables by identifying the tables, exporting their metadata, transferring the files, and updating the target cluster each time you create or run the policy.
The replication process consists of the following high-level steps:
- Identifying the tables to replicate based on the table regular expression or table pattern that you specify when you create the Iceberg replication policy.
- Reading the table names to fetch the checkpoints for the tables from the target cluster Hive Metastore (HMS).
- Initiating the exportCLI command on the source cluster. The command generates a list of manifest files, data files, and delete files that must be copied from the source cluster to the target cluster.
- Copying the files from the source cluster to the target cluster using DistCp jobs. These jobs copy the data files directly to the target data root directory, and the metadata files to a temporary staging location. The transfer speed depends on the bandwidth of the target cluster.
- Rewriting the copied manifest files so that their manifest file pointers and data file paths reference the locations on the target cluster, and then updating the target HMS with the latest snapshot.
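
The first and last steps above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names `select_tables` and `rewrite_path`, the example table names, and the `hdfs://source-ns` and `hdfs://target-ns` roots are all hypothetical, and the sketch assumes that a table pattern must match the whole table name and that the transformation amounts to swapping a root path prefix.

```python
import re


def select_tables(table_names, pattern):
    """Return the table names that fully match the regular expression.

    Assumption: the pattern must match the entire table name.
    """
    compiled = re.compile(pattern)
    return [name for name in table_names if compiled.fullmatch(name)]


def rewrite_path(path, source_root, target_root):
    """Replace the source root prefix of a file path with the target root.

    Paths that do not start with the source root are returned unchanged.
    """
    if path.startswith(source_root):
        return target_root + path[len(source_root):]
    return path


# Hypothetical table names and cluster roots, for illustration only.
tables = ["sales_2023", "sales_2024", "inventory"]
selected = select_tables(tables, r"sales_.*")

source_file = "hdfs://source-ns/warehouse/db/sales_2023/data/f1.parquet"
target_file = rewrite_path(
    source_file,
    "hdfs://source-ns/warehouse",
    "hdfs://target-ns/warehouse",
)
print(selected)
print(target_file)
```

In this sketch, only the two `sales_` tables are selected, and the data file path keeps its relative location under the new root, which is the property the manifest transformation needs to preserve.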
