Iceberg replication policies
Iceberg replication policies can replicate Iceberg tables between Data Lakes through Data Hubs running version 7.3.2 or higher on AWS. The Data Lakes can be located in a single AWS region or across multiple regions.
You must deploy the source Iceberg Replication Data Hub in the source Data Lake and the target Iceberg Replication Data Hub in the target Data Lake, and then create the Iceberg replication policy in the target Data Hub. The deployed Data Hubs provide the Hive database, the details about the table metadata, the source location of the tables, and, optionally, compute resources for the replication process. Replication occurs between S3-backed Data Lakes using HDFS protocols (or DistCp).
An Iceberg replication policy replicates the following:
- Metadata and catalog from the source cluster Hive Metastore (HMS) to the target cluster HMS.
- Data files from the source cluster to the target cluster. Iceberg replication policies can replicate only between AWS S3 storage locations in public cloud environments.
- All snapshots from the source cluster, incrementally, by default. This allows you to run time travel queries on the target cluster.
You can use Iceberg replication policies for the following use cases:
- Implementing disaster recovery.
- Implementing passive disaster recovery with incremental replication at regular intervals between two similar systems.
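The incremental snapshot replication described above can be pictured with a small model. The following Python sketch is illustrative only: the `Snapshot` and `TableState` classes and the `replicate_incrementally` function are hypothetical names invented for this example, not part of any Replication Manager or Iceberg API, and the sketch assumes that snapshot IDs increase over time, which real Iceberg snapshot IDs need not do. It shows the idea that each run copies only the snapshots and data files the target does not yet have.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for one Iceberg table snapshot."""
    snapshot_id: int        # assumed to increase over time (a simplification)
    data_files: frozenset   # storage paths referenced by this snapshot


@dataclass
class TableState:
    """Hypothetical stand-in for a table as seen from one cluster."""
    snapshots: dict = field(default_factory=dict)  # snapshot_id -> Snapshot
    files: set = field(default_factory=set)        # data files present in storage


def replicate_incrementally(source: TableState, target: TableState) -> list:
    """Copy snapshots the target lacks and return their IDs in ascending order."""
    missing = sorted(set(source.snapshots) - set(target.snapshots))
    for snapshot_id in missing:
        snapshot = source.snapshots[snapshot_id]
        # Copy only the data files that the target does not already hold.
        target.files |= snapshot.data_files - target.files
        # Register the snapshot so that it can be queried on the target.
        target.snapshots[snapshot_id] = snapshot
    return missing


source = TableState()
target = TableState()
for sid, paths in [(1, {"a.parquet"}), (2, {"a.parquet", "b.parquet"})]:
    source.snapshots[sid] = Snapshot(sid, frozenset(paths))
    source.files |= paths

# First run: the target is empty, so both snapshots are copied.
first_run = replicate_incrementally(source, target)

# A new snapshot appears on the source before the next scheduled run.
source.snapshots[3] = Snapshot(3, frozenset({"b.parquet", "c.parquet"}))
source.files |= {"c.parquet"}

# Second run: only the new snapshot and its new data file are copied.
second_run = replicate_incrementally(source, target)
```

Because every snapshot is registered on the target, not just the latest one, the target retains the history that time travel queries rely on.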
