Iceberg replication policies
Iceberg replication policies replicate Iceberg V1 and V2 tables created using Spark (read-only with Impala) between clusters running Cloudera Base on premises 7.1.9 or higher with Cloudera Manager 7.11.3 or higher. In Cloudera Base on premises 7.3.1 and higher, Replication Manager can also replicate V1 and V2 Iceberg tables created using Hive.
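The version requirements above can be expressed as a small check. The following is a minimal sketch in Python, not part of Replication Manager: the function names `parse_version` and `can_replicate` are hypothetical, and the sketch assumes plain dotted numeric version strings (no service-pack or build suffixes). This section states no separate Cloudera Manager minimum for Hive-created tables, so the sketch applies the same Cloudera Manager minimum to both cases.

```python
# Minimum versions taken from the text above.
MIN_RUNTIME_SPARK_TABLES = (7, 1, 9)   # Cloudera Base on premises, Spark-created tables
MIN_RUNTIME_HIVE_TABLES = (7, 3, 1)    # Cloudera Base on premises, Hive-created tables
MIN_CLOUDERA_MANAGER = (7, 11, 3)      # Cloudera Manager


def parse_version(text):
    """Turn a dotted version such as '7.11.3' into a comparable tuple (7, 11, 3)."""
    return tuple(int(part) for part in text.split("."))


def can_replicate(runtime_version, cm_version, created_with):
    """Return True if the version pair meets the minimums stated in this section.

    created_with is 'spark' or 'hive' (hypothetical labels for this sketch).
    """
    runtime = parse_version(runtime_version)
    manager = parse_version(cm_version)

    if manager < MIN_CLOUDERA_MANAGER:
        return False
    if created_with == "spark":
        return runtime >= MIN_RUNTIME_SPARK_TABLES
    if created_with == "hive":
        return runtime >= MIN_RUNTIME_HIVE_TABLES
    raise ValueError("created_with must be 'spark' or 'hive'")
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would rank "7.9.5" above "7.11.3".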
Apache Iceberg is a cloud-native, high-performance open table format for organizing petabyte-scale analytic datasets on a file system or object store. Iceberg supports ACID-compliant tables, including row-level deletes and updates, and can define large analytic data tables using open format files.
Iceberg replication policies support the following:
- Replicating metadata and catalog from the source cluster Hive Metastore (HMS) to the target cluster HMS.
- Replicating data files in the HDFS storage system from the source cluster to the target cluster. The Iceberg replication policies can replicate only between HDFS storage systems.
- Replicating data at table level.
- Replicating all the snapshots from the source cluster, which allows you to run time travel queries on the target cluster.
You can use Iceberg replication policies to:
- replicate Iceberg tables between on-premises clusters to archive data or run analytics.
- implement passive disaster recovery with planned failover and perform incremental replication at regular intervals between two similar systems, for example, from one HDFS system to another.
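Because all snapshots are replicated, a time travel query on the target cluster can read a table as of an earlier snapshot. The following is a minimal sketch that builds such a query string, assuming the Spark SQL `VERSION AS OF` and `TIMESTAMP AS OF` clauses (available for Iceberg tables in Spark 3.3 and later). The helper name `time_travel_query`, the table name, and the snapshot ID are hypothetical; whether your target cluster's Spark version accepts this syntax is an assumption to verify.

```python
def time_travel_query(table, snapshot_id=None, as_of_timestamp=None):
    """Build a Spark SQL time travel SELECT for an Iceberg table.

    Pass exactly one of snapshot_id (integer) or as_of_timestamp
    (a 'YYYY-MM-DD HH:MM:SS' string). Inputs are not sanitized, so
    use this only with trusted values.
    """
    if (snapshot_id is None) == (as_of_timestamp is None):
        raise ValueError("pass exactly one of snapshot_id or as_of_timestamp")

    if snapshot_id is not None:
        # Read the table as of a specific snapshot ID.
        return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"

    # Read the table as it was at a point in time.
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{as_of_timestamp}'"


# Hypothetical usage on the target cluster, with an existing SparkSession `spark`:
#   spark.sql(time_travel_query("db1.sales", snapshot_id=1234567890)).show()
```

Snapshot IDs for a table can typically be listed from its Iceberg `snapshots` metadata table before choosing one to query.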
This video demonstrates how an Iceberg replication policy can replicate multiple Iceberg tables from different locations in the source cluster to a target cluster in a single replication job. It also showcases the Hive on Iceberg feature, which allows you to replicate Iceberg tables created by Hive and Impala, and a use case for the location mapping feature, which maps the source path to the target path. These features are available in Cloudera Base on premises 7.3.1 using Cloudera Manager 7.13.1, and in higher versions.
