Iceberg replication policies

Iceberg replication policies replicate Iceberg tables between CDP Private Cloud Base 7.1.9 or higher clusters using Cloudera Manager 7.11.3 or higher versions.

​​Apache Iceberg is a cloud-native, high-performance open table format for organizing petabyte-scale analytic datasets on a file system or object store. Iceberg supports ACID compliant tables which includes row-level deletes and updates and can define large analytic data tables using open format files.

Iceberg replication policies:

  • replicate the metadata and catalog from the source cluster Hive Metastore (HMS) to target cluster HMS.

    The catalog contains the current metadata pointer/file and is stored in the Hive HMS. The metadata file contains the snapshots. The snapshots point to the manifest list that has the manifest files. The manifest files in turn point to the data files.

  • replicate the data files in the HDFS storage system from the source cluster to the target cluster. The Iceberg replication policy can replicate only between HDFS storage systems.
  • replicate all the snapshots from the source cluster. This allows you to run time travel queries on the target cluster.
  • replicate at table-level.

    You must ensure that the tables are in the default warehouse location because Iceberg replication policies do not replicate tables in a custom location.

Some use cases where you can use Iceberg replication policies are to:
  • replicate Iceberg tables between on-premises clusters to archive data or run analytics.
  • implement passive disaster recovery with planned failover and incremental replication at regular intervals between two similar systems. For example, between an HDFS to another HDFS system.