Preparing to create an Iceberg replication policy
Prepare for creating an Iceberg replication policy by provisioning Data Hubs, enabling the Iceberg Replication feature, and configuring necessary settings.
- Provision a source Data Hub in the source Data Lake and a target Data Hub in the target Data Lake. For instructions, see Provisioning Iceberg Replication Data Hub.
- Enable the Iceberg Replication feature in the Data Hubs. To enable the feature in the deployed Iceberg Replication Data Hub, contact your Cloudera account team.
- If the source and target environment names are not unique within the first eight characters, manually associate the Cloudera Manager peer using the Cloudera Manager API explorer. This step is not required otherwise. For more information, see Associating Cloudera Manager peer to use in Iceberg replication policy.
- Configure the required IAM role access for the Replication Manager users. For more information, see Configuring IAM role access for Iceberg replication.
- Ensure that Cloudera Lakehouse Optimizer is disabled and is not available on your target cluster if you have enabled the service in your AWS or Azure environment.
This service is available in 7.3.1.500 and higher versions. If the Cloudera Lakehouse Optimizer service is available on the target cluster and a compaction maintenance task is scheduled to run on the replicated tables, the Cloudera Lakehouse Optimizer policy runs a compaction maintenance task on the replicated Iceberg tables.
By default, if the metadata.json file of the target cluster is absent on the source cluster, Replication Manager initiates a bootstrap replication in the subsequent Iceberg replication policy job. During the bootstrap replication, Replication Manager copies the already replicated small files from the source cluster to the target cluster, and the Cloudera Lakehouse Optimizer policy detects these small files and triggers a compaction maintenance task. This leads to a repetitive cycle and negates the benefit of the compaction task.
For more information about Cloudera Lakehouse Optimizer, see Lakehouse Optimizer.
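
The condition for manual Cloudera Manager peer association in the list above can be checked before you start. The following is a minimal sketch, not part of any Cloudera tooling: the function name and the sample environment names are hypothetical, and it assumes the comparison is an exact, case-sensitive match on the first eight characters, which this topic does not specify.

```python
# Hypothetical helper: not a Cloudera API. It only mirrors the
# "first eight characters" rule described in the prerequisites.
PREFIX_LENGTH = 8


def needs_manual_peer_association(source_env: str, target_env: str) -> bool:
    """Return True if both environment names share the same first
    eight characters, in which case the peer is associated manually.

    Assumption: the comparison is case-sensitive.
    """
    return source_env[:PREFIX_LENGTH] == target_env[:PREFIX_LENGTH]


# Sample (made-up) environment names.
print(needs_manual_peer_association("prod-env-east", "prod-env-west"))  # True
print(needs_manual_peer_association("sales-src", "sales-tgt"))          # False
```

In the first call both names begin with "prod-env", so manual association would be needed. In the second call the names differ at the eighth character, so it would not.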
