Go to the Replication Manager > Replication Policies page.

Click Create Policy.

The Create Replication Policy wizard is displayed.

On the General page, configure the following details:


Field	Description
Type	Select Iceberg to create an Iceberg replication policy.
Policy Name	Enter a unique name for the replication policy.
Description	Optional. Enter a brief description of the replication policy.

Click Next.

On the Select Source page, configure the following details:


Field	Description
Source Data Hub	Choose the source Iceberg Replication Data Hub on the source Data Lake. If the required Iceberg Replication Data Hub is not displayed, ensure that it is deployed and that the Iceberg Replication feature is enabled in the Data Hub.
Inclusion Filters	Enter the database or schema name and the table name. The table name can be a Java regular expression (regex) pattern. Replication Manager includes the matching tables in the replication policy job runs.
Exclusion Filters	Optional. Specify the tables to exclude from the replication policy.
Replicate Column Statistics	Select to replicate the column statistics of the tables.
Run as Username on Source	Enter the username under which the replication policy runs on the source cluster. This username overrides the default hdfs username.
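
The following minimal Python sketch illustrates how inclusion and exclusion filters could combine to select tables. It is not Replication Manager's implementation: the function `select_tables` and the sample database and table names are hypothetical, and the sketch assumes that the database name must match exactly and that the table regex must match the whole table name (as Java's `Pattern.matches` does). Python's `re` module stands in for Java regex here; simple patterns such as `orders_.*` behave the same in both, but advanced syntax can differ.

```python
import re


def select_tables(tables, inclusions, exclusions=()):
    """Return (database, table) pairs matched by an inclusion filter and by no exclusion filter.

    Hypothetical model: each filter is a (database, table_regex) pair, the database
    must match exactly, and the regex must match the entire table name.
    """
    def matches(db, table, filters):
        return any(db == f_db and re.fullmatch(f_tbl, table) for f_db, f_tbl in filters)

    return [
        (db, table)
        for db, table in tables
        if matches(db, table, inclusions) and not matches(db, table, exclusions)
    ]


tables = [
    ("sales", "orders_2023"),
    ("sales", "orders_2024"),
    ("sales", "orders_tmp"),
    ("hr", "employees"),
]
selected = select_tables(
    tables,
    inclusions=[("sales", r"orders_.*")],  # include every sales.orders_* table
    exclusions=[("sales", r".*_tmp")],     # but skip temporary tables
)
print(selected)  # [('sales', 'orders_2023'), ('sales', 'orders_2024')]
```

Here `orders_tmp` matches the inclusion filter but is removed by the exclusion filter, and `hr.employees` is never included.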

Click Next.

On the Select Destination page, configure the following options:


Field	Description
Destination Data Hub	Select the target Iceberg Replication Data Hub on the target Data Lake. If the required Iceberg Replication Data Hub is not displayed, ensure that it is deployed and that the Iceberg Replication feature is enabled in the Data Hub.
Validate Access on Each Run	Select to verify, during each job run, that the source or target cluster has the required access to the source or target bucket. By default, access is verified only once, after the replication policy is created.
Run DistCp on Source	Select to run the transfer step on the source cluster during each job run. By default, the transfer step runs on the destination cluster. The transfer step location determines the IAM bucket access requirements. When the transfer step runs on the target cluster, the target cluster’s IAM requires read access to the source data location. When the transfer step runs on the source cluster, the source cluster’s IAM requires read and write access to the destination data locations.
Alternative Staging Location	Enter an alternate staging location if you do not want to use the default location to stage the intermediate work. The default location varies with the environment. For example, the default location might be `[* CLOUD DATA ROOT BUCKET FROM ENVIRONMENT *]`/user/replication on the target. Note: Ensure that the source IAM role has write access to the alternate staging location and that the destination IAM role has read access to it.
Location Mapping	Configure to override the path mapping when copying files.
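
Location Mapping overrides how source paths map to destination paths. The rule below is an assumption for illustration only: it treats each mapping as a source prefix replaced by a destination prefix, applies the longest matching prefix, and matches only at a path boundary. The function name `map_location` and the bucket names are hypothetical, and the sketch does not model Replication Manager's real default mapping.

```python
def map_location(path, mappings):
    """Rewrite `path` using the longest source prefix in `mappings` that matches it.

    A prefix matches only at a path boundary, so "s3a://b/warehouse" does not
    match "s3a://b/warehouse2/...". If nothing matches, `path` is returned unchanged.
    """
    best_src, best_dst = None, None
    for src, dst in mappings.items():
        src = src.rstrip("/")
        if path == src or path.startswith(src + "/"):
            # Prefer the most specific (longest) matching prefix.
            if best_src is None or len(src) > len(best_src):
                best_src, best_dst = src, dst.rstrip("/")
    if best_src is None:
        return path
    return best_dst + path[len(best_src):]


# Hypothetical override mappings: source prefix -> destination prefix.
mappings = {
    "s3a://src-bucket/warehouse": "s3a://dst-bucket/warehouse",
    "s3a://src-bucket/warehouse/finance": "s3a://dst-bucket/finance",
}
print(map_location("s3a://src-bucket/warehouse/finance/db1/t1", mappings))
# s3a://dst-bucket/finance/db1/t1
print(map_location("s3a://src-bucket/warehouse/sales/t2", mappings))
# s3a://dst-bucket/warehouse/sales/t2
```

String prefixes are used instead of a path library because URIs such as `s3a://` contain a double slash that POSIX path normalization would collapse.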

Click Validate Access.

Replication Manager performs the following steps:

Validates the IAM role bucket permissions as shown in the following table:


IAM role	Required permissions
Source IAM role	Read access to source data locations. Read and write access to the staging location. Write access to the destination warehouse if the DistCp jobs run on the source.
Target IAM role	Write access to the target data locations. Read access to the staging location. Read access to the source data locations if the DistCp jobs run on the destination.
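
The permission table above can be encoded as a small lookup so that you can compare it against the permissions actually granted to each role. This Python sketch transcribes the table; the function names, the location labels, and the `granted` structure are hypothetical, and it does not call any AWS or Cloudera API.

```python
def required_permissions(distcp_on_source=False):
    """Return the bucket permissions each IAM role needs, keyed by location type.

    distcp_on_source mirrors the Run DistCp on Source option (unselected by default,
    so DistCp runs on the destination).
    """
    source_role = {"source data": {"read"}, "staging": {"read", "write"}}
    target_role = {"target data": {"write"}, "staging": {"read"}}
    if distcp_on_source:
        source_role["destination warehouse"] = {"write"}
    else:
        target_role["source data"] = {"read"}
    return {"source IAM role": source_role, "target IAM role": target_role}


def missing_permissions(required, granted):
    """Return, per location, the required permissions that `granted` lacks."""
    gaps = {}
    for location, perms in required.items():
        lacking = perms - granted.get(location, set())
        if lacking:
            gaps[location] = lacking
    return gaps


required = required_permissions()  # DistCp runs on the destination
granted_to_target = {"target data": {"read", "write"}, "staging": {"read"}}
print(missing_permissions(required["target IAM role"], granted_to_target))
# {'source data': {'read'}}
```

In this example, the target role cannot read the source data, which it needs because DistCp runs on the destination by default.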

Validates whether the Cloudera Manager peer exists. If it does not, Replication Manager verifies whether a peer can be created from the target cluster to the source cluster.

Note: The Cloudera Manager peer creation check fails if the names of the source and destination environments are not unique in their first eight characters. In this case, you must associate the Cloudera Manager peer.
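
A quick way to anticipate the peer creation failure is to compare the first eight characters of the two environment names. This Python sketch assumes that the check compares the name prefixes literally and case-sensitively, which the text does not state; the function name and the environment names are hypothetical.

```python
def shares_eight_char_prefix(source_env, destination_env, prefix_len=8):
    """Return True if both environment names have the same first `prefix_len` characters.

    Assumption: a literal, case-sensitive comparison. True means the peer
    creation check is expected to fail and the peer must be associated.
    """
    return source_env[:prefix_len] == destination_env[:prefix_len]


print(shares_eight_char_prefix("analytics-src", "analytics-dst"))  # True ("analytic")
print(shares_eight_char_prefix("src-env-01", "dst-env-01"))        # False
```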

After the Validate Access process completes successfully, click Next.

On the Schedule page, configure the following information:


Option	Description
Run Now	Starts replicating data as soon as the replication policy is created. Select the frequency to replicate data periodically.
Schedule Run	Runs the replication policy at a later time. Specify the date and time for the first run, and then set the frequency to replicate data periodically. Tip: On the Replication Policies page, click to change the time zone.
Frequency	Select one of the following options: Does Not Repeat or Custom. If you select Custom, set the time, date, and frequency to run the policy in the Custom Recurrence dialog box. Replication Manager ensures that exactly the same number of seconds elapses between runs. For example, if you set the Start Time to `January 19, 2022 11.06 AM` and the Interval to `1 day`, Replication Manager runs the replication policy for the first time at the specified time in the time zone where the policy was created. Each subsequent run occurs exactly 1 day (24 hours, or 86400 seconds) after the previous one. Note: Ensure that the frequency in a schedule allows each job to finish before the next job starts, and that jobs based on the same policy do not overlap. If a job does not complete before the next job starts, the second job does not run and its status changes to Skipped. If a job is consistently skipped, you might need to modify the frequency of the job.
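
The fixed-interval schedule and the Skipped rule can be modeled in a few lines. This Python sketch is a simulation under stated assumptions, not Replication Manager code: it uses UTC instead of the policy's time zone, the job durations are made up, and the status label `Ran` is hypothetical (only `Skipped` comes from the text). A skipped run's duration is ignored because that job never starts.

```python
from datetime import datetime, timedelta, timezone


def run_times(start, interval, count):
    """Return the first `count` scheduled start times, exactly `interval` apart."""
    return [start + i * interval for i in range(count)]


def simulate(start, interval, durations):
    """Return a status per scheduled run, given how long each job would take.

    A run is Skipped if the previously started job is still running at its
    scheduled start time.
    """
    statuses = []
    busy_until = None
    for i, duration in enumerate(durations):
        scheduled = start + i * interval
        if busy_until is not None and scheduled < busy_until:
            statuses.append("Skipped")
        else:
            statuses.append("Ran")
            busy_until = scheduled + duration
    return statuses


start = datetime(2022, 1, 19, 11, 6, tzinfo=timezone.utc)
interval = timedelta(days=1)
times = run_times(start, interval, 3)
print(times[1] - times[0])       # 1 day, 0:00:00
print(interval.total_seconds())  # 86400.0

# The second job takes 30 hours, so it is still running when the third run is due.
durations = [timedelta(hours=2), timedelta(hours=30), timedelta(hours=1), timedelta(hours=1)]
print(simulate(start, interval, durations))  # ['Ran', 'Ran', 'Skipped', 'Ran']
```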

Click Next.

On the Additional Settings page, configure the values as necessary. These advanced parameters can be configured for specific purposes depending on your requirements:


Field	Description
YARN Queue Name	Enter the name of the YARN queue for the replication job if you are using Capacity Scheduler queues to limit resource consumption. The default value for this field is `default`.
Maximum Maps Slots	Set the maximum number of map tasks (simultaneous copies) per replication job. The default value is `20`.
Maximum Bandwidth	Specify the maximum bandwidth for each copy (map) task. The default value is 100 MB per second for each map (copy) task.
Batch Size	Enter the maximum number of snapshots to process per export batch. A high volume of source changes affects the time taken by each replication run. Setting this limit controls the amount of work to be processed in a single batch, which improves throughput and makes the replication run time more predictable. By default, this field is empty, meaning the job processes all available snapshots in an export batch.
Alerts	Select when to generate alerts for the replication job: On Failure, On Start, On Success, or On Abort.
Advanced Configuration Snippet (Safety Valve) for source hdfs-site.xml	Add one or more key-value pairs to the hdfs-site.xml file on the source cluster. New key-value pairs are added to the file. Existing key-value pairs are overwritten in the file. Caution: Configure this value only under the guidance of Cloudera Technical Support.
Advanced Configuration Snippet (Safety Valve) for source core-site.xml	Add one or more key-value pairs to the core-site.xml file on the source cluster. New key-value pairs are added to the file. Existing key-value pairs are overwritten in the file. Caution: Configure this value only under the guidance of Cloudera Technical Support.
Advanced Configuration Snippet (Safety Valve) for destination hdfs-site.xml	Add one or more key-value pairs to the hdfs-site.xml file on the target cluster. New key-value pairs are added to the file. Existing key-value pairs are overwritten in the file. Caution: Configure this value only under the guidance of Cloudera Technical Support.
Advanced Configuration Snippet (Safety Valve) for destination core-site.xml	Add one or more key-value pairs to the core-site.xml file on the target cluster. New key-value pairs are added to the file. Existing key-value pairs are overwritten in the file. Caution: Configure this value only under the guidance of Cloudera Technical Support.
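
Three of the settings above reduce to simple arithmetic or merge rules. This Python sketch models them under stated assumptions: batching simply chunks an ordered list of snapshots; the bandwidth figure is a theoretical ceiling that assumes all map tasks run at their limit at the same time (actual throughput also depends on network and storage); and the safety-valve merge treats a site file as a flat key-value mapping. All function names, snapshot IDs, and property keys are hypothetical.

```python
def split_into_batches(snapshots, batch_size=None):
    """Group snapshots into export batches of at most `batch_size` items.

    An empty Batch Size field (None here) means all snapshots go into one batch.
    """
    snapshots = list(snapshots)
    if not snapshots:
        return []
    if batch_size is None:
        return [snapshots]
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [snapshots[i:i + batch_size] for i in range(0, len(snapshots), batch_size)]


def aggregate_bandwidth_ceiling(max_maps=20, per_map_mb_per_s=100):
    """Upper bound on total copy bandwidth (MB/s) if every map task runs at its limit."""
    return max_maps * per_map_mb_per_s


def apply_safety_valve(site_properties, snippet):
    """Merge snippet pairs: new keys are added, existing keys are overwritten."""
    merged = dict(site_properties)
    merged.update(snippet)
    return merged


print(split_into_batches(["s1", "s2", "s3", "s4", "s5"], 2))
# [['s1', 's2'], ['s3', 's4'], ['s5']]
print(aggregate_bandwidth_ceiling())  # 2000, with the defaults of 20 maps at 100 MB/s
print(apply_safety_valve(
    {"example.key.a": "1", "example.key.b": "2"},
    {"example.key.b": "20", "example.key.c": "3"},
))
# {'example.key.a': '1', 'example.key.b': '20', 'example.key.c': '3'}
```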

Click Create.

The replication policy is displayed on the Replication Policies page. If you selected Run Now on the Schedule page, the replication job starts after you click Create.

Creating Iceberg replication policy