Creating Ranger replication policies

Create Ranger replication policies to migrate the Ranger policies and roles for HDFS, Hive, and HBase services, and the audit logs for HDFS. You can migrate these Ranger policies from Kerberos-enabled 7.3.2 or higher clusters using 7.13.2 to 7.3.2 clusters.

  1. Go to the Management Console > Replication Manager > Replication Policies page.
  2. Click Create Policy.
    The Create Replication Policy wizard is displayed.
  3. Click Ranger.
  4. Configure the following fields on the General tab:
    Policy Name – Enter a unique name for the replication policy.
    Description – Optional. Enter a description for the replication policy.
    Type – Select Ranger.
  5. Click Next.
  6. Configure the following fields on the Select Source tab as necessary:
    Source Cluster – Select the on-premises source cluster.

    Replicate Audit logs – Select this option to replicate the Ranger audit logs in HDFS. Configure the following fields:
    • Cloud Credential On Source – Select a cloud credential to access the target cloud storage to write the audit logs on the target cluster. The cloud credentials that you register for Replication Manager on the Cloud Credentials page are displayed in this field.

      If the required cloud credential is not displayed, click Add Cloud Credential to add the credentials.

    • Audit Logs location (on source) – Displays the source Ranger HDFS audit log path by default. For example, hdfs://[***SOURCE URL***]:8020/ranger/audit/

      You can edit the log directory path to replicate only a subset of logs by appending hdfs, hbase, or atlas to the end of the default path. For example, if you append hdfs to the end of the default path, Replication Manager replicates only the HDFS Ranger audit logs.

    • Run As Username (on source) – Enter the username to run the replication job. Ensure that the user is in the supergroup group on the source cluster.
    Replicate Ranger data – Select this option to replicate the Ranger policies and roles for the resources you selected on the Select Destination tab. Select one of the following Policy Import strategies to ingest the files:
    • Merge method (default) – Replication Manager merges the Ranger policies.

      For example, if a Ranger policy in the target Ranger service has user1 and the same Ranger policy on the source cluster has user2, both user1 and user2 are added in the target Ranger policy after replication.

    • Override method – Replication Manager overwrites the existing Ranger policies.

      For example, if a Ranger policy in the target Ranger service has user1 and the same Ranger policy on the source cluster has user2, user1 is removed and user2 is added in the target Ranger policy after replication.
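The Merge and Override behaviors described above can be sketched as set operations on a policy's user list. This is a simplified illustration of the import semantics, not the actual Ranger policy model:

```python
def merge_policy_users(target_users, source_users):
    # Merge strategy: the target keeps its existing users and
    # gains the users from the source policy.
    return sorted(set(target_users) | set(source_users))

def override_policy_users(target_users, source_users):
    # Override strategy: the source policy's users replace the
    # target policy's users entirely.
    return sorted(set(source_users))

# The example from the table: target has user1, source has user2.
print(merge_policy_users(["user1"], ["user2"]))     # ['user1', 'user2']
print(override_policy_users(["user1"], ["user2"]))  # ['user2']
```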

  7. Click Next.
  8. Configure the following fields on the Select Destination tab as necessary:
    Destination Data Lake – Select the target Data Lake.

    Settings for Replication Ranger data – Configure the following fields for Service Mappings to map the source and target services depending on your requirements:
    • Enable replication – Select the field to instruct the replication policy to replicate the Ranger data for the chosen source service to the chosen target service.
    • Source service name – Cannot be edited.
    • Destination service name – Retain the default service name or select the service name on your target Data Hub cluster. Select the destination service from the dropdown list of services in the target Cloudera Manager that has the same type as the selected source service. For example, you can replicate the source Hive service's Ranger policy to any target Hive service.
    Configure the following fields to map users and resources:
    • User Mapping – Enter the usernames for the services only if the usernames defined in Ranger differ in the source and target clusters.
      • Source user name – Enter the user name for the Ranger service on the source cluster.
      • Destination user name – Enter the user name for the Ranger service on the target cluster.
    • Resource Mapping – Enter the resource name for the services only if the resource name defined in Ranger differs in the source and target clusters. Ensure that you select the Override policy import strategy before you enter the details in this field.
      • Source resource – Enter the source resource name.
      • Destination resource – Enter the target resource name.
    • Hive URL Mapping – This field is enabled only if you chose the Hive service. Enter the Hive prefix-based resource URL replacement. To understand Hive URLs, see Create a Hive authorizer URL policy.
      • Source url – Enter the source Hive URL.
      • Destination url – Enter the target Hive URL.
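The user and resource mappings above amount to a rename pass over each policy before it is imported on the target. A minimal sketch using a simplified policy dictionary (the field names here are illustrative, not the actual Ranger policy schema):

```python
def apply_mappings(policy, user_map, resource_map):
    # Replace any source user name or resource path that has a
    # mapping defined; everything else is carried over unchanged.
    return {
        "name": policy["name"],
        "users": [user_map.get(u, u) for u in policy["users"]],
        "resources": [resource_map.get(r, r) for r in policy["resources"]],
    }

policy = {"name": "sales_read", "users": ["etl_user"], "resources": ["/data/sales"]}
mapped = apply_mappings(policy,
                        {"etl_user": "etl_svc"},
                        {"/data/sales": "/warehouse/sales"})
print(mapped["users"], mapped["resources"])  # ['etl_svc'] ['/warehouse/sales']
```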
  9. Click Next.
  10. On the Schedule page, choose or enter the following information:
    Run Now – Starts replicating the existing data after the replication policy creation is complete. Choose the frequency to replicate data periodically.
    Schedule Run – Runs the replication policy to replicate data at a later time. Choose the date and time for the first run, and then choose the frequency to replicate data periodically.
    Frequency – Choose one of the following options:

    • Does Not Repeat
    • Custom – In the Custom Recurrence dialog box, choose the time, date, and the frequency to run the policy.

      Replication Manager ensures that the same number of seconds elapses between runs. For example, if you set the Start Time to January 19, 2022 11:06 AM and the Interval to 1 day, Replication Manager runs the replication policy for the first time at the specified time in the time zone in which you created the replication policy. It then runs the policy exactly one day (24 hours or 86400 seconds) later.
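The fixed-seconds interval described above can be illustrated with a short calculation. This is a sketch of the scheduling arithmetic, not Replication Manager's actual implementation:

```python
from datetime import datetime, timedelta

def run_times(start, interval_seconds, count):
    # Each run is exactly interval_seconds after the previous one.
    return [start + timedelta(seconds=i * interval_seconds) for i in range(count)]

start = datetime(2022, 1, 19, 11, 6)   # January 19, 2022 11:06 AM
runs = run_times(start, 86400, 3)      # 1 day = 86400 seconds
print([r.strftime("%Y-%m-%d %H:%M") for r in runs])
# ['2022-01-19 11:06', '2022-01-20 11:06', '2022-01-21 11:06']
```

Note that a fixed 86400-second interval can drift against local wall-clock time across daylight-saving transitions, which is why the time zone of policy creation matters.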

  11. Configure the following fields on the Additional Settings tab. The available fields depend on the Replicate Audit Logs selection: when you select Replicate Audit Logs, all the fields are displayed; when you do not, only the Alerts field is displayed:
    YARN Queue Name – Enter the name of the YARN queue for the cluster to which the replication job is submitted if you are using Capacity Scheduler queues to limit resource consumption. The default value for this field is default.
    Maximum Maps Slots – Set the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
    Maximum Bandwidth – Adjust this setting to restrict the bandwidth consumed by each map. The default value for the bandwidth is 100 MB per second for each mapper.

    The map task dynamically throttles its bandwidth consumption during a copy operation so that the net used bandwidth aligns with the specified value; however, the exact net usage might fluctuate.

    Replication Strategy – Select one of the following replication strategies:
    • Static – Distributes file replication tasks among the mappers in advance to achieve a uniform distribution based on file sizes.
    • Dynamic – Distributes the file replication tasks to mappers in small sets. As a mapper completes its current set, it dynamically acquires and processes the next set of unallocated tasks.
    The default replication strategy is Dynamic.
    MapReduce Service – Select the MapReduce or YARN service to use.
    Log Path – Enter an alternate path for the logs, if required.
    Error Handling – Select one of the following options as necessary:
    • Skip Checksum Checks – Skips checksum checks on the copied files. If selected, checksums are not validated. Checksums are checked by default.

    Checksums are used for the following purposes:

    • To skip replication of files that have already been copied. If Skip Checksum Checks is selected, the replication job skips copying a file if the file lengths and modification times are identical between the source and target clusters. Otherwise, the job copies the file from the source to the target.
    • To redundantly verify the integrity of data. However, checksums are not required to guarantee accurate transfers between clusters. HDFS data transfers are protected by checksums during transfer and storage hardware also uses checksums to ensure that data is stored accurately. These two mechanisms work together to validate the integrity of the copied data.
    • Skip Listing Checksum Checks – Skips checksum verification when comparing two files to see if they are identical. If selected, the file size and last modified time are used to compare the files. Skipping the check improves performance during the mapper phase.
    • Abort on Error – Aborts the job upon encountering an error. If selected, files copied up to that point remain on the target, but no additional files are copied. By default, this option is not selected.
    • Abort on Snapshot Diff Failures – Aborts the replication if a snapshot diff fails. By default, if a snapshot diff fails, the replication policy uses a complete copy to replicate data. If selected, the policy aborts the replication entirely instead.
    Preserve – Select one or more of the following attributes you want to keep from the source file system:

    • Block Size
    • Replication Count
    • Permissions – Replication preserves ACLs when both the source and destination clusters support ACLs; otherwise, ACLs are not replicated.
    • Extended Attributes – Replication preserves extended attributes when both the source and destination clusters support extended attributes. This option is displayed only when both source and destination clusters support extended attributes.

    If an option is not selected, the replication job uses the settings of the destination file system. By default, the source system settings are preserved.

    If you select any Preserve options when replicating to S3 or ADLS, the values are saved in metadata files on S3 or ADLS. When you replicate from S3 or ADLS to HDFS, you can select which of these saved options to preserve.

    Delete Policy – Select one of the following options to determine how the policy handles files that were deleted on the source, as well as files in the target location that are unrelated to the source:
    • Keep Deleted Files – Retains the destination files even when they no longer exist at the source. This is the default option.
    • Delete to Trash – Moves files to the trash folder, if the HDFS trash is enabled. This option is not supported when replicating to S3 or ADLS.
    • Delete Permanently – Deletes files permanently, which uses the least amount of space. Use with caution.
    Alerts – Select when to generate alerts for the replication job: On Failure, On Start, On Success, or On Abort.

    You can configure alerts to be delivered by email or sent as SNMP traps. If alerts are enabled for events, you can search for and view the alerts on the Events tab, even if email notifications are not configured.

    For example, if you filter by Command Result containing Failed on the Diagnostics > Events page, the On Failure alerts are displayed for all the replication policies for which you have set the alert.
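The Skip Checksum Checks option described above changes how the job decides whether a file needs re-copying. The following is a simplified sketch of that decision; the file records here are illustrative dictionaries, not the actual DistCp implementation:

```python
def needs_copy(src, dst, skip_checksum_checks):
    # A file missing on the target is always copied.
    if dst is None:
        return True
    if skip_checksum_checks:
        # Compare only file length and modification time.
        return (src["length"], src["mtime"]) != (dst["length"], dst["mtime"])
    # Otherwise also require matching checksums.
    return (src["length"] != dst["length"]
            or src["checksum"] != dst["checksum"])

src   = {"length": 1024, "mtime": 1700000000, "checksum": "abc"}
same  = {"length": 1024, "mtime": 1700000000, "checksum": "abc"}
stale = {"length": 1024, "mtime": 1700000000, "checksum": "def"}
print(needs_copy(src, same, True))    # False
print(needs_copy(src, stale, True))   # False (checksum ignored)
print(needs_copy(src, stale, False))  # True  (checksum differs)
```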
  12. Click Create.
    The replication policy is displayed on the Replication Policies page.

    If you selected Run Now on the Schedule page, the replication job starts replicating after you click Create.