Creating Ranger replication policies
You can create Ranger replication policies in CDP Private Cloud Base Replication Manager. The Ranger replication policies copy or migrate Ranger policies for HDFS, Hive, and HBase between CDP Private Cloud Base 7.1.9 or higher clusters using Cloudera Manager 7.11.3.
- Go to the page on the target cluster.
- Click .
Configure the following options on the General tab:
Name
Enter a unique name for the replication policy.
Source
Choose the source cluster. Ensure that you add the source peer as an admin peer on the page; otherwise, the replication job fails.
Destination
Choose the target cluster.
Schedule
Choose:
- Immediate to run the schedule after policy creation.
- Once to run the schedule one time after policy creation. Set the date and time.
- Recurring to run the schedule periodically. Set the date, time, and interval between runs.
Replicate Ranger data
Select to replicate the Ranger policies and roles for the resources you choose in the Services tab.
Replicate Ranger audit logs in HDFS
Select to replicate the Ranger audit logs in HDFS.
Source side HDFS audit log directory*
Shows the source Ranger HDFS audit log path by default. For example, hdfs://[***source.url***]:8020/ranger/audit/
You can edit the log directory path to replicate only a subset of logs by appending hdfs, hbase, or atlas at the end of the default path. For example, if you append hdfs at the end of the default path, Replication Manager replicates only the HDFS Ranger audit logs.
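As a minimal sketch of the path edit described above, the following hypothetical helper appends one of the three subdirectory names to the default audit path. The function name and the example host are assumptions made for illustration and are not part of Replication Manager.

```python
# Subdirectory names that the text above lists as valid suffixes.
ALLOWED_SUBSETS = {"hdfs", "hbase", "atlas"}


def audit_log_subset_path(default_path: str, subset: str) -> str:
    """Return the default audit path with one subset name appended."""
    if subset not in ALLOWED_SUBSETS:
        raise ValueError(f"unsupported audit log subset: {subset!r}")
    # Drop a trailing slash, if present, so the result has a single separator.
    return default_path.rstrip("/") + "/" + subset


# Placeholder host: substitute the address of the source cluster.
default_path = "hdfs://source.example.com:8020/ranger/audit/"
print(audit_log_subset_path(default_path, "hdfs"))
```

With the placeholder host, the call yields a path that ends in /ranger/audit/hdfs, which limits replication to the HDFS Ranger audit logs.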
Destination directory*
By default, shows the destination Ranger HDFS audit path to which the source HDFS audit logs are replicated.
The default path is the /ranger/audit/replication/[***sourcePeerNameBase64***]/[***sourceClusterNameBase64***]/[***sourceRangerServiceNameBase64***]/ subdirectory.
The replication folder contains three Base64-encoded directory names so that the path never contains characters that are illegal in HDFS.
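The default destination layout can be illustrated with a short sketch. It assumes URL-safe Base64 over the UTF-8 bytes of each name; this page does not state which Base64 variant Replication Manager uses, so treat the encoding choice and both helper names as assumptions.

```python
import base64


def b64_component(name: str) -> str:
    # Assumption: URL-safe Base64 of the UTF-8 bytes. The exact variant used
    # by Replication Manager is not documented on this page.
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")


def default_destination_dir(peer: str, cluster: str, ranger_service: str) -> str:
    """Build a path of the form /ranger/audit/replication/<peer>/<cluster>/<service>/."""
    parts = [b64_component(p) for p in (peer, cluster, ranger_service)]
    return "/ranger/audit/replication/" + "/".join(parts) + "/"


# Hypothetical names, chosen to include a space and a slash.
print(default_destination_dir("peer-1", "Cluster 1", "cm/hdfs"))
```

Encoding each name separately keeps the three directory levels distinct even when a peer, cluster, or service name contains a slash or a space.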
Maximum Number of Copy Mappers
(Optional) Enter the maximum number of simultaneous copy mappers for DistCp to replicate Ranger audit logs in HDFS. The default value is 20.
Maximum Bandwidth Per Copy Mapper
(Optional) Enter the bandwidth limit for each mapper to replicate Ranger audit logs in HDFS. The default value is 100 MB.
The total bandwidth used by the replication policy is equal to Maximum Bandwidth Per Copy Mapper multiplied by Maximum Number of Copy Mappers. Therefore, ensure that the bandwidth and number of mappers you choose do not impact other tasks or network resources in the target cluster.
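The arithmetic behind the total bandwidth is simple enough to check before you save the policy. The sketch below uses the default values quoted above (100 MB per mapper, 20 mappers); the function name is hypothetical.

```python
def total_bandwidth_mb(bandwidth_per_mapper_mb: int = 100, max_mappers: int = 20) -> int:
    """Upper bound on the bandwidth a policy can use: per-mapper limit times mapper count."""
    return bandwidth_per_mapper_mb * max_mappers


# Defaults: 100 MB per mapper across 20 mappers.
print(total_bandwidth_mb())        # 2000
# A more conservative setting for a shared cluster.
print(total_bandwidth_mb(50, 10))  # 500
```

With the defaults, a single policy can use up to 2000 MB in total, so lowering either value reduces the ceiling proportionally.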
File listing threads
(Optional) Choose the Override DistCp default option and configure the number of threads (a maximum of 128) that the replication policy must use during the copylisting phase of replication. By default, Replication Manager uses 20 threads for this phase.
You can set the default number of threads for the copylisting phase (for replication policies) in the core-site.xml or hdfs-site.xml file of the HDFS service, up to a maximum of 128 threads.
MapReduce Service
Select the MapReduce or YARN service to use.
Scheduler Pool
(Optional) Enter the name of a resource pool in the field. The value you enter is used by the MapReduce Service you specified when Cloudera Manager executes the MapReduce job for the replication. The job specifies the value using one of these properties:
- MapReduce – Fair scheduler: mapred.fairscheduler.pool
- MapReduce – Capacity scheduler: queue.name
- YARN – mapreduce.job.queuename
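The mapping between service type, scheduler, and job property can be written as a small lookup. The property names come from the list above; the table and helper function are a hypothetical illustration, not a Cloudera Manager API.

```python
from typing import Optional

# Property names are taken from the list above; the lookup itself is hypothetical.
POOL_PROPERTIES = {
    ("mapreduce", "fair"): "mapred.fairscheduler.pool",
    ("mapreduce", "capacity"): "queue.name",
    ("yarn", None): "mapreduce.job.queuename",
}


def pool_property(service_type: str, scheduler: Optional[str] = None) -> str:
    """Return the job property that carries the Scheduler Pool value."""
    service = service_type.lower()
    # For YARN the list above names a single property, so the scheduler is ignored.
    key = (service, None) if service == "yarn" else (service, (scheduler or "").lower())
    if key not in POOL_PROPERTIES:
        raise ValueError(f"no pool property known for {service_type!r} / {scheduler!r}")
    return POOL_PROPERTIES[key]


print(pool_property("MapReduce", "Fair"))
print(pool_property("YARN"))
```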
Run as Username
Enter the username to run the replication job. Ensure that the user is in the supergroup group on the HDFS NameNode host of the target cluster.
Run on Peer as Username
Enter the user if the peer cluster is configured with a different superuser. Ensure that the user is in the supergroup group on the HDFS NameNode host of the source cluster.
*The values for the fields marked with an asterisk are derived from the ranger_plugin_hdfs_audit_url API.
Configure the following options on the Services tab:
Source Service Names
Choose one or more service names for which you want to copy or migrate the Ranger policies. You can choose HDFS, HBase, and Hive services, and also choose the tag services.
Replication Manager pairs or maps the source and destination Ranger services according to their service types.
Destination Service Names
Choose the service name on the target cluster. If there is more than one Ranger service of the same type on the target cluster, choose the required service from the drop-down list.
Configure the following options on the Advanced tab:
Users Mapping
Enter the usernames for the services only if the usernames defined in Ranger differ in the source and target clusters.
Resources Mapping
Enter the resource paths for the services only if the resource path defined in Ranger differs in the source and target clusters.
Policy Import strategy
Choose one of the following methods for file ingestion:
- Merge - Replication Manager merges the Ranger policies. By default, Replication Manager uses this method.
For example, assume a Ranger policy in the destination Ranger service contains user1 and the same Ranger policy on the source cluster has user2. With this method, the destination Ranger policy contains both user1 and user2 after replication.
- Override - Replication Manager overwrites the existing Ranger policies.
For example, assume a Ranger policy in the destination Ranger service contains user1 and the same Ranger policy on the source cluster has user2. With this method, user1 is removed and user2 is added in the destination Ranger policy after replication.
Description
(Optional) Enter a brief description.
Alerts
Choose to generate alerts for various state changes in the replication workflow. You can alert On Failure, On Start, On Success, or On Abort of the replication job.
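The difference between the Merge and Override strategies can be modeled on the user list of a single policy. This is a simplified sketch that mirrors the user1 and user2 examples above; it is not the Ranger import logic, and the function name is hypothetical.

```python
def import_policy_users(destination_users: set, source_users: set, strategy: str = "merge") -> set:
    """Model the user list of one destination policy after replication."""
    if strategy == "merge":
        # Merge keeps the destination users and adds the source users.
        return destination_users | source_users
    if strategy == "override":
        # Override replaces the destination users with the source users.
        return set(source_users)
    raise ValueError(f"unknown strategy: {strategy!r}")


destination = {"user1"}
source = {"user2"}
print(sorted(import_policy_users(destination, source, "merge")))     # ['user1', 'user2']
print(sorted(import_policy_users(destination, source, "override")))  # ['user2']
```

Merge is the safer default when the destination cluster already carries local policy changes, while Override keeps the destination an exact copy of the source.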
- Click Create.
If you selected Immediate in the Schedule field, the replication job starts as soon as you create the policy.