Creating Ozone replication policies using Cloudera Manager APIs

You can create Ozone replication policies using Cloudera Manager APIs on the target cluster.

Consider the following points before you create Ozone replication policies:

Data is replicated at the bucket level. Therefore, use the [***volume***]/[***bucket***] format to point to the required buckets when you create the replication policy.
Ozone replication policies perform incremental replication using file checksums. Replication is supported for all bucket types except Object Store (OBS) buckets.
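Because sourcePath and destinationPath must name a bucket, a client script can check the path format and the bucket layout before it sends the request. The following Python sketch assumes that paths have exactly two components with no leading slash, as in the sample request later on this page. The helpers split_bucket_path and is_supported_layout are hypothetical, and the layout string is assumed to be one of Ozone's bucket layout values (OBJECT_STORE, FILE_SYSTEM_OPTIMIZED, or LEGACY).

```python
def split_bucket_path(path: str) -> tuple[str, str]:
    """Split a 'volume/bucket' path, rejecting paths with more or fewer parts."""
    parts = path.split("/")
    # Exactly two non-empty components; a leading or trailing slash yields an empty part.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected [volume]/[bucket], got {path!r}")
    volume, bucket = parts
    return volume, bucket


def is_supported_layout(bucket_layout: str) -> bool:
    """Return False for Object Store (OBS) buckets, which replication policies do not support."""
    return bucket_layout.upper() != "OBJECT_STORE"
```

For example, split_bucket_path("vol1/repl1") returns ("vol1", "repl1"), while "/vol1/repl1" and "vol1" both raise ValueError.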

Go to the Cloudera Manager > Support > API Explorer page on the target cluster.
Go to the ReplicationsResource API resource.

Enter the following parameters to create an Ozone replication policy in the POST /clusters/{clusterName}/services/{serviceName}/replications HTTP method:

Enter the cluster name in the clusterName field. For example, Cluster 1.
Enter the service name in the serviceName field. For example, OZONE-1.
In the body field, enter the parameters from the API parameters for Ozone replication policy table that your policy requires.
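If you call the API directly instead of through the API Explorer, the cluster and service names become URL path segments, and names that contain spaces, such as Cluster 1, must be percent-encoded. The following Python sketch builds the endpoint URL. The base URL, including the host, port, and API version, is a hypothetical placeholder that you replace with the values for your Cloudera Manager instance.

```python
from urllib.parse import quote


def replications_url(base_url: str, cluster_name: str, service_name: str) -> str:
    """Build the URL for POST /clusters/{clusterName}/services/{serviceName}/replications.

    base_url is assumed to already include the API version, for example the
    hypothetical "https://cm-host.example.com:7183/api/v51".
    """
    # safe="" also encodes "/", so each name stays a single path segment.
    cluster = quote(cluster_name, safe="")
    service = quote(service_name, safe="")
    return f"{base_url.rstrip('/')}/clusters/{cluster}/services/{service}/replications"
```

With the example values from the steps above, replications_url("https://cm-host.example.com:7183/api/v51", "Cluster 1", "OZONE-1") returns a URL ending in /clusters/Cluster%201/services/OZONE-1/replications.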

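The following table lists the available API parameters that you can use to create an Ozone replication policy. In the request body, the parameters from abortOnError through userName are nested in the ozoneReplicationArguments object, and sourceService is a nested object within it.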

Table 1. API parameters for Ozone replication policy
Parameter name	Data type	Description
active	boolean	Read-only. Shows true when the replication policy is running; otherwise, shows false.
alertOnAbort	boolean	Set to true to generate an alert when a replication job stops abruptly. Default is false.
alertOnFail	boolean	Set to true to generate an alert when a replication job fails. Default is false.
alertOnStart	boolean	Set to true to generate an alert when a replication job starts. Default is false.
alertOnSuccess	boolean	Set to true to generate an alert when a replication job completes successfully. Default is false.
description	string	Enter a description for the replication policy.
displayName	string	Enter a unique name for the replication policy.
endTime	string	The timestamp after which the replication job is not triggered.
abortOnError	boolean	Set to true to stop the replication job when an error appears. The files copied up to that point remain on the destination, but no additional files are copied. Default is false.
bandwidthPerMap	number	The maximum bandwidth, in MB per second, for each mapper in the MapReduce replication job. Default is 100.
destinationPath	string	Enter the path on the target cluster to which the replication policy copies the data, in [***volume***]/[***bucket***] format.
exclusionFilters	array of string	Enter one or more regular expressions as an array of strings. Replication Manager does not copy to the target cluster the files or subdirectories on the source that match any of the specified regular expressions.
logPath	string	Enter an alternate path for the logs, if required.
mapreduceServiceName	string	Enter a MapReduce or YARN service to use for the replication policy.
numMaps	number	Enter the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
removeMissingFiles	boolean	Set to true to delete destination files that are missing from the source. Default is false.
replicationStrategy	data model	Enter STATIC or DYNAMIC to determine whether file replication tasks are distributed among the mappers statically or dynamically. STATIC distributes the tasks among the mappers up front to achieve a uniform distribution based on file sizes. DYNAMIC distributes the tasks to the mappers in small sets; as each mapper completes its tasks, it acquires and processes the next unallocated set. Default is DYNAMIC.
schedulerPoolName	string	Enter the name of a resource pool. The MapReduce service that you specified in mapreduceServiceName uses this value when Cloudera Manager runs the MapReduce job for the replication. The job specifies the value in one of the following properties: MapReduce Fair scheduler, mapred.fairscheduler.pool; MapReduce Capacity scheduler, queue.name; YARN, mapreduce.job.queuename.
skipChecksumChecks	boolean	Set to true to skip checksum checks on the copied files. Checksums are checked by default.
skipListingChecksumChecks	boolean	Set to true to skip the checksum check when comparing two files to determine whether they are the same. When the check is skipped, the file size and last modified time are used instead. Skipping the check improves performance during the mapper phase. If skipChecksumChecks is set to true, this check is also skipped.
skipTrash	boolean	Set to true to permanently delete destination files that are missing from the source instead of moving them to the trash. Default is null.
sourcePath	string	Enter the path to the bucket on the source cluster to replicate data from.
sourceService	data model	An object that identifies the source Ozone service, with the following fields: clusterName, the name of the source cluster; peerName, the peer name that you entered when you added the source cluster as a peer; serviceName, the name of the Ozone service on the source cluster in Cloudera Manager.
sourceUser	string	Enter the user name to run the replication policy. For unsecured clusters, enter null. For Kerberized clusters, enter the required user name; the replication policy uses this user name to replicate the data.
userName	string	Enter the user name to run the replication policy. For unsecured clusters, enter null. For Kerberized clusters, enter the required user name; the replication policy uses this user name to replicate the data.
id	number	The auto-generated replication policy ID.
interval	number	The number of intervalUnit units between consecutive replication policy runs. Default is 0.
intervalUnit	data model	The unit for interval. Enter one of the following values: MINUTE, HOUR, DAY, WEEK, MONTH, or YEAR.
nextRun	string	Read-only. The timestamp for the next scheduled replication policy run.
paused	boolean	Set to true to keep the replication policy paused after it is created so that it does not run replication jobs. Default is false.
startTime	string	The timestamp to initiate the replication job run.
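The following Python sketch assembles a request body with the same shape as the sample request on this page, filling in the defaults listed in the table. The helper build_ozone_policy and its keyword arguments are hypothetical, it sets only a subset of the parameters, and it assumes that read-only fields such as active, id, and nextRun can be omitted from the request.

```python
import json

# Allowed values as listed in the parameter table.
INTERVAL_UNITS = {"MINUTE", "HOUR", "DAY", "WEEK", "MONTH", "YEAR"}
STRATEGIES = {"STATIC", "DYNAMIC"}


def build_ozone_policy(*, display_name, source_path, destination_path,
                       source_cluster, peer_name, source_service,
                       mapreduce_service, user_name, source_user=None,
                       interval=0, interval_unit="MINUTE",
                       replication_strategy="DYNAMIC"):
    """Return a request body for the replications POST call (hypothetical helper)."""
    if interval_unit not in INTERVAL_UNITS:
        raise ValueError(f"unsupported intervalUnit: {interval_unit}")
    if replication_strategy not in STRATEGIES:
        raise ValueError(f"unsupported replicationStrategy: {replication_strategy}")

    # Fields nested under ozoneReplicationArguments; numeric values are table defaults.
    arguments = {
        "sourcePath": source_path,
        "destinationPath": destination_path,
        "sourceService": {
            "clusterName": source_cluster,
            "peerName": peer_name,
            "serviceName": source_service,
        },
        "mapreduceServiceName": mapreduce_service,
        # Like the sample, reuse userName for sourceUser unless one is given.
        "sourceUser": source_user if source_user is not None else user_name,
        "userName": user_name,
        "replicationStrategy": replication_strategy,
        "numMaps": 20,
        "bandwidthPerMap": 100,
        "exclusionFilters": [],
        "abortOnError": False,
        "removeMissingFiles": False,
        "skipChecksumChecks": False,
        "skipListingChecksumChecks": False,
    }
    # Top-level schedule and alert fields; all alerts default to false.
    policy = {
        "displayName": display_name,
        "interval": interval,
        "intervalUnit": interval_unit,
        "paused": False,
        "alertOnStart": False,
        "alertOnSuccess": False,
        "alertOnFail": False,
        "alertOnAbort": False,
        "ozoneReplicationArguments": arguments,
    }
    return {"items": [policy]}


if __name__ == "__main__":
    body = build_ozone_policy(
        display_name="Remote 1", source_path="vol1/repl1",
        destination_path="vol1/repl1", source_cluster="Cluster 1",
        peer_name="Remote Source", source_service="OZONE-1",
        mapreduce_service="YARN-1", user_name="testuser",
    )
    print(json.dumps(body, indent=2))
```

Validating intervalUnit and replicationStrategy on the client side catches typos before the request reaches Cloudera Manager; the server remains the final authority on which values it accepts.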

The response body shows the created Ozone replication policy.
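A script that creates policies usually needs the auto-generated id of each new policy. The following Python sketch extracts the IDs from the response body. It assumes that the response uses the same items list as the request, with id filled in by Cloudera Manager; created_policy_ids is a hypothetical helper.

```python
import json


def created_policy_ids(response_body: str) -> list[int]:
    """Return the IDs of the policies listed in a replications POST response."""
    data = json.loads(response_body)
    # Skip entries without an ID rather than failing on them.
    return [item["id"] for item in data.get("items", []) if item.get("id") is not None]
```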

The following sample shows a request body that creates an Ozone replication policy:

{
	"items": [{
		"active": false,
		"alertOnAbort": false,
		"alertOnFail": false,
		"alertOnStart": false,
		"alertOnSuccess": false,
		"description": null,
		"displayName": "Remote 1",
		"endTime": null,
		"ozoneReplicationArguments": {
			"abortOnError": false,
			"bandwidthPerMap": 100,
			"destinationPath": "vol1/repl1",
			"exclusionFilters": [],
			"logPath": null,
			"mapreduceServiceName": "YARN-1",
			"numMaps": 20,
			"removeMissingFiles": false,
			"replicationStrategy": "DYNAMIC",
			"schedulerPoolName": null,
			"skipChecksumChecks": false,
			"skipListingChecksumChecks": false,
			"skipTrash": false,
			"sourcePath": "vol1/repl1",
			"sourceService": {
				"clusterName": "Cluster 1",
				"peerName": "Remote Source",
				"serviceName": "OZONE-1"
			},
			"sourceUser": "testuser",
			"userName": "testuser"
		},
		"id": null,
		"interval": 0,
		"intervalUnit": "MINUTE",
		"nextRun": null,
		"paused": false,
		"startTime": null
	}]
}
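The following Python sketch prepares the POST request for a body such as the sample above, using only the standard library. It assumes HTTP basic authentication with a Cloudera Manager user account; make_create_request is a hypothetical helper, and the URL and credentials come from your own environment.

```python
import base64
import json
import urllib.request


def make_create_request(url: str, payload: dict, username: str,
                        password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates the replication policy."""
    body = json.dumps(payload).encode("utf-8")
    # HTTP basic authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Pass the returned request to urllib.request.urlopen to send it. If Cloudera Manager uses TLS with a certificate from a private CA, you might also need to pass an ssl.SSLContext that trusts that CA through the context argument of urlopen.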