Creating Ozone replication policies using Cloudera Manager APIs

You can create Ozone replication policies using Cloudera Manager APIs on the target cluster.

Consider the following points before you create Ozone replication policies:
  • Data is replicated at the bucket level. Therefore, use the [***volume***]/[***bucket***] format to point to the required buckets when you create the replication policy.

  • Ozone replication policies perform incremental replication using file checksums, which is supported by all bucket types except OBS buckets.
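
The volume/bucket format described above can be sketched as a small helper. This is a minimal illustration, not part of Cloudera Manager: the function name `bucket_path` is hypothetical, and rejecting anything deeper than a bucket is an assumption drawn from the bucket-level replication note, not a documented validation rule.

```python
def bucket_path(volume: str, bucket: str) -> str:
    """Join a volume and a bucket name into the volume/bucket form."""
    # Tolerate stray whitespace and leading/trailing slashes in the inputs.
    volume = volume.strip().strip("/")
    bucket = bucket.strip().strip("/")
    if not volume or not bucket:
        raise ValueError("both a volume and a bucket name are required")
    # Assumption: replication is bucket-level, so nested paths are rejected here.
    if "/" in volume or "/" in bucket:
        raise ValueError("expected plain volume and bucket names, not nested paths")
    return f"{volume}/{bucket}"


print(bucket_path("vol1", "repl1"))  # vol1/repl1
```

The resulting string is the kind of value used for sourcePath and destinationPath.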

  1. Go to the Cloudera Manager > Support > API Explorer page on the target cluster.
  2. Go to the ReplicationsResource API resource.
  3. In the POST /clusters/{clusterName}/services/{serviceName}/replications HTTP method, enter the following parameters to create an Ozone replication policy:
    1. Enter the cluster name in the clusterName field. For example, Cluster 1.
    2. Enter the service name in the serviceName field. For example, OZONE-1.
    3. In the body field, enter one or more of the parameters listed in the API parameters for Ozone replication policy table.

    The following table lists the available API parameters that you can use to create an Ozone replication policy:

    Table 1. API parameters for Ozone replication policy
    Parameter name Data type Description
    active boolean Read-only field. Shows true when the replication policy is running. Otherwise, this shows false.
    alertOnAbort boolean Set to true to generate an alert when a replication job stops abruptly. Default is false.
    alertOnFail boolean Set to true to generate an alert when a replication job fails. Default is false.
    alertOnStart boolean Set to true to generate an alert when a replication job starts. Default is false.
    alertOnSuccess boolean Set to true to generate an alert when a replication job completes successfully. Default is false.
    description string Enter a description for the replication policy.
    displayName string Enter a unique name for the replication policy.
    endTime string The timestamp after which the replication job is not triggered.
    abortOnError boolean Set to true to stop the replication job when an error appears. The files copied up to that point remain on the destination, but no additional files are copied. Default is false.
    bandwidthPerMap number The maximum bandwidth (in MB per second) per mapper in the MapReduce replication job.

    The default value is 100 MB per second for each mapper.

    destinationPath string Enter the path on the target cluster to which the replication policy copies the data.
    exclusionFilters array of string Enter one or more regular expressions separated by commas. Replication Manager does not copy the subdirectories or files from the source that match one of the specified regular expressions to the target cluster.
    logPath string Enter an alternate path for the logs, if required.
    mapreduceServiceName string Enter a MapReduce or YARN service to use for the replication policy.
    numMaps number Enter the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
    removeMissingFiles boolean Set to true to delete destination files that are missing in source. Default is false.
    replicationStrategy data model Choose one of the following replication strategies to determine whether the file replication tasks should be distributed among the mappers statically or dynamically.
    • Static distributes file replication tasks among the mappers up front to achieve a uniform distribution based on the file sizes.
    • Dynamic distributes the file replication tasks in small sets to the mappers, and as each mapper completes its tasks, it dynamically acquires and processes the next unallocated set of tasks.

    The default replication strategy is Dynamic.

    schedulerPoolName string

    Enter the name of a resource pool in the field. The value you enter is used by the MapReduce Service you specified when Cloudera Manager executes the MapReduce job for the replication. The job specifies the value using one of these properties:

    • MapReduce – Fair scheduler: mapred.fairscheduler.pool
    • MapReduce – Capacity scheduler: queue.name
    • YARN: mapreduce.job.queuename
    skipChecksumChecks boolean Set to true to skip checksum checks on the copied files. Checksums are checked by default.
    skipListingChecksumChecks boolean Set to true to skip the checksum check when comparing two files to determine whether they are the same. If skipped, the file size and last modified time are used to determine whether the files are the same. Skipping the check improves performance during the mapper phase. Note that if you set skipChecksumChecks to true, this check is also skipped.
    skipTrash boolean Set to true to permanently delete destination files that are missing in source. Default is null.
    sourcePath string Enter the path to the bucket on the source cluster to replicate data from.
    sourceService data model

    Enter the following:

    • clusterName - Enter the cluster name.
    • peerName - Enter the peer name that you entered when you added the cluster as a peer.
    • serviceName - Enter the Ozone service name in Cloudera Manager.
    sourceUser string

    Enter the user name to run the replication policy. If you want to specify a user name for unsecure clusters, enter null.

    If you are using a kerberized cluster, enter the required user name. The replication policy uses this user name to replicate the data in the kerberized cluster.

    userName string

    Enter the user name to run the replication policy. If you want to specify a user name for unsecure clusters, enter null.

    If you are using a kerberized cluster, enter the required user name. The replication policy uses this user name to replicate the data in the kerberized cluster.

    id number Auto-generated replication policy ID
    interval number Enter the duration between consecutive replication policy runs. Default is 0.
    intervalUnit data model Enter one of the following frequencies to run the replication policy:

    MINUTE; HOUR; DAY; WEEK; MONTH; YEAR

    nextRun string Read-only. The timestamp for the next scheduled replication policy run.
    paused boolean Set to true to keep the replication job from running after the policy is created. Default is false.
    startTime string The timestamp to initiate the replication job run.
The response body shows the created Ozone replication policy.
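
The endpoint named in the steps above can also be assembled outside the API Explorer. The sketch below only builds the URL and sends nothing. The function name `replications_url`, the host, the port, and the API version `v51` are hypothetical placeholders; substitute the values that apply to your Cloudera Manager instance. Path segments are percent-encoded because names such as Cluster 1 contain spaces.

```python
from urllib.parse import quote


def replications_url(base_url: str, api_version: str,
                     cluster_name: str, service_name: str) -> str:
    """Build the URL for POST /clusters/{clusterName}/services/{serviceName}/replications."""
    # safe="" also encodes "/" so that a name can never add an extra path segment.
    cluster = quote(cluster_name, safe="")
    service = quote(service_name, safe="")
    return (f"{base_url.rstrip('/')}/api/{api_version}"
            f"/clusters/{cluster}/services/{service}/replications")


# Hypothetical host, port, and API version; the names come from the examples in the steps.
print(replications_url("https://cm-target.example.com:7183", "v51",
                       "Cluster 1", "OZONE-1"))
# https://cm-target.example.com:7183/api/v51/clusters/Cluster%201/services/OZONE-1/replications
```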
The following sample snippet shows the API parameters required to create an Ozone replication policy:

{
	"items": [{
		"active": false,
		"alertOnAbort": false,
		"alertOnFail": false,
		"alertOnStart": false,
		"alertOnSuccess": false,
		"description": null,
		"displayName": "Remote 1",
		"endTime": null,
		"ozoneReplicationArguments": {
			"abortOnError": false,
			"bandwidthPerMap": 100,
			"destinationPath": "vol1/repl1",
			"exclusionFilters": [],
			"logPath": null,
			"mapreduceServiceName": "YARN-1",
			"numMaps": 20,
			"removeMissingFiles": false,
			"replicationStrategy": "DYNAMIC",
			"schedulerPoolName": null,
			"skipChecksumChecks": false,
			"skipListingChecksumChecks": false,
			"skipTrash": false,
			"sourcePath": "vol1/repl1",
			"sourceService": {
				"clusterName": "Cluster 1",
				"peerName": "Remote Source",
				"serviceName": "OZONE-1"
			},
			"sourceUser": "testuser",
			"userName": "testuser"
		},
		"id": null,
		"interval": 0,
		"intervalUnit": "MINUTE",
		"nextRun": null,
		"paused": false,
		"startTime": null
	}]
}
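
A request body with the same shape as the sample can be generated in a script instead of being typed into the body field. The following is a minimal sketch under stated assumptions: the helper names `ozone_policy_body` and `build_post_request` are hypothetical, the defaults are copied from the sample, and authentication and TLS settings are left out because they depend on your deployment. The request object is built but not sent.

```python
import json
import urllib.request


def ozone_policy_body(display_name, source_path, destination_path,
                      peer_name, source_cluster, source_service,
                      mapreduce_service, user=None):
    """Build a request body with the same keys as the sample above."""
    arguments = {
        "abortOnError": False,
        "bandwidthPerMap": 100,
        "destinationPath": destination_path,
        "exclusionFilters": [],
        "logPath": None,
        "mapreduceServiceName": mapreduce_service,
        "numMaps": 20,
        "removeMissingFiles": False,
        "replicationStrategy": "DYNAMIC",
        "schedulerPoolName": None,
        "skipChecksumChecks": False,
        "skipListingChecksumChecks": False,
        "skipTrash": False,
        "sourcePath": source_path,
        "sourceService": {
            "clusterName": source_cluster,
            "peerName": peer_name,
            "serviceName": source_service,
        },
        "sourceUser": user,
        "userName": user,
    }
    policy = {
        "active": False,
        "alertOnAbort": False,
        "alertOnFail": False,
        "alertOnStart": False,
        "alertOnSuccess": False,
        "description": None,
        "displayName": display_name,
        "endTime": None,
        "ozoneReplicationArguments": arguments,
        "id": None,
        "interval": 0,
        "intervalUnit": "MINUTE",
        "nextRun": None,
        "paused": False,
        "startTime": None,
    }
    return {"items": [policy]}


def build_post_request(url, body):
    """Wrap the body in a JSON POST request without sending it."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )


body = ozone_policy_body("Remote 1", "vol1/repl1", "vol1/repl1",
                         "Remote Source", "Cluster 1", "OZONE-1",
                         "YARN-1", user="testuser")
print(json.dumps(body, indent=2))
```

Note that the replication-specific parameters are nested under ozoneReplicationArguments, as in the sample, while the scheduling and alerting parameters stay at the top level of each item.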