Create a replication policy
You must create a policy to define the rules for the replication jobs (instances of the policy) that you want to run. The rules can specify the type of data to replicate, the time and frequency of replication, the bandwidth allowed for a job, and so forth. During replication, the associated file metadata or table structures and schemas are replicated along with the data.
- You cannot modify a policy after it is created. To change a policy, you must create a new policy with the new settings.
- DLM does not support updating any cluster endpoints (HDFS, Hive, Ranger, or DLM Engine). If an endpoint must be modified, contact Hortonworks support for assistance.
- The first time you execute a job with data that has not been previously replicated, Data Lifecycle Manager creates a new folder or database and bootstraps the data.
Important
During a bootstrap operation, all data is replicated from the source cluster to the destination. As a result, the initial execution of a job can take a significant amount of time, depending on how much data is being replicated, network bandwidth, and so forth.
After initial bootstrap, data replication is performed incrementally, so only updated data is transferred. Data is in a consistent state only after incremental replication has captured any new changes that occurred during bootstrap.
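The difference between the bootstrap run and later incremental runs can be illustrated with a minimal sketch. This is not DLM code: the function `run_job`, the dictionaries that stand in for clusters, and the per-path version numbers are all hypothetical, and the sketch ignores deletions, metadata, and bandwidth limits.

```python
def run_job(source, destination, last_replicated):
    """Copy new or changed entries from source to destination.

    source, destination: dicts mapping a path to a version number
        (hypothetical stand-ins for data on each cluster).
    last_replicated: dict recording the version of each path that was
        copied by earlier runs; empty before the first run.
    Returns a tuple (mode, transferred_paths).
    """
    # With no replication history, every entry counts as new, so the
    # first run copies everything (the bootstrap).
    mode = "incremental" if last_replicated else "bootstrap"
    transferred = []
    for path, version in source.items():
        if last_replicated.get(path) != version:
            destination[path] = version
            last_replicated[path] = version
            transferred.append(path)
    return mode, sorted(transferred)


source = {"/data/a": 1, "/data/b": 1}
destination = {}
history = {}

first = run_job(source, destination, history)

# Changes made on the source after the bootstrap started are picked up
# by the next incremental run.
source["/data/b"] = 2
source["/data/c"] = 1
second = run_job(source, destination, history)
```

In this sketch the first call transfers both paths, and the second call transfers only the changed and the newly added path. The destination matches the source only after that second run, which mirrors the consistency note above.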
- You must use the DLM Infrastructure Admin role to perform this task.
- If you use an S3 cluster in your policy, your credentials must already be registered on the Cloud Credentials page.
- The clusters you want to include in the replication policy must already be paired.
- You must ensure that the clusters you select are healthy before you start a policy instance (job).
- On destination clusters, the DLM Engine must have been granted write permissions on folders being replicated.
- The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
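The prerequisites above can be treated as a pre-flight checklist. The following is a minimal sketch of such a checklist, assuming you have already gathered the relevant facts about each cluster yourself. The `ClusterState` class, its fields, and `preflight_problems` are hypothetical names introduced for illustration and are not part of any DLM API. Write permission is simplified to a set of paths.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical snapshot of the facts needed for the checklist."""
    name: str
    healthy: bool = True
    paired_with: set = field(default_factory=set)      # names of paired clusters
    writable_paths: set = field(default_factory=set)   # paths the DLM Engine can write to
    non_empty_paths: set = field(default_factory=set)  # paths that already contain data


def preflight_problems(source, destination, target_path, user_roles,
                       uses_s3=False, s3_credentials_registered=False):
    """Return a list of unmet prerequisites; an empty list means all passed."""
    problems = []
    if "DLM Infrastructure Admin" not in user_roles:
        problems.append("user lacks the DLM Infrastructure Admin role")
    if uses_s3 and not s3_credentials_registered:
        problems.append("S3 credentials are not registered")
    if destination.name not in source.paired_with:
        problems.append("clusters are not paired")
    for cluster in (source, destination):
        if not cluster.healthy:
            problems.append(f"cluster {cluster.name} is not healthy")
    if target_path not in destination.writable_paths:
        problems.append("DLM Engine cannot write to the target path")
    if target_path in destination.non_empty_paths:
        problems.append("target path exists and is not empty")
    return problems


src = ClusterState("src", paired_with={"dst"})
dst = ClusterState("dst", writable_paths={"/data/sales"})
roles = {"DLM Infrastructure Admin"}

ok = preflight_problems(src, dst, "/data/sales", roles)

# Simulate two failed prerequisites on the destination.
dst.healthy = False
dst.non_empty_paths.add("/data/sales")
failed = preflight_problems(src, dst, "/data/sales", roles)
```

Returning a list of problems rather than stopping at the first failure lets an administrator fix every unmet prerequisite before creating the policy.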
Verify that the replication job is running as intended.