Replication policy
Consider the following items when creating or modifying a replication policy.
- If using TDE for encryption, the entire source directory must be either encrypted or not encrypted; otherwise, policy creation fails.
- If using an S3 cluster for your policy, your credentials must have been registered on the Cloud Credentials page.
- On destination clusters, the DLM Engine must have been granted write permissions on folders being replicated.
- Any user with access to the DLM UI can browse, within the DLM UI, the folder structure of any clusters enabled for DLM. Therefore, the DataPlane Admins and the Infrastructure Admins can view folders, files, and databases in the DLM UI that they might not have access to in HDFS. However, the DataPlane Admin, Infrastructure Admin, and DLM Admin cannot view the content of files on the source or destination clusters from the DLM UI, nor can these administrators modify or delete folders or files that are viewable from the DLM UI.
- Ensure that the frequency is set so that a job finishes before the next job starts. Jobs based on the same policy cannot overlap. If a job is not completed before another job starts, the second job does not execute and is given the status Skipped. If a job is consistently skipped, you might need to modify the frequency of the job.
- Specify bandwidth per map, in MBps. Each map is restricted to consume only the specified bandwidth, although the limit is approximate: the map throttles back its bandwidth consumption during a copy so that the net bandwidth used tends towards the specified value.
- The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
- The clusters you want to include in the replication policy must have been paired already.
- With a single cluster, you can replicate data from on-premise to cloud and vice versa.
- With a single cluster, you cannot replicate data from on-premise to on-premise.
- On the Create Policy page, the only requirement for clusters to display in the Source Cluster or Destination Cluster fields is that they are DLM-enabled. You must ensure that the clusters you select are healthy before you start a policy instance (job).
- Storage handler-based tables (such as HBase) are currently not replicated.
- When creating a schedule for a Hive replication policy, you should set the frequency so that changes are replicated often enough to avoid overly large copies.
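The scheduling rule above (jobs on the same policy cannot overlap, and a job that would start while the previous one is still running is given the status Skipped) can be illustrated with a small simulation. This is only a sketch for reasoning about frequency settings, not DLM code: the function names, the `Ran` status label, and the choice to let a job start at the exact minute the previous one ends are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    start: int    # scheduled start, in minutes after the policy begins
    status: str   # "Ran" (hypothetical label) or "Skipped" (status named in the text)


def simulate_policy(frequency_min, durations_min):
    """Simulate one policy. durations_min[i] is how long the job scheduled
    at tick i would take if it were allowed to run."""
    runs = []
    busy_until = 0  # minute at which the currently running job finishes
    for i, duration in enumerate(durations_min):
        start = i * frequency_min
        if start < busy_until:
            # Previous job is still running: this job does not execute.
            runs.append(JobRun(start, "Skipped"))
        else:
            busy_until = start + duration
            runs.append(JobRun(start, "Ran"))
    return runs


def skipped_fraction(runs):
    """Share of scheduled jobs that were skipped (0.0 for an empty schedule)."""
    if not runs:
        return 0.0
    return sum(r.status == "Skipped" for r in runs) / len(runs)
```

For example, with a 60-minute frequency and job durations of 45, 90, 30, and 30 minutes, the third job is skipped because the second is still running at minute 120. A consistently high skipped fraction suggests that the frequency interval is too short for the amount of data being copied.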
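The bandwidth-per-map behavior described above, where a map throttles back so that its net bandwidth tends towards the specified value, can be sketched as an average-rate throttle. The following is a simplified simulation under stated assumptions, not the actual DLM or Hadoop implementation: `throttled_copy` and its parameters are hypothetical names, time is simulated rather than measured, and the map is assumed to pause after each chunk whenever it is ahead of its bandwidth budget.

```python
def throttled_copy(chunk_sizes_mb, raw_rate_mbps, limit_mbps):
    """Simulate one map copying chunks under a bandwidth limit.

    chunk_sizes_mb: sizes of the chunks to copy, in MB
    raw_rate_mbps:  unthrottled transfer speed, in MB per second
    limit_mbps:     configured bandwidth per map, in MB per second
    Returns (elapsed_seconds, net_rate_mbps).
    """
    elapsed = 0.0
    copied = 0.0
    for size in chunk_sizes_mb:
        # Each chunk is transferred at the raw speed, so short bursts
        # can exceed the limit; this is why the limit is not exact.
        elapsed += size / raw_rate_mbps
        copied += size
        # If the average rate so far exceeds the limit, pause until
        # enough time has passed to bring the average back down.
        min_elapsed = copied / limit_mbps
        if elapsed < min_elapsed:
            elapsed = min_elapsed
    if elapsed == 0.0:
        return 0.0, 0.0
    return elapsed, copied / elapsed
```

In this sketch, a map that could copy at 500 MBps with a 100 MBps limit ends up with a net rate of about 100 MBps, while a map that can only reach 50 MBps is never throttled and stays at about 50 MBps.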