Replication policy

Consider the following items when you create or modify a replication policy.

Data security
  • If you use an S3 cluster for your policy, your credentials must already be registered on the Cloud Credentials page.
  • Any user with access to the Replication Manager UI can browse, within that UI, the folder structure of any cluster enabled for Replication Manager.

    Therefore, users can view folders, files, and databases in the Replication Manager UI that they might not have access to in HDFS. However, users cannot view the content of files on the source or destination clusters from the Replication Manager UI, nor can they modify or delete the folders or files that are viewable there.

Policy properties and settings
  • Ensure that the frequency is set so that a job finishes before the next job starts. Jobs based on the same policy cannot overlap. If a job is not completed before another job starts, the second job does not execute and is given the status Skipped. If a job is consistently skipped, you might need to modify the frequency of the job.
  • Specify the bandwidth per map, in MBps. Each map is restricted to the specified bandwidth, although the limit is not always exact: the map throttles its bandwidth consumption during a copy so that the net bandwidth used tends toward the specified value.
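
The scheduling rule in the first item above (a job that would start while the previous job from the same policy is still running is marked Skipped) can be illustrated with a small simulation. This is only a sketch for reasoning about frequency settings, not Replication Manager code: the function name, its parameters, and the "Ran" label are hypothetical, and it assumes every job takes the same amount of time and that a job finishing exactly at the next start time does not block that start.

```python
def simulate_policy(frequency_min, job_duration_min, scheduled_runs):
    """Return one status per scheduled start: "Ran" or "Skipped".

    Hypothetical helper for illustration only. "Skipped" is the status
    named in the text; "Ran" is a placeholder label for a job that executed.
    """
    statuses = []
    busy_until = 0  # minute at which the most recently started job finishes
    for run in range(scheduled_runs):
        start = run * frequency_min  # scheduled start time of this run
        if start < busy_until:
            # The previous job is still running, so this one does not execute.
            statuses.append("Skipped")
        else:
            statuses.append("Ran")
            busy_until = start + job_duration_min
    return statuses


# A 90-minute job on a 60-minute frequency skips every other run.
print(simulate_policy(frequency_min=60, job_duration_min=90, scheduled_runs=4))
# A 45-minute job on the same frequency never overlaps.
print(simulate_policy(frequency_min=60, job_duration_min=45, scheduled_runs=4))
```

If a simulation with your typical job duration shows regular skips, that suggests lengthening the interval between jobs.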

Cluster requirements
  • The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
  • The clusters you want to include in the replication policy must already be paired.
  • With a single cluster, you can replicate data from on-premises to cloud.
  • With a single cluster, you cannot replicate data from on-premises to on-premises or vice versa.
  • On the Create Policy page, the only requirement for clusters to display in the Source Cluster or Destination or Data Lake Cluster fields is that they are Replication Manager-enabled. You must ensure that the clusters you select are healthy before you start a policy instance (job).

Hive restrictions
  • Storage handler-based tables (such as HBase) are currently not replicated.
  • When creating a schedule for a Hive replication policy, you should set the frequency so that changes are replicated often enough to avoid overly large copies.