HDFS replication policy considerations 
    Before you create an HDFS replication policy, you must understand how source data is
        affected when you add or delete it during replication, network latency issues,
        performance and scalability limitations, the guidelines for snapshot diff-based
        replication, and how to bypass Sentry ACLs during replication.
How HDFS replication policy works
    Replication Manager replicates HDFS data based on the "Source Path" and "Destination Path"
        that you specify in the "Create HDFS Replication Policy" wizard. You must also follow a few
        guidelines for maintaining the source data so that replication succeeds.

Improve network latency during replication job run
    High latency among clusters can cause replication jobs to run more slowly, but it does not
        cause them to fail.

Performance and scalability limitations to consider for replication policies
    Before you create an HDFS replication policy, you must consider a few performance and
        scalability limitations.

Guidelines to use snapshot diff-based replication
    By default, Replication Manager uses snapshot differences ("diff") to improve performance:
        it compares HDFS snapshots and replicates only the files that changed in the source
        directory. Although Hive metadata requires a full replication, the data stored in Hive
        tables can take advantage of snapshot diff-based replication.

HDFS replication in Sentry-enabled clusters
    When you run an HDFS replication policy on a Sentry-enabled source cluster, the replication
        policy copies files and tables along with their permissions. Cloudera Manager 6.3.1 or
        higher is required to run HDFS replication policies on a Sentry-enabled source cluster.

Specifying hosts to improve HDFS replication policy performance
    If your cluster has clients installed on hosts with limited resources, HDFS replication might
        use those hosts to run replication commands, which can degrade performance. You can limit
        HDFS replication to run only on selected DataNodes by specifying a "whitelist" of DataNode
        hosts.
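
The snapshot diff-based approach described above (compare two snapshots, copy only what changed) can be illustrated with a minimal Python sketch. It is not how Replication Manager or HDFS computes snapshot differences internally. It assumes each snapshot is available as a hypothetical in-memory listing that maps a file path to its length and modification time, and it treats a file as changed when either value differs. In this sketch a rename shows up as a delete plus a copy, whereas an HDFS snapshot diff report can list renames separately. The names `FileState`, `CopyPlan`, and `plan_incremental_copy` are illustrative only.

```python
from typing import Dict, List, NamedTuple


class FileState(NamedTuple):
    """Hypothetical per-file metadata recorded in a snapshot listing."""
    length: int
    mtime: int


# A snapshot listing maps an absolute file path to its recorded metadata.
Snapshot = Dict[str, FileState]


class CopyPlan(NamedTuple):
    to_copy: List[str]    # new or modified files to transfer
    to_delete: List[str]  # files that no longer exist in the newer snapshot


def plan_incremental_copy(previous: Snapshot, current: Snapshot) -> CopyPlan:
    """Return only the work an incremental run needs, instead of copying everything."""
    # A file is copied if it is new (absent from `previous`) or if its
    # length or modification time differs between the two snapshots.
    to_copy = sorted(
        path for path, state in current.items() if previous.get(path) != state
    )
    # Files that disappeared are only reported; whether to remove them at the
    # destination is left to the caller in this sketch.
    to_delete = sorted(path for path in previous if path not in current)
    return CopyPlan(to_copy, to_delete)


if __name__ == "__main__":
    older = {"/data/a.csv": FileState(100, 1), "/data/b.csv": FileState(200, 1)}
    newer = {
        "/data/a.csv": FileState(100, 1),  # unchanged: skipped
        "/data/b.csv": FileState(250, 2),  # modified: copied
        "/data/c.csv": FileState(10, 2),   # created: copied
    }
    print(plan_incremental_copy(older, newer))
    # CopyPlan(to_copy=['/data/b.csv', '/data/c.csv'], to_delete=[])
```

The unchanged file is skipped entirely, which is where the performance gain of diff-based replication comes from: the amount of work tracks the size of the change, not the size of the directory.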
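
The DataNode host whitelist described above amounts to filtering the set of hosts that replication is allowed to use. The following minimal Python sketch shows only that filtering idea; it does not reflect how Cloudera Manager stores or applies the whitelist. The function `select_replication_hosts`, the example host names, and the choice to raise an error when no whitelisted host is available are all hypothetical.

```python
from typing import Iterable, List


def select_replication_hosts(candidates: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Keep only candidate hosts that appear in the DataNode whitelist."""
    # Host names are compared case-insensitively, as DNS names are.
    allowed = {host.strip().lower() for host in whitelist}
    selected = [host for host in candidates if host.strip().lower() in allowed]
    if not selected:
        # Assumption of this sketch: fail loudly rather than fall back to
        # non-whitelisted (possibly low-resource) hosts.
        raise ValueError("no whitelisted DataNode host is available")
    return selected


if __name__ == "__main__":
    cluster_hosts = ["dn1.example.com", "edge-small.example.com", "dn2.example.com"]
    print(select_replication_hosts(cluster_hosts, ["dn1.example.com", "dn2.example.com"]))
    # ['dn1.example.com', 'dn2.example.com']
```

Here the low-resource client host `edge-small.example.com` is excluded, so replication commands would run only on the listed DataNodes.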