Specifying hosts to improve HDFS replication policy performance

If your cluster has clients installed on hosts with limited resources, HDFS replication may use these hosts to run commands for the replication, which can cause performance degradation. You can limit HDFS replication to run only on selected DataNodes by specifying a "whitelist" of DataNode hosts.

  1. Go to the Cloudera Manager > Clusters > HDFS service > Configuration tab.
  2. Locate the HDFS Replication Environment Advanced Configuration Snippet (Safety Valve) property.
  3. Add the HOST_WHITELIST property, and enter a comma-separated list of hostnames to use for HDFS replication policies.
    For example,
    HOST_WHITELIST=host-1.mycompany.com,host-2.mycompany.com
  4. Click Save Changes.