Properties for configuring the Balancer

Depending on your requirements, you can configure various properties for the HDFS Balancer.

dfs.datanode.balance.max.concurrent.moves

This setting applies to both the DataNodes and the Balancer, and they must be configured with the same value. On the DataNodes, the setting limits the maximum number of concurrent block moves that a single DataNode can perform. The Balancer uses the setting to avoid scheduling too many concurrent moves on a single DataNode.

You can reconfigure this property without restarting the DataNodes. To reconfigure a DataNode, follow these steps:

  1. Change the value of dfs.datanode.balance.max.concurrent.moves on the Configuration tab of the HDFS service in Cloudera Manager.

  2. Refresh the cluster.

In most situations, you can use the default value of 100 as the maximum number of concurrent block moves. If you want a lower value, consider a value between 25 and 50. The recommended maximum value for this parameter is 200.
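
Because the Balancer and the DataNodes must use the same value, it can help to check a candidate value before passing it to the Balancer. The following is a minimal sketch, not an official procedure: the helper name balancer_cmd_with_moves is hypothetical, and it assumes the Balancer accepts the property as a Hadoop generic -D option. The command is printed rather than executed, so the sketch runs on a machine without a Hadoop client.

```shell
# Hypothetical helper (not part of HDFS): checks a candidate value against the
# recommended maximum of 200, then prints the matching Balancer invocation.
balancer_cmd_with_moves() {
  case "$1" in
    ''|*[!0-9]*) echo "error: '$1' is not a positive integer" >&2; return 1 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 200 ]; then
    echo "error: '$1' is outside the range 1-200" >&2
    return 1
  fi
  # Printed rather than executed; assumes generic -D options are accepted.
  echo "hdfs balancer -Ddfs.datanode.balance.max.concurrent.moves=$1"
}

balancer_cmd_with_moves 50
```

The value passed here should match the value configured on the DataNodes in Cloudera Manager.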

dfs.datanode.balance.bandwidthPerSec

Limits the bandwidth that each DataNode uses for balancing the cluster. Changing this property in the configuration requires restarting the DataNodes.

The default is 100 MB/s.

To change the Balancer bandwidth dynamically, use the following command:

hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>

This command changes the bandwidth without restarting the DataNodes, but the change is lost when the DataNodes are next restarted unless the value is also changed in Cloudera Manager.

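
Because the command takes its value in bytes per second, a value expressed in MB/s has to be converted first. The following is a small sketch of that conversion. The helper name mb_to_bytes_per_sec is hypothetical, and the sketch assumes binary megabytes (1 MB = 1024 * 1024 bytes). The dfsadmin command is printed rather than executed.

```shell
# Hypothetical helper: convert MB/s to bytes per second,
# assuming 1 MB = 1024 * 1024 bytes.
mb_to_bytes_per_sec() {
  echo $(( $1 * 1024 * 1024 ))
}

# Print (do not run) the command for a 100 MB/s limit.
echo "hdfs dfsadmin -setBalancerBandwidth $(mb_to_bytes_per_sec 100)"
# prints: hdfs dfsadmin -setBalancerBandwidth 104857600
```
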
dfs.balancer.moverThreads

Limits the total number of concurrent moves for balancing across the entire cluster. Set this property to the number of threads that the HDFS Balancer uses for moving blocks. Each block move requires a thread.

The default is 1000.

dfs.balancer.max-size-to-move

Limits the maximum size of data that the HDFS Balancer moves between a chosen DataNode pair. With each iteration, the HDFS Balancer chooses DataNodes in pairs and moves data between the DataNodes in each pair. If you increase this value when the network and disks are not saturated, the amount of data transferred between each DataNode pair in an iteration increases, while the duration of an iteration remains about the same.

The default is 10 GB.
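
Size-valued properties like this one can be given as a plain byte count. The following sketch shows one way to build a Balancer command line with an explicit size. The helper name gb_to_bytes is hypothetical, the sketch assumes binary gigabytes (1 GB = 1024^3 bytes), and it assumes the Balancer accepts these properties as Hadoop generic -D options. The command is printed rather than executed.

```shell
# Hypothetical helper: convert GB to bytes, assuming 1 GB = 1024^3 bytes.
gb_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print (do not run) a Balancer invocation that states the defaults explicitly.
echo "hdfs balancer -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.max-size-to-move=$(gb_to_bytes 10)"
```

With the defaults described above, the size argument works out to 10737418240 bytes.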

dfs.balancer.getBlocks.size

Specifies the total data size of the block list returned by a getBlocks(..) call.

When the HDFS Balancer moves a certain amount of data between source and destination DataNodes, it repeatedly invokes the getBlocks(..) RPC on the NameNode to get lists of blocks from the source DataNode until the required amount of data is scheduled.

The default is 2 GB.
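
As a rough illustration of the repeated calls described above, the minimum number of getBlocks(..) calls needed to schedule a given amount of data can be estimated with a ceiling division. This is only a lower bound under simplifying assumptions: every call returns a full list and every returned block can be scheduled. The helper name min_getblocks_calls is hypothetical.

```shell
# Hypothetical helper: lower bound on the number of getBlocks(..) calls.
#   $1 = bytes to schedule, $2 = bytes returned per call
min_getblocks_calls() {
  # Ceiling division: (to_move + per_call - 1) / per_call
  echo $(( ($1 + $2 - 1) / $2 ))
}

GB=$(( 1024 * 1024 * 1024 ))
# 10 GB to move (default max-size-to-move), 2 GB per call (default getBlocks.size)
min_getblocks_calls $(( 10 * GB )) $(( 2 * GB ))   # prints 5
```

In practice the Balancer may need more calls than this, for example when some returned blocks are too small or cannot be moved.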

dfs.balancer.getBlocks.min-block-size

Specifies the minimum block size for the blocks used to balance the cluster.

The default is 10 MB.

dfs.datanode.block-pinning.enabled

Specifies whether block pinning is enabled. When creating a file, a user application can specify a list of favorable DataNodes through the file creation API in DistributedFileSystem. The NameNode makes a best effort to allocate blocks to the favorable DataNodes. If dfs.datanode.block-pinning.enabled is set to true, a block replica that is written to a favorable DataNode is “pinned” to that DataNode. Pinned replicas are not moved during cluster balancing, so they remain stored on the specified favorable DataNodes. This feature is useful for applications that are aware of block distribution, such as HBase.

The default is false.