Use dfs.datanode.max.transfer.threads with HBase

When you use HBase, you must configure the dfs.datanode.max.transfer.threads property, which specifies the maximum number of files that a DataNode can serve at any one time.

  • A Hadoop HDFS DataNode has an upper bound on the number of files that it can serve at any one time. This upper bound is controlled by the dfs.datanode.max.transfer.threads property (spell the property name exactly as shown here). Before loading data into HBase, set dfs.datanode.max.transfer.threads to at least 4096 in the conf/hdfs-site.xml file (by default, /etc/hadoop/conf/hdfs-site.xml), as shown here:
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
    </property>
  • Restart HDFS after changing the value of dfs.datanode.max.transfer.threads. If the value is too low, strange failures can occur: the DataNode logs an error message about exceeding the number of transfer threads, and clients also log errors about missing blocks, such as:
    06/12/14 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: 
    java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
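The two checks described above, confirming the configured value and spotting the missing-block symptoms in logs, can be sketched in Python using only the standard library. This is a minimal illustration, not a Cloudera or Hadoop tool: the function names (read_transfer_threads, is_sufficient, find_symptoms) are hypothetical, the 4096 threshold is the minimum given in this section, and a missing property is conservatively treated as "not configured" because the section asks you to set it explicitly. The log fragments matched are copied from the DFSClient excerpt above.

```python
import xml.etree.ElementTree as ET

PROPERTY = "dfs.datanode.max.transfer.threads"
MINIMUM = 4096  # minimum value recommended for HBase in this section

# Message fragments copied from the DFSClient log excerpt in this section.
SYMPTOMS = ("Could not obtain block", "No live nodes contain current block")


def read_transfer_threads(xml_text):
    """Return the configured value as an int, or None if the property is absent.

    Accepts either a full <configuration> document or a bare <property>
    element; Element.iter() also matches the root element itself.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY:
            return int((prop.findtext("value") or "").strip())
    return None


def is_sufficient(xml_text, minimum=MINIMUM):
    """Assumption: an unset property counts as insufficient (not configured)."""
    value = read_transfer_threads(xml_text)
    return value is not None and value >= minimum


def find_symptoms(log_lines):
    """Return the log lines that contain any of the missing-block messages."""
    return [line for line in log_lines if any(s in line for s in SYMPTOMS)]


# Usage: check a hdfs-site.xml fragment held in a string.
sample = """<configuration>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>4096</value>
  </property>
</configuration>"""
print(is_sufficient(sample))  # True
```

To check a real cluster, you would pass the contents of /etc/hadoop/conf/hdfs-site.xml to is_sufficient, and the lines of a client or RegionServer log to find_symptoms.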