Decommissioning and Recommissioning Hosts

Decommissioning a host decommissions and stops all roles on the host without requiring you to decommission each role individually. Decommissioning applies only to HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. If the host has other roles running on it, those roles are stopped.

After all roles on the host have been decommissioned and stopped, the host can be removed from service. You can decommission multiple hosts in parallel.

Decommissioning Hosts

Minimum Required Role: Limited Operator (also provided by Operator, Configurator, Cluster Administrator, or Full Administrator)

You cannot decommission a DataNode, or a host with a DataNode, if the number of DataNodes equals the replication factor (three by default) of any file stored in HDFS. For example, if any file has a replication factor of three and you have only three DataNodes, you cannot decommission a DataNode or its host. If you attempt to decommission a DataNode or its host in this situation, the DataNode enters the decommissioning state, but the process never completes; you must abort the decommission and recommission the DataNode.
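
Before you begin, you can verify that the number of DataNodes exceeds the highest replication factor in use. The following is a minimal sketch, assuming a standard HDFS client on a cluster host; the exact report and fsck output formats vary by Hadoop release:

  # Count the DataNodes listed in the NameNode report (includes dead nodes, if any)
  hdfs dfsadmin -report | grep -c '^Name:'
  # List the distinct replication factors of blocks currently stored in HDFS
  hdfs fsck / -files -blocks 2>/dev/null | grep -o 'repl=[0-9]*' | sort -u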

To decommission hosts:
  1. If the host has a DataNode, perform the steps in Tuning HDFS Prior to Decommissioning DataNodes.
  2. Click the Hosts tab.
  3. Select the checkboxes next to one or more hosts.
  4. Select Actions for Selected > Hosts Decommission.

    A confirmation pop-up informs you of the roles that will be decommissioned or stopped on the hosts you have selected.

  5. Click Confirm. A Decommission Command pop-up opens and shows each step or decommission command as it is run, service by service. In the Details area, expand a step to see the subcommands that are run for decommissioning a given service. Depending on the service, the steps may include adding the host to an "exclusions list" and refreshing the NameNode, JobTracker, or NodeManager (a sketch of the equivalent HDFS mechanism follows these steps); stopping the Balancer (if it is running); and moving data blocks or regions. Roles that do not have specific decommission actions are stopped.

    You can abort the decommission process by clicking the Abort button, but you must recommission and restart each role that has been decommissioned.

    The Commission State facet in the Filters list displays Decommissioning while decommissioning is in progress, and Decommissioned when the decommissioning process has finished. When the process is complete, a check mark appears in front of Decommission Command.
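
For background, the HDFS part of this command is equivalent to the standard NameNode exclude-file mechanism. The following is an illustrative sketch only (the file path is a placeholder for whatever dfs.hosts.exclude points to); do not edit these files by hand on a Cloudera Manager-managed cluster, because Cloudera Manager maintains them for you:

  # Illustration only: Cloudera Manager performs these steps automatically.
  echo "host-to-decommission.example.com" >> /etc/hadoop/conf/dfs.exclude
  # Ask the NameNode to re-read its include/exclude files
  hdfs dfsadmin -refreshNodes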

You cannot start roles on a decommissioned host.

Tuning HDFS Prior to Decommissioning DataNodes

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks from the DataNode in small batches. If a DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts with DataNodes, you should first tune HDFS:

  1. Run the following command to identify any problems in the HDFS file system:
    hdfs fsck / -list-corruptfileblocks -openforwrite -files -blocks -locations 2>&1 > /tmp/hdfs-fsck.txt  
  2. Fix any issues reported by the fsck command. If the command output lists corrupted files, use the fsck command to move them to the lost+found directory or delete them:
    hdfs fsck file_name -move
    or
    hdfs fsck file_name -delete
  3. Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.
    1. Go to the HDFS service page.
    2. Click the Configuration tab.
    3. Select Scope > DataNode.
    4. Select Category > Resource Management.
    5. Set the Java Heap Size of DataNode in Bytes property to at least 4 GB, as recommended above.

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    6. Click Save Changes to commit the changes.
  4. Set the DataNode balancing bandwidth:
    1. Select Scope > DataNode.
    2. Expand the Category > Performance category.
    3. Configure the DataNode Balancing Bandwidth property to match the bandwidth available on your disks and network. You can use a lower value if you want to minimize the impact of decommissioning on the cluster, but the trade-off is that decommissioning will take longer.
    4. Click Save Changes to commit the changes.
  5. Increase the replication work multiplier per iteration to a larger number (the default is 2; a value of 10 is recommended):
    1. Select Scope > NameNode.
    2. Expand the Category > Advanced category.
    3. Configure the Replication Work Multiplier Per Iteration property to a value such as 10.

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    4. Click Save Changes to commit the changes.
  6. Increase the replication maximum threads and maximum replication thread hard limits:
    1. Select Scope > NameNode.
    2. Expand the Category > Advanced category.
    3. Configure the Maximum number of replication threads on a DataNode and Hard limit on the number of replication threads on a DataNode properties to 50 and 100, respectively. You can decrease the number of threads (or use the default values) to minimize the impact of decommissioning on the cluster, but the trade-off is that decommissioning will take longer. (The assumed underlying HDFS property names for steps 4 through 6 are sketched after these steps.)

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    4. Click Save Changes to commit the changes.
  7. Restart the HDFS service.
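
For reference, the Cloudera Manager properties in steps 4 through 6 are assumed here to map to the HDFS configuration keys shown below; verify the mapping against your release. After redeploying the client configuration, you can spot-check the values deployed to a cluster host (role-group overrides applied to specific daemons may differ):

  # Assumed HDFS keys behind the Cloudera Manager properties in steps 4-6
  hdfs getconf -confKey dfs.datanode.balance.bandwidthPerSec
  hdfs getconf -confKey dfs.namenode.replication.work.multiplier.per.iteration
  hdfs getconf -confKey dfs.namenode.replication.max-streams
  hdfs getconf -confKey dfs.namenode.replication.max-streams-hard-limit

The DataNode heap size in step 3 is applied through the DataNode's Java options rather than hdfs-site.xml, so it does not appear in this list.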

For additional tuning recommendations, see Performance Considerations.

Tuning HBase Prior to Decommissioning DataNodes

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

To increase the speed of a rolling restart of the HBase service, set the Region Mover Threads property to a higher value. This increases the number of regions that can be moved in parallel, but places additional strain on the HMaster. In most cases, Region Mover Threads should be set to 5 or lower.

Recommissioning Hosts

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

Only hosts that are decommissioned using Cloudera Manager can be recommissioned.

  1. Click the Hosts tab.
  2. Select one or more hosts to recommission.
  3. Select Actions for Selected > Recommission and Confirm. A Recommission Command pop-up opens and shows each step or recommission command as it is run. When the process is complete, a check mark appears in front of Recommission Command. The host and roles are marked as commissioned, but the roles themselves are not restarted.

Restarting All The Roles on a Host

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

  1. Click the Hosts tab.
  2. Select one or more hosts on which to start all roles.
  3. Select Actions for Selected > Start Roles on Hosts.

Performance Considerations

Decommissioning a DataNode does not happen instantly because the process requires replication of a potentially large number of blocks. During decommissioning, the performance of your cluster may be impacted. This section describes the decommissioning process and suggests solutions for several common performance issues.

Decommissioning occurs in two steps:
  1. The Commission State of the DataNode is marked as Decommissioning and the data is replicated from this node to other available nodes. Until all blocks are replicated, the node remains in a Decommissioning state. You can view this state from the NameNode Web UI. (Go to the HDFS service and select Web UI > NameNode Web UI.)
  2. When all data blocks are replicated to other nodes, the node is marked as Decommissioned.
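
You can also follow the same state from the command line. A minimal sketch, assuming a Hadoop release whose dfsadmin -report supports the -decommissioning option:

  # List only the DataNodes that are currently decommissioning
  hdfs dfsadmin -report -decommissioning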

Decommissioning can impact performance in the following ways:

  • There must be enough disk space on the other active DataNodes for the data to be replicated. After decommissioning, the remaining active DataNodes have more blocks and therefore decommissioning these DataNodes in the future may take more time.
  • There will be increased network traffic and disk I/O while the data blocks are replicated.
  • Data balance and data locality can be affected, which can lead to a decrease in performance of any running or submitted jobs.
  • Decommissioning a large number of DataNodes at the same time can decrease performance.
  • When you decommission a minority of the DataNodes, the speed at which data can be read from those nodes limits the decommissioning rate: decommissioning saturates network bandwidth when reading data blocks from the DataNodes being decommissioned, while the bandwidth used to write the replicas is spread across the other DataNodes in the cluster. To limit the performance impact on the cluster, Cloudera recommends that you decommission only a minority of the DataNodes at the same time.
  • You can decrease the bandwidth available for balancing DataNodes and the number of replication threads to decrease the performance impact of the replications, but this will cause the decommissioning process to take longer to complete. See Tuning HDFS Prior to Decommissioning DataNodes.

Cloudera recommends that you add DataNodes and decommission DataNodes in parallel, in smaller groups. For example, if the replication factor is 3, then you should add two DataNodes and decommission two DataNodes at the same time.

Troubleshooting Performance of Decommissioning

The following conditions can also impact performance when decommissioning DataNodes:
Open Files
Write operations on the DataNode do not involve the NameNode. If there are blocks associated with open files located on a DataNode, they are not relocated until the file is closed. This commonly occurs with:
  • Clusters using HBase
  • Open Flume files
  • Long running tasks
To find and close open files:
  1. Log in to the NameNode host and switch to the log directory.
    The location of this directory is configurable using the NameNode Log Directory property. By default, this directory is located at:
    /var/log/hadoop-hdfs/
  2. Run the following command to verify that the logs provide the needed information:
    grep "Is current datanode" *NAME* | head
    The sixth column of each matching log entry is the block ID, and the messages should relate to the DataNode being decommissioned. Run the following command to collect the unique block IDs into a file:
    grep "Is current datanode" *NAME* | awk '{print $6}' | sort -u > blocks.open
  3. Run the following command to return a list of open files, their blocks, and the locations of those blocks:
    hadoop fsck / -files -blocks -locations -openforwrite 2>&1 > openfiles.out
  4. In the openfiles.out file created by that command, look for the blocks listed in blocks.open (a sample cross-reference command follows these steps). Also verify that the DataNode IP address is correct.
  5. Using the list of open files, perform the appropriate action to restart the process that holds each file open so that the file is closed.

    For example, major compaction closes all files in a region for HBase.
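
To cross-reference the fsck output from step 3 with the block IDs collected in step 2, a minimal sketch (assuming blocks.open and openfiles.out are in the current working directory):

  # Print the lines of the fsck output that mention a block ID from blocks.open
  grep -F -f blocks.open openfiles.out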

A block cannot be relocated because there are not enough DataNodes to satisfy the block placement policy.
For example, in a 10-node cluster, if mapred.submit.replication is set to the default of 10 while you attempt to decommission one DataNode, it is difficult to relocate blocks that are associated with map/reduce jobs. This condition leads to errors in the NameNode logs similar to the following:
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: Not able to place enough replicas, still in need of 3 to reach 3   
Use the following steps to find the number of files whose replication factor is equal to or greater than your current cluster size:
  1. Provide a listing of open files, their blocks, and the locations of those blocks by running the following command:
    hadoop fsck / -files -blocks -locations -openforwrite 2>&1 > openfiles.out
  2. Run the following command to return a list of how many files have a given replication factor:
    grep repl= openfiles.out | awk '{print $NF}' | sort | uniq -c
    For example, to list the files with a replication factor of 10 when decommissioning one DataNode:
    egrep -B4 "repl=10" openfiles.out | grep -v '<dir>' | awk '/^\//{print $1}'
  3. Examine the paths, and decide whether to reduce the replication factor of the files, or remove them from the cluster.
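
If you choose to reduce the replication factor, a minimal sketch (the target factor and path are placeholders; pick a factor no larger than the number of DataNodes that will remain after decommissioning):

  # Lower the replication factor of a file to 3 and wait for re-replication to finish
  hdfs dfs -setrep -w 3 /path/to/file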