Decommissioning and Recommissioning Hosts

Decommissioning a host decommissions and stops all roles on the host without requiring you to go to each service and decommission the roles individually. Decommissioning applies only to HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. If the host has other roles running on it, those roles are stopped.

Once all roles on the host have been decommissioned and stopped, the host can be removed from service. You can decommission multiple hosts in parallel.

Decommissioning Hosts

Minimum Required Role: Limited Operator (also provided by Operator, Configurator, Cluster Administrator, or Full Administrator)

You cannot decommission a DataNode or a host with a DataNode if the number of DataNodes equals the replication factor (which by default is three) of any file stored in HDFS. For example, if the replication factor of any file is three, and you have three DataNodes, you cannot decommission a DataNode or a host with a DataNode.
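 
Before you start, you can verify that the cluster satisfies this constraint by comparing the live DataNode count against the replication factors in use. A minimal sketch using standard HDFS client commands, assuming you run them from a host with HDFS client configuration deployed:

    # List live DataNodes (the report includes one entry per DataNode)
    hdfs dfsadmin -report

    # Show the default replication factor for new files (typically 3)
    hdfs getconf -confKey dfs.replication

    # Inspect per-file replication (look for "repl=N" in the block listing)
    hdfs fsck / -files -blocks | head -n 40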

To decommission hosts:
  1. If the host has a DataNode, perform the steps in Tuning HDFS Prior to Decommissioning DataNodes.
  2. Click the Hosts tab.
  3. Check the checkboxes next to one or more hosts.
  4. Select Actions for Selected > Decommission.

A confirmation pop-up informs you of the roles that will be decommissioned or stopped on the hosts you have selected. To proceed with the decommissioning, click Confirm.

A Command Details window appears, showing each stop or decommission command as it runs, service by service. You can click one of the decommission links to see the subcommands that are run to decommission a given role. Depending on the role, the steps may include adding the host to an "exclusions list" and refreshing the NameNode, JobTracker, or ResourceManager; stopping the Balancer (if it is running); and moving data blocks or regions. Roles that do not have specific decommission actions are stopped.
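 
For context, the HDFS portion of this is the same mechanism you would drive by hand on an unmanaged cluster. A rough sketch, assuming hdfs-site.xml points dfs.hosts.exclude at an excludes file (the path below is hypothetical; Cloudera Manager maintains this file for you):

    # On the NameNode host: add the DataNode's hostname to the excludes file
    echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read its include/exclude lists; it then begins
    # re-replicating the excluded DataNode's blocks to other nodes
    hdfs dfsadmin -refreshNodes

    # The node reports "Decommission in progress" until its blocks are drained
    hdfs dfsadmin -report

    # The analogous refresh for YARN NodeManager exclusion
    yarn rmadmin -refreshNodes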

While decommissioning is in progress, the host displays the decommissioning icon. Once all roles have been decommissioned or stopped, the host displays the decommissioned icon. If at least one host in a cluster has been decommissioned, the DECOMMISSIONED facet displays in the Filters on the Hosts page, and you can filter the hosts according to their decommission status.
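 
If you prefer to query this state programmatically, the Cloudera Manager API exposes hosts as JSON. A hypothetical sketch (the hostname, credentials, and API version are placeholders, and the exact field set varies by Cloudera Manager release; recent releases report a commissionState per host):

    # List all hosts and pick out those reported as decommissioned
    curl -s -u admin:admin 'http://cm-host.example.com:7180/api/v10/hosts' \
      | grep -B 6 '"commissionState" : "DECOMMISSIONED"'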

You cannot start roles on a decommissioned host.

Tuning HDFS Prior to Decommissioning DataNodes

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks off the DataNode in small batches. In cases where a DataNode has thousands of blocks, decommissioning can take several hours.
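 
The drain rate is bounded by the NameNode's replication scheduler. As a rough sketch (exact scheduler behavior varies by release), each replication iteration can schedule approximately:

    blocks scheduled per iteration ≈ live DataNodes × replication work multiplier

so with 20 live DataNodes and the work multiplier raised from its default of 2 to the recommended 10, the NameNode can queue roughly 20 × 10 = 200 block copies per iteration instead of 40. The tuning below raises this multiplier and the per-DataNode replication stream limits that cap how quickly each node works through that queue.

Before decommissioning hosts with DataNodes, you should first tune HDFS (the underlying hdfs-site.xml property names are sketched after the steps):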

  1. Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.
    1. Go to the HDFS service page.
    2. Click the Configuration tab.
    3. Under each DataNode role group (DataNode Default Group and any additional DataNode role groups), go to the Resource Management category and set the Java Heap Size of DataNode in Bytes property to at least 4 GB.
    4. Click Save Changes to commit the changes.
  2. Set the DataNode balancing bandwidth:
    1. Expand the DataNode Default Group > Performance category.
    2. Configure the DataNode Balancing Bandwidth property to match the bandwidth available on your disks and network.
    3. Click Save Changes to commit the changes.
  3. Increase the replication work multiplier per iteration to a larger number (the default is 2; a value of 10 is recommended):
    1. Expand the NameNode Default Group > Advanced category.
    2. Configure the Replication Work Multiplier Per Iteration property to a value such as 10.
    3. Click Save Changes to commit the changes.
  4. Increase the replication maximum threads and maximum replication thread hard limits:
    1. Expand the NameNode Default Group > Advanced category.
    2. Set the Maximum number of replication threads on a Datanode property to 50 and the Hard limit on the number of replication threads on a Datanode property to 100.
    3. Click Save Changes to commit the changes.
  5. Restart the HDFS service.
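 
For reference, these Cloudera Manager properties correspond roughly to the following hdfs-site.xml settings. This is a sketch only; on a managed cluster, let Cloudera Manager generate the configuration rather than editing files by hand, and note that property names can vary across CDH releases:

    <!-- DataNode: bandwidth available for balancing/decommission transfers -->
    <property>
      <name>dfs.datanode.balance.bandwidthPerSec</name>
      <value>10485760</value>  <!-- example: 10 MB/s; match your hardware -->
    </property>

    <!-- NameNode: blocks scheduled per live DataNode per replication iteration -->
    <property>
      <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
      <value>10</value>
    </property>

    <!-- NameNode: soft and hard caps on replication streams per DataNode -->
    <property>
      <name>dfs.namenode.replication.max-streams</name>
      <value>50</value>
    </property>
    <property>
      <name>dfs.namenode.replication.max-streams-hard-limit</name>
      <value>100</value>
    </property>

The DataNode heap from step 1 has no hdfs-site.xml equivalent; it maps to the DataNode JVM's -Xmx option (for example, via HADOOP_DATANODE_OPTS in hadoop-env.sh on unmanaged clusters).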

Recommissioning Hosts

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

Only hosts that are decommissioned using Cloudera Manager can be recommissioned.

  1. Click the Hosts tab.
  2. Select one or more hosts to recommission.
  3. Select Actions for Selected > Recommission.

The decommissioned icon is removed from the host and from the roles that reside on the host. However, the roles themselves are not restarted.
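 
On an unmanaged cluster, the equivalent HDFS step is the inverse of decommissioning. A minimal sketch (the excludes-file path is hypothetical, as above):

    # Remove the host from the excludes file referenced by dfs.hosts.exclude,
    # then have the NameNode re-read its include/exclude lists
    sed -i '/dn1.example.com/d' /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes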

Restarting All the Roles on a Recommissioned Host

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

  1. Click the Hosts tab.
  2. Select one or more hosts on which to start recommissioned roles.
  3. Select Actions for Selected > Start All Roles.