Decommissioning and Recommissioning Hosts

Decommissioning a host decommissions and stops all roles on the host, so you do not have to decommission the roles on each service individually. Decommissioning applies only to HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. If the host has other roles running on it, those roles are stopped.

After all roles on the host have been decommissioned and stopped, the host can be removed from service. You can decommission multiple hosts in parallel.

Decommissioning Hosts

Minimum Required Role: Limited Operator (also provided by Operator, Configurator, Cluster Administrator, or Full Administrator)

You cannot decommission a DataNode, or a host with a DataNode, if the number of DataNodes equals the replication factor (three by default) of any file stored in HDFS. For example, if any file has a replication factor of three and you have three DataNodes, you cannot decommission any of them. If you attempt to do so, the decommission process starts on the DataNode but never completes. You must then abort the decommission and recommission the DataNode.
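The replication constraint above can be checked before you start. The following is a minimal sketch, assuming you already know the number of DataNodes and the highest replication factor of any file in HDFS; the function name and parameters are hypothetical and are not part of Cloudera Manager.

```python
def can_decommission_datanodes(total_datanodes: int,
                               max_replication_factor: int,
                               datanodes_to_remove: int = 1) -> bool:
    """Return True if enough DataNodes remain to hold every replica.

    Hypothetical helper: decommissioning can only complete if the
    DataNodes left in service are at least as many as the highest
    replication factor of any file.
    """
    if total_datanodes < 0 or datanodes_to_remove < 0:
        raise ValueError("DataNode counts must not be negative")
    if max_replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    remaining = total_datanodes - datanodes_to_remove
    return remaining >= max_replication_factor


# The example from the text: three DataNodes, replication factor three.
print(can_decommission_datanodes(3, 3))     # False
# Five DataNodes, replication factor three, removing two hosts.
print(can_decommission_datanodes(5, 3, 2))  # True
```

A check like this only covers the replica count. It does not account for free disk space on the remaining DataNodes.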

To decommission hosts:
  1. If the host has a DataNode, perform the steps in Tuning HDFS Prior to Decommissioning DataNodes.
  2. Click the Hosts tab.
  3. Select the checkboxes next to one or more hosts.
  4. Select Actions for Selected > Hosts Decommission.

    A confirmation pop-up informs you of the roles that will be decommissioned or stopped on the hosts you have selected.

  5. Click Confirm. A Decommission Command pop-up appears and shows each step or decommission command as it is run, service by service. In the Details area, click to see the subcommands that are run for decommissioning a given service. Depending on the service, the steps may include adding the host to an "exclusions list" and refreshing the NameNode, JobTracker, or NodeManager; stopping the Balancer (if it is running); and moving data blocks or regions. Roles that do not have specific decommission actions are stopped.

    You can abort the decommission process by clicking the Abort button, but you must recommission and restart each role that has been decommissioned.

    The Commission State facet in the Filters list displays Decommissioning while decommissioning is in progress, and Decommissioned when the decommissioning process has finished. When the process is complete, an icon is added in front of Decommission Command.

You cannot start roles on a decommissioned host.
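
The same action can also be scripted. The following is a minimal sketch, not a definitive implementation. It assumes that the Cloudera Manager REST API exposes a hostsDecommission command under /cm/commands that accepts a JSON list of host names, and it only builds the request without sending it. The function name, base URL, API version, host name, and credentials are placeholders; confirm the endpoint and payload against the API documentation for your Cloudera Manager release.

```python
import base64
import json
import urllib.request


def build_hosts_decommission_request(base_url, api_version, host_names,
                                     username, password):
    """Build (but do not send) a POST request that decommissions hosts.

    Assumption: the endpoint path and the {"items": [...]} payload shape
    match the Cloudera Manager API version in use.
    """
    url = f"{base_url.rstrip('/')}/api/{api_version}/cm/commands/hostsDecommission"
    body = json.dumps({"items": list(host_names)}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values for illustration only.
req = build_hosts_decommission_request(
    "http://cm.example.com:7180", "v10",
    ["worker-01.example.com"], "admin", "admin",
)
print(req.get_method(), req.full_url)
# To submit the request, pass it to urllib.request.urlopen(req).
```

Building the request separately from sending it makes it easy to review the exact host list before anything is decommissioned.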

Tuning HDFS Prior to Decommissioning DataNodes

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks from the DataNode in small batches. If a DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts with DataNodes, you should first tune HDFS:

  1. Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.
    1. Go to the HDFS service page.
    2. Click the Configuration tab.
    3. Select Scope > DataNode.
    4. Select Category > Resource Management.
    5. Set the Java Heap Size of DataNode in Bytes property as recommended.

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    6. Click Save Changes to commit the changes.
  2. Set the DataNode balancing bandwidth:
    1. Select Scope > DataNode.
    2. Expand the Category > Performance category.
    3. Configure the DataNode Balancing Bandwidth property to the bandwidth you have on your disks and network.
    4. Click Save Changes to commit the changes.
  3. Increase the replication work multiplier per iteration (the default is 2; 10 is recommended):
    1. Select Scope > NameNode.
    2. Expand the Category > Advanced category.
    3. Configure the Replication Work Multiplier Per Iteration property to a value such as 10.

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    4. Click Save Changes to commit the changes.
  4. Increase the replication maximum threads and maximum replication thread hard limits:
    1. Select Scope > NameNode.
    2. Expand the Category > Advanced category.
    3. Configure the Maximum number of replication threads on a DataNode and Hard limit on the number of replication threads on a DataNode properties to 50 and 100 respectively.

      If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    4. Click Save Changes to commit the changes.
  5. Restart the HDFS service.
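
The target values from the steps above can be collected in one place and compared with a cluster's current settings. The following is a minimal sketch under stated assumptions: the dfs.namenode.* keys are the hdfs-site.xml property names that are assumed to correspond to the Cloudera Manager properties named in steps 3 and 4, datanode_heap_bytes is a hypothetical key for the DataNode heap size, and the sample current values are illustrative. The balancing bandwidth is omitted because its target depends on your disks and network.

```python
GIB = 1024 ** 3

# Targets taken from the tuning steps. Key names are assumptions (see lead-in).
RECOMMENDED = {
    "datanode_heap_bytes": 4 * GIB,  # hypothetical key for step 1
    "dfs.namenode.replication.work.multiplier.per.iteration": 10,  # step 3
    "dfs.namenode.replication.max-streams": 50,  # step 4
    "dfs.namenode.replication.max-streams-hard-limit": 100,  # step 4
}


def settings_below_recommendation(current):
    """Return {name: (current, recommended)} for settings missing or too low."""
    shortfalls = {}
    for name, recommended in RECOMMENDED.items():
        value = current.get(name)
        if value is None or value < recommended:
            shortfalls[name] = (value, recommended)
    return shortfalls


# Illustrative sample configuration, not read from a real cluster.
sample = {
    "datanode_heap_bytes": 1 * GIB,
    "dfs.namenode.replication.work.multiplier.per.iteration": 2,
    "dfs.namenode.replication.max-streams": 50,
    "dfs.namenode.replication.max-streams-hard-limit": 100,
}
for name, (have, want) in settings_below_recommendation(sample).items():
    print(f"{name}: current={have}, recommended>={want}")
```

With the sample values, only the heap size and the work multiplier are reported, because the two replication thread limits already meet the targets.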

Tuning HBase Prior to Decommissioning DataNodes

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

To increase the speed of a rolling restart of the HBase service, set the Region Mover Threads property to a higher value. This increases the number of regions that can be moved in parallel, but places additional strain on the HMaster. In most cases, Region Mover Threads should be set to 5 or lower.

Recommissioning Hosts

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

Only hosts that are decommissioned using Cloudera Manager can be recommissioned.

  1. Click the Hosts tab.
  2. Select one or more hosts to recommission.
  3. Select Actions for Selected > Recommission, and then click Confirm. A Recommission Command pop-up appears and shows each step or recommission command as it is run. When the process is complete, an icon is added in front of Recommission Command. The host and roles are marked as commissioned, but the roles themselves are not restarted.

Restarting All the Roles on a Host

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

  1. Click the Hosts tab.
  2. Select one or more hosts on which to start all roles.
  3. Select Actions for Selected > Start Roles on Hosts.
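
The host lifecycle described in these sections can be summarized as a small state model. The following is an illustrative sketch only: the CommissionState and Host classes are hypothetical, they do not reflect how Cloudera Manager is implemented, and aborting a decommission is not modeled. The sketch encodes three rules from the text: a decommissioned host cannot start roles, recommissioning does not restart roles, and roles must be started explicitly afterward.

```python
from enum import Enum


class CommissionState(Enum):
    COMMISSIONED = "Commissioned"
    DECOMMISSIONING = "Decommissioning"
    DECOMMISSIONED = "Decommissioned"


class Host:
    """Hypothetical model of a host's commission state and roles."""

    def __init__(self, name):
        self.name = name
        self.state = CommissionState.COMMISSIONED
        self.roles_running = True

    def begin_decommission(self):
        if self.state is not CommissionState.COMMISSIONED:
            raise RuntimeError(f"{self.name} is already {self.state.value}")
        self.state = CommissionState.DECOMMISSIONING

    def finish_decommission(self):
        if self.state is not CommissionState.DECOMMISSIONING:
            raise RuntimeError(f"{self.name} is not being decommissioned")
        # All roles are decommissioned or stopped when the process ends.
        self.state = CommissionState.DECOMMISSIONED
        self.roles_running = False

    def recommission(self):
        if self.state is not CommissionState.DECOMMISSIONED:
            raise RuntimeError(f"{self.name} is not decommissioned")
        # Marked as commissioned, but the roles are not restarted.
        self.state = CommissionState.COMMISSIONED

    def start_roles(self):
        if self.state is not CommissionState.COMMISSIONED:
            raise RuntimeError(f"cannot start roles on a {self.state.value} host")
        self.roles_running = True


host = Host("worker-01.example.com")  # placeholder host name
host.begin_decommission()
host.finish_decommission()
try:
    host.start_roles()
except RuntimeError as err:
    print(err)
host.recommission()
print(host.state.value, host.roles_running)
host.start_roles()
print(host.state.value, host.roles_running)
```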