Remove a DataNode

Before removing a DataNode, ensure that all the prerequisites for removal are satisfied.

  • The number of DataNodes in your cluster must be greater than or equal to the replication factor you have configured for HDFS. (This value is typically 3.)

    In order to satisfy this requirement, add the DataNode role on other hosts as required and start the role instances before removing any DataNodes.

  • Ensure the DataNode that is to be removed is running.

  1. Decommission the DataNode role.
    When asked to select the role instance to decommission, select the DataNode role instance.
    The decommissioning process moves the data blocks to the other available DataNodes.
  2. Stop the DataNode role.
    When asked to select the role instance to stop, select the DataNode role instance.
  3. Verify the integrity of the HDFS service.
    1. Run the following command to identify any problems in the HDFS file system:
      hdfs fsck /
    2. Fix any errors reported by the fsck command.
  4. After all errors are resolved, perform the following steps.
    1. Remove the DataNode role.
    2. Manually remove the DataNode data directories.
      You can determine the location of these directories by examining the DataNode Data Directory property in the HDFS configuration. In Cloudera Manager, go to the HDFS service, select the Configuration tab and search for the property.