Handling datanode disk failure

If a disk fails on a datanode, you must place the node in offline mode, stop the node, replace the disk, start the node, and then recommission the node to take it out of offline mode. Perform the following steps:

  1. Log in to the Cloudera Manager UI.
  2. Navigate to Clusters.
  3. Select the Ozone service.
  4. Place the datanode in offline mode. See Placing Ozone DataNodes in offline mode.
  5. Stop the node.
  6. Replace the failed disk or disks. If the new disk is mounted at a different location than the old disk, update the configuration to match:
    1. Go to Configurations.
    2. To update the path to a Ratis storage disk, update the corresponding entry in dfs.container.ratis.datanode.storage.dir to point to the new disk’s mount point.
    3. To update the path to a data storage disk, update the corresponding entry in hdds.datanode.dir to point to the new disk’s mount point.
  7. Start the node.
  8. Recommission the datanode to remove it from offline mode. See Recommissioning an Ozone DataNode.
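
For step 6, the edit to dfs.container.ratis.datanode.storage.dir or hdds.datanode.dir amounts to swapping one entry in the property value. The following Python snippet is a minimal sketch of that substitution, assuming the value is a comma-separated list of directories. The function name replace_storage_dir and the /data/N/ozone paths are hypothetical and used only for illustration. The snippet does not talk to Cloudera Manager; it only computes the string that you would then enter on the Configurations page.

```python
def replace_storage_dir(current_value: str, old_mount: str, new_mount: str) -> str:
    """Return the directory list with old_mount replaced by new_mount.

    Assumes current_value is a comma-separated list of directories.
    The order of the remaining entries is preserved.
    """
    # Split on commas and drop surrounding whitespace and empty entries.
    entries = [e.strip() for e in current_value.split(",") if e.strip()]
    if old_mount not in entries:
        # Refuse to guess: the caller named a path that is not configured.
        raise ValueError(f"{old_mount!r} is not in the configured list: {entries}")
    return ",".join(new_mount if e == old_mount else e for e in entries)


if __name__ == "__main__":
    # Hypothetical example: the disk at /data/2 failed and its
    # replacement is mounted at /data/7.
    current = "/data/1/ozone,/data/2/ozone,/data/3/ozone"
    print(replace_storage_dir(current, "/data/2/ozone", "/data/7/ozone"))
```

If the replacement disk is mounted at the same location as the failed disk, no configuration change is needed and this substitution can be skipped.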