Performing Disk Hot Swap for DataNodes

This section describes how to replace HDFS disks without shutting down a DataNode. This is referred to as hot swap.

Performing Disk Hot Swap for DataNodes Using Cloudera Manager

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

  1. Configure data directories to remove the disk you are swapping out:
    1. Go to the HDFS service.
    2. Click the Instances tab.
    3. In the Role Type column, click the affected DataNode.
    4. Click the Configuration tab.
    5. Select Scope > DataNode.
    6. Select Category > Main.
    7. Change the value of the DataNode Data Directory property to remove the directories that are mount points for the disk you are removing.
  2. Click Save Changes to commit the changes.
  3. Refresh the affected DataNode. Select Actions > Refresh Data Directories.
  4. Remove the old disk and add the replacement disk.
  5. Change the value of the DataNode Data Directory property to add back the directories that are mount points for the disk you added.
  6. Click Save Changes to commit the changes.
  7. Refresh the affected DataNode. Select Actions > Refresh Data Directories.
  8. Run the HDFS fsck utility to validate the health of HDFS.
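The final health check can be run from any host with an HDFS client configuration. A minimal sketch, assuming the hdfs superuser and a cluster without Kerberos:

```shell
# Check overall HDFS health after the swap; "The filesystem under path '/'
# is HEALTHY" in the output indicates no missing or corrupt blocks.
$ sudo -u hdfs hdfs fsck /

# Optionally inspect a specific directory in more detail, listing files,
# their blocks, and block locations while re-replication catches up.
$ sudo -u hdfs hdfs fsck /user -files -blocks -locations
```

Under-replicated blocks reported immediately after the swap are usually transient; re-run fsck after the NameNode has had time to schedule re-replication.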

Performing Disk Hot Swap for DataNodes Using the Command Line

Use these instructions to perform a hot swap of disks in a cluster that is not managed by Cloudera Manager.

To add and remove disks:

  1. If you are adding disks, format and mount them.
  2. Change the value of dfs.datanode.data.dir in hdfs-site.xml on the DataNode to reflect the directories that will be used from now on (add new points and remove obsolete ones). For more information, see the instructions for DataNodes under Configuring Local Storage Directories.
  3. Start the reconfiguration process:
    • If Kerberos is enabled:
      $ kinit -kt /path/to/hdfs.keytab hdfs/<fully.qualified.domain.name@YOUR-REALM.COM> && hdfs dfsadmin -reconfig datanode HOST:PORT start
    • If Kerberos is not enabled:
      $ sudo -u hdfs hdfs dfsadmin -reconfig datanode HOST:PORT start
    where HOST:PORT is the DataNode's dfs.datanode.ipc.address; that is, the DataNode's hostname and the port specified in dfs.datanode.ipc.address (for example, dnhost1.example.com:5678).
    To check on the progress of the reconfiguration, you can use the status option of the command; for example, if Kerberos is not enabled:
    $ sudo -u hdfs hdfs dfsadmin -reconfig datanode HOST:PORT status
  4. Once the reconfiguration is complete, unmount any disks you have removed from the configuration.
  5. Run the HDFS fsck utility to validate the health of HDFS.
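Steps 1 through 3 above might look like the following when adding a disk. The device name /dev/sdd1, mount point /data/4, group hadoop, and DataNode address dnhost1.example.com:5678 are all illustrative, and Kerberos is assumed to be disabled:

```shell
# 1. Format and mount the new disk (device and mount point are examples),
#    and give the HDFS user ownership of its data directory.
$ sudo mkfs -t ext4 /dev/sdd1
$ sudo mkdir -p /data/4/dfs/dn
$ sudo mount /dev/sdd1 /data/4
$ sudo chown -R hdfs:hadoop /data/4/dfs

# 2. After adding /data/4/dfs/dn to dfs.datanode.data.dir in hdfs-site.xml,
#    start the reconfiguration on the DataNode.
$ sudo -u hdfs hdfs dfsadmin -reconfig datanode dnhost1.example.com:5678 start

# 3. Poll until the status output reports that reconfiguration has finished.
$ sudo -u hdfs hdfs dfsadmin -reconfig datanode dnhost1.example.com:5678 status
```

The reconfiguration runs asynchronously on the DataNode, which is why the start and status calls are separate commands.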

To perform maintenance on a disk:

  1. Change the value of dfs.datanode.data.dir in hdfs-site.xml on the DataNode to exclude the mount point directories that reside on the affected disk and reflect only the directories that will be used during the maintenance window. For more information, see the instructions for DataNodes under Configuring Local Storage Directories.
  2. Start the reconfiguration process:
    • If Kerberos is enabled:
      $ kinit -kt /path/to/hdfs.keytab hdfs/<fully.qualified.domain.name@YOUR-REALM.COM> && hdfs dfsadmin -reconfig datanode HOST:PORT start
    • If Kerberos is not enabled:
      $ sudo -u hdfs hdfs dfsadmin -reconfig datanode HOST:PORT start
    where HOST:PORT is the DataNode's dfs.datanode.ipc.address; that is, its hostname and the port specified in dfs.datanode.ipc.address.
    To check on the progress of the reconfiguration, you can use the status option of the command; for example, if Kerberos is not enabled:
    $ sudo -u hdfs hdfs dfsadmin -reconfig datanode HOST:PORT status
  3. Once the reconfiguration is complete, unmount the disk.
  4. Perform maintenance on the disk.
  5. Remount the disk.
  6. Change the value of dfs.datanode.data.dir again to reflect the original set of mount points.
  7. Repeat step 2.
  8. Run the HDFS fsck utility to validate the health of HDFS.
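The property edits in steps 1 and 6 are changes to dfs.datanode.data.dir in hdfs-site.xml on the DataNode. A sketch of the step 1 edit, assuming illustrative mount points /data/1 through /data/4 with the disk under maintenance mounted at /data/3:

```xml
<!-- hdfs-site.xml on the DataNode: the mount point on the disk under
     maintenance (/data/3/dfs/dn in this example) is omitted from the list
     for the duration of the maintenance window, then restored in step 6.
     All paths here are illustrative. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/4/dfs/dn</value>
</property>
```

The value is a comma-separated list, so removing a disk means deleting only its entry while leaving the remaining directories untouched.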