Disk Balancer commands

In addition to planning for data movement across disks and executing the plan, you can use hdfs diskbalancer sub-commands to query the status of the plan, cancel the plan, identify at a cluster level the DataNodes that require balancing, or generate a detailed report on a specific DataNode that can benefit from running the Disk Balancer.

Planning the data movement for a DataNode

Command:

hdfs diskbalancer -plan
                <datanode>

Argument	Description
`<datanode>`	Fully qualified name of the DataNode for which you want to generate the plan.

hdfs diskbalancer -plan node1.mycluster.com

The following table lists the additional options that you can use with the hdfs diskbalancer -plan command:

Option	Description
-out	Specify the location within the HDFS namespace where you want to save the output JSON documents that contain the generated plans.
-bandwidth	Specify the maximum bandwidth to use for running the Disk Balancer. This option helps in minimizing the amount of data moved by the Disk Balancer on an operational DataNode. Disk Balancer uses the default bandwidth of 10 MB/s if you do not specify this value.
-thresholdPercentage	The ideal storage value for a set of disks in a DataNode indicates the amount of data each disk should have for achieving perfect data distribution across those disks. The threshold percentage defines the value at which disks start participating in data redistribution or balancing operations. Minor imbalances are ignored because normal operations automatically correct some of these imbalances. The default value of `-thresholdPercentage` for a disk is 10%; indicating that a disk is used in balancing operations only if the disk contains 10% more or less data than the ideal storage value.
-maxerror	Specify the number of errors to ignore for a move operation between two disks before abandoning the move. Disk Balancer uses the default if you do not specify this value.
-v	Verbose mode. Specify this option for Disk Balancer to display a summary of the plan as output.
-fs	Specify the NameNode to use. Disk Balancer uses the default NameNode from the configuration if you do not specify this value.

Executing the plan

Command:

hdfs diskbalancer -execute <JSON file
                path>

Argument	Description
`<JSON file path>`	Path to the JSON document that contains the generated plan (`nodename.plan.json`).

hdfs diskbalancer -execute
            /system/diskbalancer/nodename.plan.json

Querying the current status of execution

Command:

hdfs diskbalancer -query
                <datanode>

Argument	Description
`<datanode>`	Fully qualified name of the DataNode for which the plan is running.

hdfs diskbalancer -query nodename.mycluster.com

Cancelling a running plan

Commands:

hdfs diskbalancer -cancel <JSON file path>

Argument	Description
`<JSON file path>`	Path to the JSON document that contains the generated plan (`nodename.plan.json`).

hdfs diskbalancer -cancel
            /system/diskbalancer/nodename.plan.json

hdfs diskbalancer -cancel <planID> -node
                <nodename>

Argument	Description
`planID`	ID of the plan to cancel. You can get this value from the output of the hdfs diskbalancer -query command.
`nodename`	The fully qualified name of the DataNode on which the plan is running.

Viewing detailed report of DataNodes that require Disk Balancer

Commands:

hdfs diskbalancer -fs http://namenode.uri -report
                -node <file://>

Argument	Description
`<file://>`	Hosts file listing the DataNodes for which you want to generate the reports.

hdfs diskbalancer -fs http://namenode.uri -report
                -node [<DataNodeID|IP|Hostname>,...]

Argument	Description
`[<DataNodeID\|IP\|Hostname>,...]`	Specify the DataNode ID, IP address, and the host name of the DataNode for which you want to generate the report. For multiple DataNodes, provide the details using comma-separated values.

Viewing details of the top DataNodes in a cluster that require Disk Balancer

Command:

hdfs diskbalancer -fs
                http://namenode.uri -report-node -top <topnum>

Argument	Description
`<topnum>`	The number of the top DataNodes that require Disk Balancer to be run.