Disk Balancer commands
In addition to planning for data movement across disks and executing the plan, you can use hdfs diskbalancer sub-commands to query the status of the plan, cancel the plan, identify at a cluster level the DataNodes that require balancing, or generate a detailed report on a specific DataNode that can benefit from running the Disk Balancer.
Planning the data movement for a DataNode
Command: hdfs diskbalancer -plan <datanode>
Argument | Description |
---|---|
<datanode> | Fully qualified name of the DataNode for which you want to generate the plan. |
hdfs diskbalancer -plan node1.mycluster.com
You can use the following options with the hdfs diskbalancer -plan command:
Option | Description |
---|---|
-out | Specify the location within the HDFS namespace where you want to save the output JSON documents that contain the generated plans. |
-bandwidth | Specify the maximum bandwidth to use for running the Disk Balancer. Because the DataNode remains operational while data is moved, this option limits the rate at which Disk Balancer moves data. Disk Balancer uses the default bandwidth of 10 MB/s if you do not specify this value. |
-thresholdPercentage | The ideal storage value for a set of disks in a DataNode indicates the amount of data each disk should have for achieving perfect data distribution across those disks. The threshold percentage defines the value at which disks start participating in data redistribution or balancing operations. Minor imbalances are ignored because normal operations automatically correct some of these imbalances. The default value of |
-maxerror | Specify the number of errors to ignore for a move operation between two disks before abandoning the move. Disk Balancer uses the default if you do not specify this value. |
-v | Verbose mode. Specify this option for Disk Balancer to display a summary of the plan as output. |
-fs | Specify the NameNode to use. Disk Balancer uses the default NameNode from the configuration if you do not specify this value. |
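The options above can be combined in a single invocation. The following is a minimal sketch rather than an official tool: the build_plan_cmd helper name and every value passed to it (the output directory, a bandwidth of 20, a threshold of 10) are illustrative assumptions. The bandwidth value is assumed to be in MB/s, matching the unit of the stated default. The helper only prints the command so that you can review it before running it.

```shell
# Build (but do not run) an "hdfs diskbalancer -plan" command line.
# build_plan_cmd is a hypothetical helper; all values below are placeholders.
build_plan_cmd() {
  local datanode="$1"
  local out_dir="$2"
  local bandwidth="$3"
  local threshold="$4"
  printf 'hdfs diskbalancer -plan %s -out %s -bandwidth %s -thresholdPercentage %s -v\n' \
    "$datanode" "$out_dir" "$bandwidth" "$threshold"
}

# Print the command for review instead of executing it.
build_plan_cmd node1.mycluster.com /system/diskbalancer 20 10
```

Printing the command first keeps the sketch safe to try on a machine without access to the cluster.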
Executing the plan
Command: hdfs diskbalancer -execute <JSON file path>
Argument | Description |
---|---|
<JSON file path> | Path to the JSON document that contains the generated plan (nodename.plan.json). |
hdfs diskbalancer -execute /system/diskbalancer/nodename.plan.json
Querying the current status of execution
Command: hdfs diskbalancer -query <datanode>
Argument | Description |
---|---|
<datanode> | Fully qualified name of the DataNode for which the plan is running. |
hdfs diskbalancer -query nodename.mycluster.com
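Executing a plan and then querying its progress are usually done back to back. As a hedged sketch, the hypothetical plan_to_cmds helper below prints both commands for one plan file. It assumes that the plan file is named <datanode>.plan.json, following the nodename.plan.json pattern used in the examples; if your plan files are named differently, pass the DataNode name explicitly instead.

```shell
# Print the execute and query commands for one plan file (dry run only).
# Assumption: the plan file is named <datanode>.plan.json.
plan_to_cmds() {
  local plan_path="$1"
  local datanode
  # Strip the directory and the ".plan.json" suffix to recover the node name.
  datanode="$(basename "$plan_path" .plan.json)"
  printf 'hdfs diskbalancer -execute %s\n' "$plan_path"
  printf 'hdfs diskbalancer -query %s\n' "$datanode"
}

plan_to_cmds /system/diskbalancer/node1.mycluster.com.plan.json
```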
Cancelling a running plan
Commands: hdfs diskbalancer -cancel <JSON file path>
Argument | Description |
---|---|
<JSON file path> | Path to the JSON document that contains the generated plan (nodename.plan.json). |
hdfs diskbalancer -cancel /system/diskbalancer/nodename.plan.json
OR
hdfs diskbalancer -cancel <planID> -node <nodename>
Argument | Description |
---|---|
planID | ID of the plan to cancel. You can get this value from the output of the hdfs diskbalancer -query command. |
nodename | The fully qualified name of the DataNode on which the plan is running. |
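The two forms of the cancel command differ only in their arguments. The sketch below, built around a hypothetical build_cancel_cmd helper, picks the form from the number of arguments and prints the resulting command without running it. The plan ID shown (example-plan-id) is a made-up placeholder, not real output from the query command.

```shell
# Print an "hdfs diskbalancer -cancel" command in either documented form.
#   one argument  -> cancel by plan file path
#   two arguments -> cancel by plan ID and DataNode name
# build_cancel_cmd is a hypothetical helper name.
build_cancel_cmd() {
  if [ "$#" -eq 1 ]; then
    printf 'hdfs diskbalancer -cancel %s\n' "$1"
  elif [ "$#" -eq 2 ]; then
    printf 'hdfs diskbalancer -cancel %s -node %s\n' "$1" "$2"
  else
    echo "usage: build_cancel_cmd <plan file> | <planID> <nodename>" >&2
    return 1
  fi
}

build_cancel_cmd /system/diskbalancer/nodename.plan.json
build_cancel_cmd example-plan-id nodename.mycluster.com
```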
Viewing detailed report of DataNodes that require Disk Balancer
Commands: hdfs diskbalancer -fs http://namenode.uri -report -node <file://>
Argument | Description |
---|---|
<file://> | Hosts file listing the DataNodes for which you want to generate the reports. |
OR
hdfs diskbalancer -fs http://namenode.uri -report -node [<DataNodeID|IP|Hostname>,...]
Argument | Description |
---|---|
[<DataNodeID|IP|Hostname>,...] | Specify the DataNode ID, IP address, or host name of the DataNode for which you want to generate the report. To report on multiple DataNodes, provide a comma-separated list of these values. |
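When reporting on several DataNodes, the identifiers must be passed as one comma-separated argument. As a rough sketch, the hypothetical build_report_cmd helper below joins its arguments with commas and prints the resulting command. The host names are placeholders, and http://namenode.uri stands in for your NameNode URI as in the syntax above.

```shell
# Join DataNode identifiers with commas and print the report command (dry run).
# build_report_cmd is a hypothetical helper; host names are placeholders.
build_report_cmd() {
  local namenode_uri="$1"
  shift
  # With IFS set to a comma, "$*" expands to the remaining arguments
  # joined by commas.
  local IFS=,
  printf 'hdfs diskbalancer -fs %s -report -node %s\n' "$namenode_uri" "$*"
}

build_report_cmd http://namenode.uri node1.mycluster.com node2.mycluster.com
```

Keeping IFS local to the function avoids changing word splitting for the rest of the script.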
Viewing details of the top DataNodes in a cluster that require Disk Balancer
Command: hdfs diskbalancer -fs http://namenode.uri -report -top <topnum>
Argument | Description |
---|---|
<topnum> | The number of DataNodes to list, starting with those that would benefit most from running Disk Balancer. |