Scaling Namespaces and Optimizing Data Storage

Plan the data movement across disks

A Disk Balancer plan identifies the amount of data that should move between the disks of a specified DataNode. The plan contains move steps, where each move step specifies the source and destination disks for data movement and the number of bytes to move.
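The structure described above can be pictured with a minimal sketch. The class names, field names, and disk paths below are hypothetical and chosen only for illustration; they are not the schema of the JSON that Disk Balancer writes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MoveStep:
    """One move step: a source disk, a destination disk, and a byte count."""
    source_disk: str        # hypothetical field name
    destination_disk: str   # hypothetical field name
    bytes_to_move: int      # hypothetical field name


@dataclass(frozen=True)
class DiskBalancerPlan:
    """A plan for a single DataNode, made up of move steps."""
    datanode: str
    steps: List[MoveStep]

    def total_bytes(self) -> int:
        # Amount of data the plan would move across all of its steps.
        return sum(step.bytes_to_move for step in self.steps)


# Illustrative values only: the mount points are assumptions.
GIB = 1024 ** 3
plan = DiskBalancerPlan(
    datanode="node1.mycluster.com",
    steps=[
        MoveStep("/data/disk1", "/data/disk2", 10 * GIB),
        MoveStep("/data/disk1", "/data/disk3", 5 * GIB),
    ],
)

for step in plan.steps:
    print(f"{step.source_disk} -> {step.destination_disk}: {step.bytes_to_move} bytes")
print("total bytes to move:", plan.total_bytes())
```

Summing the byte counts of the move steps gives the total amount of data the plan would move on that DataNode.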

You must set the dfs.disk.balancer.enabled property in hdfs-site.xml to true.
Run the hdfs diskbalancer -plan command, specifying the hostname of the DataNode.
For example, hdfs diskbalancer -plan node1.mycluster.com.
The command writes two JSON documents to the HDFS namespace: <nodename>.before.json, which records the state of the cluster before Disk Balancer runs, and <nodename>.plan.json, which details the data movement plan for the DataNode.

By default, the JSON documents are placed in the following location: /system/diskbalancer/<Creation_Timestamp>. Here, <Creation_Timestamp> is a folder named for the date and time at which the JSON documents were created.
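The planning step and its default output locations can be sketched as follows. The helper names plan_command and plan_output_paths are hypothetical, and the timestamp folder is passed in as a literal placeholder because its exact naming format is not stated here. The sketch only builds the command and the expected paths; it does not contact a cluster.

```python
import posixpath
import shlex
from typing import Dict, List

# Default output root, as stated in the text above.
DEFAULT_OUTPUT_ROOT = "/system/diskbalancer"


def plan_command(datanode: str) -> List[str]:
    """Argument list for the hdfs diskbalancer -plan command (hypothetical helper)."""
    return ["hdfs", "diskbalancer", "-plan", datanode]


def plan_output_paths(datanode: str, timestamp_dir: str,
                      root: str = DEFAULT_OUTPUT_ROOT) -> Dict[str, str]:
    """Expected HDFS paths of the two JSON documents (hypothetical helper).

    timestamp_dir is the name of the <Creation_Timestamp> folder; the caller
    supplies it because its format is not assumed here.
    """
    folder = posixpath.join(root, timestamp_dir)
    return {
        "before": posixpath.join(folder, f"{datanode}.before.json"),
        "plan": posixpath.join(folder, f"{datanode}.plan.json"),
    }


node = "node1.mycluster.com"
print(shlex.join(plan_command(node)))
for kind, path in plan_output_paths(node, "<Creation_Timestamp>").items():
    print(f"{kind}: {path}")
```

On a host where the HDFS client is installed, the list returned by plan_command could be passed to subprocess.run to generate the plan, and the resulting folder could then be inspected with hdfs dfs -ls /system/diskbalancer.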

Execute the generated plan.