Scaling Namespaces and Optimizing Data Storage

Configuring HDFS rack awareness

The NameNode in an HDFS cluster maintains the rack ID of every DataNode. It uses this information about how DataNodes are distributed across the racks in the cluster to select closer DataNodes for effective block placement during read and write operations. Selecting DataNodes based on their location in the cluster is called rack awareness. Rack awareness also helps maintain fault tolerance: when the replicas of a block are spread across more than one rack, the data remains available if a single rack fails.

Configuring rack awareness on an HDP cluster involves four steps: creating a rack topology script, adding the script to core-site.xml, restarting HDFS, and verifying that the rack assignments have taken effect.
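As an illustration of the first step, the following is a minimal sketch of a rack topology script, not a definitive implementation. Hadoop calls the script with one or more DataNode IP addresses or host names as arguments and expects one rack ID per argument on standard output, in the same order. The host-to-rack entries, the RACK_MAP and DEFAULT_RACK names, and the resolve_racks helper are hypothetical and would need to be replaced with values for your own cluster.

```python
#!/usr/bin/env python3
"""Sketch of an HDFS rack topology script (illustrative values only)."""
import sys

# Hypothetical mapping of DataNode addresses to rack IDs.
# Replace these entries with the hosts and racks of the actual cluster.
RACK_MAP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "datanode3.example.com": "/rack2",
}

# Rack ID returned for any host that is not listed in RACK_MAP.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack ID per host, preserving the order of the input."""
    return [RACK_MAP.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack ID per line, in the order the hosts were passed in.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

The script must be executable by the user that runs the NameNode. Its path is set as the value of the net.topology.script.file.name property in core-site.xml. After HDFS restarts, running hdfs dfsadmin -printTopology lists the DataNodes grouped by rack, which is one way to check the result.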