Scaling Namespaces and Optimizing Data Storage

Scaling a cluster using HDFS federation

HDFS federation scales a cluster horizontally by supporting multiple independent NameNodes and namespaces, with the DataNodes serving as common block storage for all the NameNodes. Supporting multiple namespaces improves cluster scalability and provides isolation in a multi-tenant environment.

Earlier HDFS configurations without federation support are constrained to a single namespace and, consequently, a single NameNode for the entire cluster. In this non-federated environment, the NameNode stores the entire file system metadata in memory. This limits the number of blocks, files, and directories on the file system to what fits in the memory of a single NameNode. In addition, file system operations are limited to the throughput of a single NameNode. HDFS federation addresses both the scalability issue and the performance issue.
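The memory limit described above can be illustrated with a rough back-of-the-envelope estimate. The sketch below is hypothetical: the function names are invented for illustration, and the default of about 150 bytes of heap per namespace object (file, directory, or block) is a commonly quoted rule of thumb, not a figure from this document. Real usage depends on the HDFS version, path lengths, and replication.

```python
# Rough capacity estimate for NameNode metadata held in memory.
# ASSUMPTION: ~150 bytes of heap per namespace object is a rule of thumb,
# not a measured value; adjust it for your own cluster.

def max_namespace_objects(heap_bytes: int, bytes_per_object: int = 150) -> int:
    """Upper bound on files + directories + blocks one NameNode can track."""
    if heap_bytes < 0 or bytes_per_object <= 0:
        raise ValueError("heap_bytes must be >= 0 and bytes_per_object > 0")
    return heap_bytes // bytes_per_object


def federated_capacity(heap_bytes: int, namenode_count: int,
                       bytes_per_object: int = 150) -> int:
    """Upper bound across independent NameNodes with equal heap sizes.

    Each NameNode holds only its own namespace, so the bounds add up.
    """
    if namenode_count < 1:
        raise ValueError("namenode_count must be at least 1")
    return namenode_count * max_namespace_objects(heap_bytes, bytes_per_object)
```

Under these assumptions, adding a NameNode raises the total number of objects the cluster can track, while a non-federated cluster stays capped by one heap.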

To scale the name service horizontally, a federation uses multiple independent NameNodes and namespaces. The NameNodes are federated; that is, they are independent and do not require coordination with one another. All the NameNodes use a shared pool of DataNodes as common storage for blocks. Each DataNode registers with all the NameNodes in the cluster. DataNodes send periodic heartbeats and block reports. They also handle commands from the NameNodes.
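The relationship between NameNodes and DataNodes in a federation can be sketched with a small in-memory model. This is a toy illustration, not HDFS code: the class names, method names, and identifiers such as `ns1` and `dn1` are hypothetical, and block reports and NameNode commands are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """One independent NameNode that owns its own namespace (toy model)."""
    namespace_id: str
    registered: set = field(default_factory=set)         # DataNode ids seen
    last_heartbeat: dict = field(default_factory=dict)   # DataNode id -> tick

    def register(self, datanode_id: str) -> None:
        self.registered.add(datanode_id)

    def heartbeat(self, datanode_id: str, tick: int) -> None:
        if datanode_id not in self.registered:
            raise ValueError(f"{datanode_id} is not registered")
        self.last_heartbeat[datanode_id] = tick


@dataclass
class DataNode:
    """A DataNode that serves as block storage for every NameNode."""
    datanode_id: str
    namenodes: list = field(default_factory=list)

    def register_with_all(self, namenodes: list) -> None:
        # Each DataNode registers with all the NameNodes in the cluster.
        for nn in namenodes:
            nn.register(self.datanode_id)
        self.namenodes = list(namenodes)

    def send_heartbeats(self, tick: int) -> None:
        # Heartbeats go to every NameNode; the NameNodes never talk to
        # each other, which is what makes them federated.
        for nn in self.namenodes:
            nn.heartbeat(self.datanode_id, tick)


def build_cluster(namespace_ids: list, datanode_ids: list) -> tuple:
    """Create NameNodes and DataNodes and wire up the registrations."""
    namenodes = [NameNode(ns) for ns in namespace_ids]
    datanodes = [DataNode(dn) for dn in datanode_ids]
    for dn in datanodes:
        dn.register_with_all(namenodes)
    return namenodes, datanodes
```

For example, `build_cluster(["ns1", "ns2"], ["dn1", "dn2", "dn3"])` yields two NameNodes that each know about all three DataNodes, while neither NameNode holds any reference to the other.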
Note
HDFS federation is not supported with Hive.