Because each master node plays a unique role in the cluster, the master nodes have significantly different storage and memory requirements from the slave nodes.
Storage options
We recommend using two NameNode servers: one primary and one secondary. Both NameNode servers should have highly reliable storage for the namespace and for edit-log journaling. Typically, hardware RAID and/or reliable network storage are justifiable options.
The master servers should have at least four redundant storage volumes, some local and some networked, but each can be relatively small (typically 1 TB).
Note: The RAID disks on the master nodes are where support contracts are needed. We recommend including an on-site disk replacement option in your support contract so that a failed RAID disk can be replaced faster.
Multiple vendors sell NAS software. Check each product's specifications before you invest in it.
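The volume guidance above (at least four redundant volumes, some local and some networked) can be written down as a simple check. The following is a minimal sketch, not part of any Hadoop tooling: the `Volume` class, the `check_master_volumes` function, and the example mount points are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Volume:
    """One storage volume on a master node (hypothetical description)."""
    path: str        # mount point; example values below are made up
    networked: bool  # True for network storage, False for a local disk
    size_tb: float   # capacity in terabytes


def check_master_volumes(volumes: List[Volume], minimum: int = 4) -> List[str]:
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    if len(volumes) < minimum:
        problems.append(f"need at least {minimum} volumes, found {len(volumes)}")
    if not any(v.networked for v in volumes):
        problems.append("no networked volume")
    if not any(not v.networked for v in volumes):
        problems.append("no local volume")
    return problems


if __name__ == "__main__":
    layout = [
        Volume("/data/nn1", networked=False, size_tb=1.0),
        Volume("/data/nn2", networked=False, size_tb=1.0),
        Volume("/mnt/nas1", networked=True, size_tb=1.0),
        Volume("/mnt/nas2", networked=True, size_tb=1.0),
    ]
    print(check_master_volumes(layout) or "layout OK")
```

The check only covers the count and the local/networked mix; it does not verify that the volumes are actually redundant or reliable.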
Storage options for JobTracker servers
JobTracker servers do not need RAID storage because they save their persistent state to HDFS; the JobTracker server can actually be run on a slave node with a bit of extra RAM. However, using the same hardware specification as the NameNode server gives you a plan for migrating the NameNode to the JobTracker’s server if the NameNode fails, and a copy of the NameNode’s state can be saved to network storage for that purpose.
Memory sizing
The amount of memory required for the master nodes depends on the number of file system objects (files and block replicas) to be created and tracked by the NameNode. 64 GB of RAM supports approximately 100 million files. Some sites are now experimenting with 128 GB of RAM for even larger namespaces.
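As a rough illustration of the sizing figure above, the following sketch scales the 64 GB per 100 million files reference point to other namespace sizes. Linear scaling is an assumption of this sketch, not a documented rule, and the function names and the 16 GB rounding step are hypothetical. Real requirements also depend on the number of block replicas per file.

```python
import math

# Reference point taken from the text: 64 GB of RAM for about 100 million files.
REFERENCE_RAM_GB = 64
REFERENCE_FILES = 100_000_000


def estimate_namenode_ram_gb(num_files: int) -> float:
    """Scale the reference point linearly (an assumption of this sketch)."""
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    return REFERENCE_RAM_GB * num_files / REFERENCE_FILES


def round_up_ram_gb(ram_gb: float, step_gb: int = 16) -> int:
    """Round up to a multiple of step_gb, a hypothetical purchasing increment."""
    return max(step_gb, math.ceil(ram_gb / step_gb) * step_gb)


if __name__ == "__main__":
    for files in (50_000_000, 100_000_000, 200_000_000):
        estimate = estimate_namenode_ram_gb(files)
        print(f"{files:>11,d} files: about {estimate:.0f} GB "
              f"(rounded up: {round_up_ram_gb(estimate)} GB)")
```

Under this assumption, 200 million files maps to 128 GB, which lines up with the larger configuration mentioned above.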
Processors
The NameNodes and their clients are very “chatty”. We therefore recommend providing the master nodes with 16 or even 24 CPU cores to handle the messaging traffic.
Network
Providing multiple network ports and 10 Gbps of bandwidth to the switch is acceptable, provided the switch can handle it.