Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. Effort invested in the planning stages can pay off substantially in both performance and the total cost of ownership (TCO) of the environment. The following composite system stack recommendations can help organizations in the planning stages:
Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB) | Network |
---|---|---|---|---|---|
Slaves | Balanced workload | Four to six 2 TB disks | One Quad | 24 | 1 Gb Ethernet all-to-all |
Slaves | HBase cluster | Six 2 TB disks | Dual Quad | 48 | |
Masters | Balanced and/or HBase cluster | Four to six 2 TB disks | Dual Quad | 24 | |
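
The storage figures in the table are raw disk capacity per node. As a rough planning aid, the sketch below converts a slave configuration into usable HDFS capacity. It is a minimal sketch under stated assumptions, not a sizing method from this guide: the function name is hypothetical, a replication factor of 3 is assumed (the HDFS default), and the 25% of disk reserved for non-HDFS data such as intermediate job output is an illustrative figure that should be replaced with a measured value.

```python
def usable_hdfs_capacity_tb(nodes, disks_per_node, disk_tb,
                            replication=3, non_hdfs_fraction=0.25):
    """Estimate usable HDFS capacity in TB for a set of slave nodes.

    replication:        assumed HDFS replication factor (default is 3).
    non_hdfs_fraction:  assumed share of raw disk kept for non-HDFS use
                        (temporary and intermediate data); illustrative only.
    """
    raw_tb = nodes * disks_per_node * disk_tb
    hdfs_tb = raw_tb * (1 - non_hdfs_fraction)
    # Every block is stored `replication` times, so divide to get unique data.
    return hdfs_tb / replication


# Example: 10 slaves, each with six 2 TB disks (upper end of the balanced row).
print(usable_hdfs_capacity_tb(nodes=10, disks_per_node=6, disk_tb=2.0))
```

Under these assumptions, 120 TB of raw disk yields about 30 TB of unique data, which is why raw capacity alone tends to overstate what a cluster can hold.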
Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB) | Network |
---|---|---|---|---|---|
Slaves | Balanced workload | Four to six 1 TB disks | Dual Quad | 24 | Dual 1 Gb links for all nodes in a 20-node rack and 2 x 10 Gb interconnect links per rack going to a pair of central switches. |
Slaves | Compute intensive workload | Four to six 1 TB or 2 TB disks | Dual Hexa Quad | 24-48 | |
Slaves | I/O intensive workload | Twelve 1 TB disks | Dual Quad | 24-48 | |
Slaves | HBase clusters | Twelve 1 TB disks | Dual Hexa Quad | 48-96 | |
Masters | All workload patterns/HBase clusters | Four to six 2 TB disks | Dual Quad | Depends on number of file system objects to be created by NameNode. | |
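
The master row ties memory to the number of file system objects the NameNode must track. The sketch below shows one way to turn an expected object count into a rough heap estimate. It rests on assumptions that are not part of this guide: the function name is hypothetical, and the figure of roughly 150 bytes of NameNode heap per file, directory, or block is a commonly quoted rule of thumb rather than a measured value for any particular Hadoop version. Treat the result as a lower bound and leave generous headroom.

```python
def estimate_namenode_heap_gib(num_files, num_dirs=0, blocks_per_file=1.0,
                               bytes_per_object=150):
    """Rough lower-bound estimate of NameNode heap in GiB.

    bytes_per_object: assumed heap cost per namespace object (file,
                      directory, or block); a rule of thumb, not a spec.
    blocks_per_file:  assumed average number of HDFS blocks per file.
    """
    num_blocks = num_files * blocks_per_file
    total_objects = num_files + num_dirs + num_blocks
    return total_objects * bytes_per_object / 1024 ** 3


# Example: 100 million single-block files and no directories counted.
print(round(estimate_namenode_heap_gib(100_000_000), 1))
```

Many small files inflate the object count far faster than a few large ones, so the average file size of the workload matters as much as the total data volume when sizing master memory.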
For Further Reading
- Best Practices for Selecting Apache Hadoop Hardware (Hortonworks blog)
- "Installation Requirements, Hardware", HBase, The Definitive Guide by Lars George, O'Reilly 2011, Chapter 2 (page 34 ff.)
- Hadoop Network and Compute Architecture Considerations by Jacob Rapp, Cisco (Hadoop World 2011 presentation)
- Hadoop network design challenge (Brad Hedlund.com)
- Scott Carey's email on smaller hardware for smaller clusters (email to general@hadoop.apache.org, Wed, 10 Aug 2011 17:24:25 GMT)
- Failure Trends in a Large Disk Drive Population - Google Research Paper
- HBase production deployments: