Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. The effort invested in planning can pay off substantially in both the performance and the total cost of ownership (TCO) of the environment. The following composite system stack recommendations can serve as a starting point during planning:
Machine Type | Workload Pattern / Cluster Type | Storage | Processor (# of Cores) | Memory (GB) | Network |
---|---|---|---|---|---|
Slaves | Balanced workload | Twelve 2-3 TB disks | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
Slaves | Compute-intensive workload | Twelve 1-2 TB disks | 10 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
Slaves | Storage-heavy workload | Twelve 4+ TB disks | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
NameNode | Balanced workload | Four or more 2-3 TB RAID 10 with spares | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
ResourceManager | Balanced workload | Four or more 2-3 TB RAID 10 with spares | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
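
To make the storage figures in the table easier to reason about, the following minimal Python sketch estimates per-node raw capacity and a rough cluster-wide usable HDFS capacity for each slave profile. The disk counts and sizes come from the table; everything else is an assumption for illustration: the names `SLAVE_PROFILES`, `raw_capacity_tb`, and `usable_capacity_tb` are hypothetical, the replication factor of 3 is the HDFS default, and the 25% reserve for intermediate data and operating system overhead is a placeholder you should replace with your own estimate.

```python
# Disk count and per-disk size range (TB) for each slave profile in the table.
SLAVE_PROFILES = {
    "balanced": (12, 2, 3),
    "compute-intensive": (12, 1, 2),
    "storage-heavy": (12, 4, 4),  # table says "4+ TB"; 4 TB is used as a floor
}


def raw_capacity_tb(profile):
    """Return the (min, max) raw disk capacity in TB of one slave node."""
    disks, low_tb, high_tb = SLAVE_PROFILES[profile]
    return disks * low_tb, disks * high_tb


def usable_capacity_tb(profile, nodes, replication=3, reserve=0.25):
    """Estimate the (min, max) usable HDFS capacity in TB for a cluster.

    replication: HDFS block replication factor (3 is the HDFS default).
    reserve: assumed fraction of raw disk kept for intermediate data and
             the operating system; this value is illustrative only.
    """
    low, high = raw_capacity_tb(profile)
    factor = nodes * (1 - reserve) / replication
    return low * factor, high * factor


print(raw_capacity_tb("balanced"))         # (24, 36)
print(usable_capacity_tb("balanced", 10))  # (60.0, 90.0)
```

Under these assumptions, ten balanced slave nodes with 240-360 TB of raw disk provide roughly 60-90 TB of usable HDFS space, which shows how strongly replication and reserved space reduce the headline disk numbers.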
For Further Reading
Best Practices for Selecting Apache Hadoop Hardware (Hortonworks blog)
“Installation Requirements, Hardware”, HBase, The Definitive Guide by Lars George, O’Reilly 2011, Chapter 2 (page 34 ff.)
Hadoop Network and Compute Architecture Considerations by Jacob Rapp, Cisco (Hadoop World 2011 presentation)
Hadoop network design challenge (BradHedlund.com)
Scott Carey’s email on smaller hardware for smaller clusters (email to general@hadoop.apache.org, Wed, 10 Aug 2011 17:24:25 GMT)
Failure Trends in a Large Disk Drive Population – Google Research Paper
HBase production deployments: