5.1. Hardware for Slave Nodes

The following recommendations are based on Hortonworks’ experience in production data centers:

Server Platform

Typically, dual-socket servers are optimal for Hadoop deployments. For medium to large clusters, these servers are a better choice than entry-level servers because of their load-balancing and parallelization capabilities.

Storage Options

Your disk drives should have good MTBF numbers, as slave nodes in Hadoop suffer routine probabilistic failures.

Small form factor (SFF) disks are being adopted in some configurations for better disk bandwidth. Hortonworks strongly recommends using either SATA or SAS interconnects.

If you have a large number of disks per server, we recommend using two disk controllers, so that the I/O load can be shared across multiple cores.   

Note

We do not recommend using RAID on Hadoop slave machines. Hadoop assumes probabilistic disk failure and orchestrates data redundancy across all the slave nodes.
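
Instead of a RAID array, each disk on a slave node is typically mounted separately (JBOD) and listed individually in the DataNode configuration, with HDFS block replication providing the redundancy. The hdfs-site.xml fragment below is a minimal sketch: the mount points are hypothetical, the property name dfs.datanode.data.dir applies to Hadoop 2.x (older releases use dfs.data.dir), and the replication factor shown is simply the HDFS default.

<!-- hdfs-site.xml: illustrative values only -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- One directory per physical disk, each on its own mount point (no RAID) -->
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <!-- Default HDFS replication: each block is stored on three different slave nodes -->
  <value>3</value>
</property>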

Memory

Memory can be provisioned at commodity prices on low-end server motherboards. Extra RAM will be consumed either by your Hadoop applications (typically when more processes are run in parallel) or by the infrastructure used for caching disk data to improve performance.

To retain the option of adding more memory to your servers in the future, ensure that there is space for additional memory modules alongside the initial ones.
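
How much of the installed RAM Hadoop applications can actually use is governed by the resource manager configuration. The yarn-site.xml fragment below is a minimal sketch, assuming a YARN-based (Hadoop 2.x) cluster and a hypothetical slave node with 64 GB of RAM; the values are illustrative and deliberately leave headroom for the operating system, the Hadoop daemons, and the disk cache.

<!-- yarn-site.xml: illustrative values for a hypothetical 64 GB slave node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- RAM made available to YARN containers; the rest is left for daemons and OS disk caching -->
  <value>49152</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <!-- Largest memory allocation a single container can request -->
  <value>8192</value>
</property>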

Power Considerations

Power is a major concern when designing Hadoop clusters. Instead of automatically purchasing the biggest and fastest nodes, analyze the power utilization of your existing hardware. We have observed huge savings in price and power by avoiding the fastest CPUs, redundant power supplies, and similar premium components.

For slave nodes, a single power supply unit (PSU) is sufficient, but for master servers use redundant PSUs. Server designs that share PSUs across adjacent servers can offer increased reliability without increased cost.

Machines for cloud data centers are designed to reduce cost and power consumption, and are lightweight. If you are purchasing in large volume, we recommend evaluating these stripped-down "cloud servers".

Network

This is the most challenging parameter to estimate because Hadoop workloads vary widely. The key is to buy enough network capacity, at reasonable cost, that all nodes in the cluster can communicate with each other at reasonable speeds.

Design the network so that you retain the option of adding more racks of Hadoop/HBase servers.

Minimize congestion at critical points in the network under realistic loads. Generally accepted oversubscription ratios are around 4:1 at the server access layer, and 2:1 between the access layer and the aggregation layer or core. Lower oversubscription ratios can be considered if higher performance is required.
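
As a rough illustration, a rack of 20 slave nodes with one 10 GbE link each can present up to 200 Gb/s of traffic at the top-of-rack switch; at a 4:1 oversubscription ratio, the uplinks from that switch to the aggregation layer would need about 200 / 4 = 50 Gb/s of capacity (for example, five 10 GbE uplinks).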

Configure dedicated switches for the cluster instead of trying to allocate a virtual circuit in existing switches; the load generated by a Hadoop cluster can degrade service for the other users of those switches.

"Deep buffering" is preferable to low-latency in switches. Enabling Jumbo Frames across the cluster can improve bandwidth, and might also provide packet integrity.