Hadoop and HBase clusters have two types of machines: masters (the HDFS NameNode, the MapReduce JobTracker, and the HBase Master) and slaves (the HDFS DataNodes, the MapReduce TaskTrackers, and the HBase RegionServers). The DataNodes, TaskTrackers, and HBase RegionServers are co-located or co-deployed for optimal data locality. In addition, HBase requires the use of a separate component (ZooKeeper) to manage the HBase cluster.
Hortonworks recommends separating master and slave nodes for the following reasons:
Task workloads on the slave nodes should be isolated from the masters.
Slave nodes are frequently decommissioned for maintenance.
For evaluation purposes, you can also deploy Hadoop as a single-node installation, where all the master and slave processes reside on the same machine. Setting up a small two-node cluster is also straightforward: one node acts as both the NameNode and the JobTracker, while the other node acts as the DataNode and the TaskTracker. Clusters of three or more machines typically use a dedicated NameNode/JobTracker node, with all the other nodes acting as slaves.

Typically, a medium-to-large Hadoop cluster consists of a two- or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected by a 1 Gigabit Ethernet (GbE) switch. Each rack-level switch is connected to a cluster-level switch, which is typically a higher-port-density 10 GbE switch. These cluster-level switches may also interconnect with other cluster-level switches or even uplink to another level of switching infrastructure.
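As an operational sketch of the master/slave split described above, the commands below show which daemon would be started on which class of machine. This assumes a Hadoop 1.x and HBase installation whose `bin` directories are on the `PATH`; hostnames such as `master-node` and `slave-node` are hypothetical placeholders, and in a single-node evaluation setup all of these commands would simply run on the same machine.

```shell
# On the master node: HDFS NameNode, MapReduce JobTracker, HBase Master.
# (A ZooKeeper quorum must also be running; hostnames here are illustrative.)
hadoop-daemon.sh start namenode
hadoop-daemon.sh start jobtracker
hbase-daemon.sh start master

# On each slave node: the co-located DataNode, TaskTracker, and RegionServer
# that give HBase and MapReduce local access to HDFS blocks.
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
hbase-daemon.sh start regionserver
```

Co-locating the three slave daemons, as the commands illustrate, is what provides the data locality mentioned earlier: tasks and region servers read HDFS blocks served by the DataNode on the same machine.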