Typical Hadoop Cluster
Hadoop and HBase clusters have two types of machines: masters and slaves.
- Masters -- HDFS NameNode, YARN ResourceManager, and HBase Master.
- Slaves -- HDFS DataNodes, YARN NodeManagers, and HBase RegionServers.
The DataNodes, NodeManagers, and HBase RegionServers are co-located or co-deployed for optimal data locality.
In addition, HBase requires a separate component, ZooKeeper, to coordinate the HBase cluster.
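For example, HBase locates its ZooKeeper ensemble through hbase-site.xml. A minimal sketch, assuming a hypothetical three-node ensemble on hosts zk1-zk3:

```xml
<!-- hbase-site.xml (sketch): point HBase at the ZooKeeper ensemble.
     The zk1-zk3 hostnames are placeholders for this example. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
```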
Hortonworks recommends separating master and slave nodes because:
- Task/application workloads on the slave nodes should be isolated from the masters.
- Slave nodes are frequently decommissioned for maintenance.
For evaluation purposes, it is possible to deploy Hadoop as a single-node installation, in which all master and slave processes reside on the same machine.
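As one illustration, a single-node (pseudo-distributed) setup typically points HDFS at localhost and drops the replication factor to 1, since there is only one DataNode. A sketch, assuming the NameNode's RPC port is 8020:

```xml
<!-- core-site.xml (sketch): all daemons run on this one machine. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>

<!-- hdfs-site.xml (sketch): a single DataNode cannot hold extra replicas. -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```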
For a small two-node cluster, the NameNode and the ResourceManager are both on the master node, with the DataNode and NodeManager on the slave node.
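In that layout, the slave node's configuration simply names the master host for both HDFS and YARN; a sketch, with master.example.com as a hypothetical hostname:

```xml
<!-- core-site.xml (sketch): the DataNode and clients find the NameNode here. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master.example.com:8020</value>
</property>

<!-- yarn-site.xml (sketch): the NodeManager registers with this ResourceManager. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master.example.com</value>
</property>
```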
Clusters of three or more machines typically use a single NameNode and ResourceManager, with all the other nodes acting as slave nodes. A High-Availability (HA) cluster would use an active and a standby NameNode, and might also use an active and a standby ResourceManager.
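NameNode HA is expressed in hdfs-site.xml as a logical nameservice backed by two NameNode IDs. A trimmed sketch (failover and shared-edits settings omitted); the nameservice name mycluster, the IDs nn1/nn2, and the hostnames are placeholders:

```xml
<!-- hdfs-site.xml (sketch): one logical nameservice, two NameNodes. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2.example.com:8020</value>
</property>
```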
Typically, a medium-to-large Hadoop cluster consists of a two-level or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (GbE) switch. Each rack-level switch is connected to a cluster-level switch (typically a larger, higher port-density 10 GbE switch). These cluster-level switches may also interconnect with other cluster-level switches or even uplink to another level of switching infrastructure.
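Hadoop can exploit this rack topology for replica placement if it is told which rack each node lives in, via a site-supplied topology script. A sketch of the wiring, assuming a hypothetical script at /etc/hadoop/conf/topology.sh that receives node names or IPs as arguments and prints one rack path (such as /rack1) per node:

```xml
<!-- core-site.xml (sketch): enable rack awareness. The script itself is
     site-specific and must map each node to its rack path. -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```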