Planning for the HDP Cluster

Typical Hadoop Cluster

Hadoop and HBase clusters have two types of machines: masters and slaves.

  • Masters -- HDFS NameNode, YARN ResourceManager, and HBase Master.

  • Slaves -- HDFS DataNodes, YARN NodeManagers, and HBase RegionServers.

    The DataNodes, NodeManagers, and HBase RegionServers are co-located or co-deployed for optimal data locality.

    In addition, HBase requires the use of a separate component (ZooKeeper) to manage the HBase cluster.

Hortonworks recommends separating master and slave nodes because:

  • Task/application workloads on the slave nodes should be isolated from the masters.

  • Slave nodes are frequently decommissioned for maintenance.

For evaluation purposes, it is possible to deploy Hadoop using a single-node installation (all the master and slave processes reside on the same machine).

For a small two-node cluster, the NameNode and the ResourceManager are both on the master node, with the DataNode and NodeManager on the slave node.

Clusters of three or more machines typically use a single NameNode and ResourceManager, with all the other nodes as slave nodes. A High-Availability (HA) cluster would use an active and a standby NameNode, and might also use an active and a standby ResourceManager.
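
The single-node, two-node, and larger non-HA layouts described above can be sketched as a small planning helper. This is an illustrative sketch only: the function name `plan_roles` and the convention that the first host in the list becomes the master are assumptions made for this example, not part of HDP or Ambari. HBase, ZooKeeper, and HA roles are left out for brevity.

```python
# Hypothetical helper (not an HDP or Ambari API): maps a list of hostnames
# to the Hadoop roles described in the text for non-HA clusters.
MASTER_ROLES = ["NameNode", "ResourceManager"]
SLAVE_ROLES = ["DataNode", "NodeManager"]


def plan_roles(hosts):
    """Return a dict of hostname -> list of roles for a non-HA cluster."""
    if not hosts:
        raise ValueError("at least one host is required")

    if len(hosts) == 1:
        # Single-node evaluation install: all master and slave
        # processes reside on the same machine.
        return {hosts[0]: MASTER_ROLES + SLAVE_ROLES}

    # Two or more nodes: the first host (an assumption of this sketch)
    # carries the master roles; every other host is a slave node.
    plan = {hosts[0]: list(MASTER_ROLES)}
    for host in hosts[1:]:
        plan[host] = list(SLAVE_ROLES)
    return plan


if __name__ == "__main__":
    for host, roles in plan_roles(["master1", "slave1", "slave2"]).items():
        print(host, roles)
```

Keeping the master roles on a dedicated host mirrors the recommendation above to isolate slave workloads from the masters.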

Typically, a medium-to-large Hadoop cluster consists of a two-level or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (GbE) switch. Each rack-level switch is connected to a cluster-level switch (typically a 10GbE switch with higher port density). These cluster-level switches may also interconnect with other cluster-level switches or even uplink to another level of switching infrastructure.
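
One practical consequence of this two-level design is that the servers in a rack share the rack switch's uplink to the cluster-level switch. The following is a minimal sketch of that arithmetic. It assumes a single 10 Gb/s uplink per rack and one 1 Gb/s link per server; the function name and the default values are illustrative assumptions, not figures taken from a specific HDP deployment.

```python
# Hypothetical helper: estimates how heavily a rack's uplink is shared.
def rack_oversubscription(servers_per_rack, server_link_gbps=1.0, uplink_gbps=10.0):
    """Return the ratio of total server bandwidth to rack uplink bandwidth.

    A result of 2.0 means the servers in the rack could together offer
    twice as much traffic as the uplink can carry (2:1 oversubscription).
    """
    if servers_per_rack <= 0 or server_link_gbps <= 0 or uplink_gbps <= 0:
        raise ValueError("all arguments must be positive")
    return (servers_per_rack * server_link_gbps) / uplink_gbps


if __name__ == "__main__":
    # Assumed example: 20 servers per rack on 1 GbE with one 10GbE uplink.
    print(rack_oversubscription(20))  # 2.0
```

Traffic between nodes in the same rack stays on the rack-level switch, so only traffic that leaves the rack competes for the uplink.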