High-level Design and Best Practices

Right-size server configuration guidelines

Cloudera recommends deploying up to four machine types into your production environment: master nodes, worker nodes, utility nodes, and edge nodes.

Master nodes
Run the Hadoop master daemons, such as the NameNode, Standby NameNode, YARN ResourceManager and History Server, the HBase Master daemon, Ranger server, Atlas server, and the Impala StateStore and Catalog servers. ZooKeeper and JournalNodes are also installed on master nodes, and Kudu Master servers should be deployed there as well. The daemons can share a single pool of servers. Depending on the cluster size, a role can instead be run on its own dedicated server.
Worker nodes
Run the HDFS DataNode, YARN NodeManager, HBase RegionServer, Impala impalad, and Search worker daemons, as well as Kudu Tablet Servers.
Utility nodes
Run Cloudera Manager and the Cloudera Management Services. Utility nodes can also host a MariaDB instance or another supported database, which is used by Cloudera Manager, Hive, Ranger, and other Hadoop-related projects.
Edge nodes
Contain all client-facing configurations and services, including gateway configurations for HDFS, YARN, Impala, Hive, and HBase. Edge nodes are also a good place for Hue, Oozie, HiveServer2, and Impala HAProxy. HiveServer2 and Impala HAProxy serve as gateways for external applications such as business intelligence (BI) tools. Edge nodes are also known as gateway nodes.
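
The role layout described above can be captured as a simple lookup table, which may help when reviewing a planned host assignment. The following is a minimal sketch, not a Cloudera tool: the names ROLE_LAYOUT and node_type_for are hypothetical, and the role strings are informal labels taken from the descriptions above rather than official Cloudera Manager role identifiers.

```python
# Illustrative sketch only: ROLE_LAYOUT and node_type_for are hypothetical
# names, and the role labels are informal strings based on the text above.
ROLE_LAYOUT = {
    "master": [
        "NameNode", "Standby NameNode", "YARN ResourceManager",
        "History Server", "HBase Master", "Ranger server", "Atlas server",
        "Impala StateStore", "Impala Catalog server", "ZooKeeper",
        "JournalNode", "Kudu Master",
    ],
    "worker": [
        "HDFS DataNode", "YARN NodeManager", "HBase RegionServer",
        "Impala impalad", "Search worker", "Kudu Tablet Server",
    ],
    "utility": [
        "Cloudera Manager", "Cloudera Management Services",
        "MariaDB",  # or another supported database
    ],
    "edge": [
        "HDFS gateway", "YARN gateway", "Impala gateway", "Hive gateway",
        "HBase gateway", "Hue", "Oozie", "HiveServer2", "Impala HAProxy",
    ],
}


def node_type_for(role: str) -> str:
    """Return the machine type suggested for a role label.

    Raises KeyError if the label is not listed in ROLE_LAYOUT.
    """
    for node_type, roles in ROLE_LAYOUT.items():
        if role in roles:
            return node_type
    raise KeyError(role)
```

For example, node_type_for("HiveServer2") returns "edge" and node_type_for("JournalNode") returns "master". On small clusters several machine types may share hosts, so the table is a starting point rather than a fixed rule.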

For more information on sizing clusters and role layout, see Runtime Cluster Hosts and Role Assignments.