Cluster Hosts and Role Assignments

This topic describes suggested role assignments for a CDH cluster managed by Cloudera Manager. The actual assignments you choose for your deployment can vary depending on the types and volume of workloads, the services deployed in your cluster, hardware resources, configuration, and other factors.

When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to Edge hosts) based on the resources available on the hosts. You can change these assignments on the Customize Role Assignments page of the wizard. You can also change and add roles later using Cloudera Manager. See Role Instances.

If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.

CDH Cluster Hosts and Role Assignments

The Cluster Hosts and Role Assignments table describes allocations for the following types of hosts:
  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.
  • Utility hosts run cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
  • Edge hosts are client access points for launching jobs in the cluster. The number of Edge hosts required varies depending on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes such as Impalad.
Cluster Hosts and Role Assignments
Table columns: Cluster Size, Master Hosts, Utility Hosts, Edge Hosts, Worker Hosts. Each cluster size below lists its assignments column by column, in that order.
Very Small, without High Availability
  • Up to 10 worker hosts
  • High availability not enabled
Master Host 1:
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Impala StateStore
One host for all Utility and Edge roles:
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
3 - 10 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
Small, with High Availability
  • Up to 20 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Impala StateStore
One host for all Utility and Edge roles:
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
  • ZooKeeper (requires dedicated disk)
  • JournalNode (requires dedicated disk)
3 - 20 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
Medium, with High Availability
  • Up to 200 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Impala StateStore
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
50 - 200 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
Large, with High Availability
  • Up to 500 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Impala StateStore
  • ZooKeeper
  • JournalNode
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
200 - 500 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
Extra Large, with High Availability
  • Up to 1000 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Impala StateStore
  • ZooKeeper
  • JournalNode
Utility Host 1:
  • Cloudera Manager Server
Utility Host 2:
  • Service Monitor
Utility Host 3:
  • Reports Manager
Utility Host 4:
  • Cloudera Management Service roles: Host Monitor, Navigator, Alert Publisher
Utility Host 5:
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
500 - 1000 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
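The table encodes two decisions: which layout fits a given worker count, and how many ZooKeeper servers and JournalNodes each HA layout places. The minimal Python sketch below captures both. It is not a Cloudera tool. The names Tier, TIERS, suggest_tier, and majority_quorum_ok are hypothetical, and the ZooKeeper and JournalNode counts are tallied by hand from the table. The sketch uses only each tier's "Up to N worker hosts" ceiling, so a count that falls between the listed worker ranges (for example, 30 workers, which lies between Small's 20 and Medium's 50 - 200) rounds up to the next tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_workers: int   # "Up to N worker hosts" from the table
    ha: bool           # whether the layout enables high availability
    zookeeper: int     # ZooKeeper servers placed by the layout
    journalnodes: int  # JournalNodes placed by the layout


# Hypothetical transcription of the Cluster Hosts and Role Assignments table.
TIERS = [
    Tier("Very Small", 10, False, zookeeper=1, journalnodes=0),
    Tier("Small", 20, True, zookeeper=3, journalnodes=3),
    Tier("Medium", 200, True, zookeeper=3, journalnodes=3),
    Tier("Large", 500, True, zookeeper=5, journalnodes=5),
    Tier("Extra Large", 1000, True, zookeeper=5, journalnodes=5),
]


def suggest_tier(workers: int, want_ha: bool) -> Tier:
    """Return the smallest tier whose worker ceiling covers `workers`.

    A non-HA request can still land on an HA tier, because the table
    offers a non-HA layout only for clusters of up to 10 workers.
    """
    if workers < 1:
        raise ValueError("need at least one worker host")
    for tier in TIERS:
        if workers <= tier.max_workers and (tier.ha or not want_ha):
            return tier
    raise ValueError("more than 1000 worker hosts is outside the table")


def majority_quorum_ok(members: int) -> bool:
    """An odd ensemble of 3 or more members survives (members - 1) // 2 failures."""
    return members >= 3 and members % 2 == 1


if __name__ == "__main__":
    for tier in TIERS:
        if tier.ha:
            # Each HA layout should give ZooKeeper and the JournalNodes
            # a proper majority quorum.
            assert majority_quorum_ok(tier.zookeeper), tier.name
            assert majority_quorum_ok(tier.journalnodes), tier.name
    print(suggest_tier(150, want_ha=True).name)  # prints Medium
```

The quorum check shows why the HA layouts place three or five ZooKeeper servers and JournalNodes rather than two or four. A majority quorum of three tolerates one failed member, and a quorum of five tolerates two. An even-sized ensemble adds a host without tolerating any additional failures.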

Allocating Hosts for Key Trustee Server and Key Trustee KMS

If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other enterprise data hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture lets you restart the EDH cluster without also having to restart the Key Trustee Server.

See Deployment Planning for Data at Rest Encryption.

For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.
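The placement rules in this section can be expressed as a set-based check over a proposed layout. The Python sketch below is a hypothetical illustration, not a Cloudera Manager API. The function name, its parameters, and the representation of hosts as plain hostname strings are assumptions. It also simplifies the high availability recommendation to "at least two hosts each for Key Trustee Server and Key Trustee KMS".

```python
from __future__ import annotations


def key_trustee_layout_issues(
    edh_hosts: set[str],
    other_role_hosts: set[str],
    kts_hosts: set[str],
    kms_hosts: set[str],
    production: bool,
    hdfs_ha: bool,
) -> list[str]:
    """Hypothetical check of a Key Trustee layout against the recommendations.

    edh_hosts: every host in the EDH cluster, including the KMS hosts.
    other_role_hosts: EDH hosts that run any role other than Key Trustee KMS.
    """
    issues = []
    # Key Trustee Server belongs on dedicated hosts in a separate cluster.
    if kts_hosts & edh_hosts:
        issues.append("Key Trustee Server shares hosts with the EDH cluster")
    # Key Trustee KMS belongs in the same cluster as the EDH services...
    if not kms_hosts <= edh_hosts:
        issues.append("Key Trustee KMS runs outside the EDH cluster")
    # ...but on hosts dedicated to it.
    if kms_hosts & other_role_hosts:
        issues.append("Key Trustee KMS shares hosts with other EDH roles")
    # HA is recommended for production, or when HDFS HA is enabled.
    # Simplifying assumption: HA means at least two hosts for each service.
    if production or hdfs_ha:
        if len(kts_hosts) < 2:
            issues.append("Key Trustee Server HA needs at least two hosts")
        if len(kms_hosts) < 2:
            issues.append("Key Trustee KMS HA needs at least two hosts")
    return issues
```

For example, with an EDH cluster of {"master1", "worker1", "kms1", "kms2"} where master1 and worker1 run other roles, a layout that uses {"kts1", "kts2"} for Key Trustee Server and {"kms1", "kms2"} for Key Trustee KMS produces no issues. Placing Key Trustee Server on master1 and Key Trustee KMS on worker1 in a production cluster produces four issues.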