Cluster Hosts and Role Assignments

This topic describes suggested role assignments for a CDH cluster managed by Cloudera Manager. The actual assignments you choose for your deployment can vary depending on the types and volume of workloads, the services deployed in your cluster, hardware resources, configuration, and other factors.

When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to Edge hosts) based on the resources available on each host. You can change these assignments on the Customize Role Assignments page that appears in the wizard. You can also change and add roles later using Cloudera Manager. See Role Instances.

If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.

CDH Cluster Hosts and Role Assignments

The Cluster Hosts and Role Assignments table describes allocations for the following types of hosts:
  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.
  • Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
  • Edge hosts are client access points for launching jobs in the cluster. The number of Edge hosts required varies depending on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes, such as Impalad.
Table: Cluster Hosts and Role Assignments
Columns: Cluster Size | Master Hosts | Utility Hosts | Edge Hosts | Worker Hosts
Very Small, without High Availability
  • Up to 10 worker hosts
  • High availability not enabled
Master Host 1:
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Kudu master
One host for all Utility and Edge roles:
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog Server
  • Impala StateStore
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
3 - 10 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
Small, with High Availability
  • Up to 20 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • Kudu master (Kudu requires an odd number of masters for HA.)
Utility Host 1:
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • ZooKeeper (requires dedicated disk)
  • JournalNode (requires dedicated disk)
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
3 - 20 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
Medium, with High Availability
  • Up to 200 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Kudu master

Utility hosts when Cloudera Manager manages fewer than 80 hosts:

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
50 - 200 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

Utility hosts when Cloudera Manager manages more than 80 hosts:

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
Large, with High Availability
  • Up to 500 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
200 - 500 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)
Extra Large, with High Availability
  • Up to 1000 worker hosts
  • High availability enabled
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Edge Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
500 - 1000 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)
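
As a rough illustration of how the size tiers in the table above might be applied, the following Python sketch maps a planned worker host count and a high-availability choice to a tier. The names Tier, TIERS, and suggest_tier are hypothetical and are not part of Cloudera Manager or any Cloudera API. The thresholds are copied from the table. Treat the result as a starting point, not a sizing rule.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    """One row of the sizing table (hypothetical helper type)."""
    name: str
    max_workers: int
    high_availability: bool


# Upper bounds transcribed from the table above, smallest tier first.
TIERS = [
    Tier("Very Small", 10, False),
    Tier("Small", 20, True),
    Tier("Medium", 200, True),
    Tier("Large", 500, True),
    Tier("Extra Large", 1000, True),
]


def suggest_tier(worker_hosts: int, high_availability: bool) -> Optional[Tier]:
    """Return the smallest tier that fits, or None if the table has no match."""
    if worker_hosts < 1:
        raise ValueError("worker_hosts must be a positive integer")
    for tier in TIERS:
        # A tier matches only if its HA setting equals the requested one
        # and the worker count does not exceed the tier's upper bound.
        if tier.high_availability == high_availability and worker_hosts <= tier.max_workers:
            return tier
    return None


if __name__ == "__main__":
    tier = suggest_tier(150, high_availability=True)
    print(tier.name if tier else "No matching tier in the table")
```

The table lists only one tier without high availability, so this sketch returns None for a non-HA cluster with more than 10 worker hosts. It also returns None for more than 1000 worker hosts. The table's worker ranges start at 3 hosts. The sketch does not enforce that lower bound.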

Allocating Hosts for Key Trustee Server and Key Trustee KMS

If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other enterprise data hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture helps users avoid having to restart the Key Trustee Server when restarting a cluster.

See Encrypting Data at Rest.

For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.