Runtime Cluster Hosts and Role Assignments

Cluster hosts can be broadly described as master hosts, utility hosts, gateway hosts, or worker hosts.

  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN Resource Manager.
  • Utility hosts run other cluster processes that are not master processes such as Cloudera Manager and one or more Hive Metastores.
  • Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts required varies depending on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes such as Impalad.

The following tables describe the recommended role allocations for different cluster sizes. Note that these configurations take into account services dependencies that might not be obvious. For example, running Atlas or Ranger requires also running HBase, Kafka, Solr, and ZooKeeper. For details see Service Dependences in Cloudera Manager.

3 - 10 Worker Hosts without High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Kudu master
  • Spark History Server
  • HBase master
  • Schema Registry
One host for all Utility and Gateway roles:
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Cruise Control
  • Hive Metastore
  • HiveServer2
  • Impala Catalog Server
  • Impala StateStore
  • Hue
  • Oozie
  • Gateway configuration
  • HBase backup master
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
  • Streams Messaging Manager
  • Streams Replication Manager Service
  • ZooKeeper
3 - 10 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
  • Kafka Broker
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search)
  • Streams Replication Manager Driver
  • ZooKeeper (Recommend 3 servers total)

3 - 20 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 3:
  • Kudu master (Kudu requires an odd number of masters for HA.)
  • Spark History Server
  • JournalNode (requires dedicated disk)
  • ZooKeeper
Utility Host 1:
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Cruise Control
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
  • Streams Messaging Manager
  • Streams Replication Manager Service
Utility Host 2:
  • Hive Metastore
  • Ranger Admin server
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Gateway configuration
3 - 20 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
  • Kafka Broker (Recommend 3 brokers minimum)
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search, recommend 3 servers minimum)
  • Streams Replication Manager Driver

20 - 80 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
  • HBase master
Utility Host 1:
  • Cloudera Manager
  • Cruise Control
  • Hive Metastore
  • Ranger Admin server
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
  • Streams Messaging Manager
  • Streams Replication Manager Service
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Gateway configuration
20 - 80 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
  • Kafka Broker (Recommend 3 brokers minimum)
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search, recommend 3 servers minimum)
  • Streams Replication Manager Driver

80 - 200 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
  • HBase master
Utility Host 1:
  • Cloudera Manager
  • Cruise Control
  • Streams Messaging Manager
  • Streams Replication Manager Service
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Host Monitor
Utility Host 4:
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server
Utility Host 5:
  • Hive Metastore
  • Ranger Admin server
  • Atlas server
  • Solr server
Utility Host 6:
  • Reports Manager
Utility Host 7:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Gateway configuration
80 - 200 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommend 100 tablet servers maximum)
  • Kafka Broker (Recommend 3 brokers minimum)
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search, recommend 3 servers minimum)
  • Streams Replication Manager Driver

200 - 500 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
  • HBase master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
  • HBase master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Schema Registry
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three masters for Kudu and HBase.

Utility Host 1:
  • Cloudera Manager
  • Cruise Control
  • Streams Messaging Manager
  • Streams Replication Manager Service
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Host Monitor
Utility Host 4:
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 5:
  • Hive Metastore
  • Ranger Admin server
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host6:
  • Reports Manager
Utility Host 7:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Gateway configuration
200 - 500 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommend 100 tablet servers maximum)
  • Kafka Broker (Recommend 3 brokers minimum)
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search, recommend 3 servers minimum)
  • Streams Replication Manager Driver

500 -1000 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
  • HBase master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
  • HBase master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
  • HBase master
  • Schema Registry
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Schema Registry
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three masters for Kudu and HBase.

Utility Host 1:
  • Cloudera Manager
  • Cruise Control
  • Streams Messaging Manager
  • Streams Replication Manager Service
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Host Monitor
Utility Host 4:
  • Ranger Admin, Tagsync, Usersync servers
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 5:
  • Hive Metastore
  • Ranger Admin server
  • Atlas server
  • Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 6:
  • Reports Manager
Utility Host 7:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Gateway configuration
500 - 1000 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommend 100 tablet servers maximum)
  • Kafka Broker (Recommend 3 brokers minimum)
  • Kafka Connect
  • HBase RegionServer
  • Solr server (For Cloudera Search, recommend 3 servers minimum)
  • Streams Replication Manager Driver