Cluster hosts can be broadly described as master hosts, utility
hosts, gateway hosts, or worker hosts.
Master hosts run Hadoop master processes, such as the HDFS
NameNode and YARN ResourceManager.
Utility hosts run other cluster processes that are not master processes, such as
Cloudera Manager and one or more Hive Metastores.
Gateway hosts are client access points for launching jobs in
the cluster. The number of gateway hosts required varies depending on
the type and size of the workloads.
Worker hosts primarily run DataNodes and other distributed
processes such as Impalad.
The following tables describe the recommended
role allocations for different cluster sizes. These
configurations take into account service dependencies that might not be
obvious. For example, running Atlas or Ranger also requires running HBase,
Kafka, Solr, and ZooKeeper. For details, see Service Dependencies in Cloudera
Manager.
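The dependency example above can be sketched as a small lookup. This is only an illustration: the map encodes nothing beyond the Atlas and Ranger example given in the text, and the names `SERVICE_DEPENDENCIES` and `required_services` are hypothetical, not part of Cloudera Manager.

```python
# Illustrative dependency map. It encodes only the example from the text:
# Atlas or Ranger also require HBase, Kafka, Solr, and ZooKeeper.
SERVICE_DEPENDENCIES = {
    "Atlas": {"HBase", "Kafka", "Solr", "ZooKeeper"},
    "Ranger": {"HBase", "Kafka", "Solr", "ZooKeeper"},
}


def required_services(selected):
    """Return the selected services plus everything they depend on."""
    required = set()
    pending = list(selected)
    while pending:
        service = pending.pop()
        if service in required:
            continue
        required.add(service)
        # Services without an entry are assumed to have no dependencies.
        pending.extend(SERVICE_DEPENDENCIES.get(service, ()))
    return required


print(sorted(required_services(["Hive", "Atlas"])))
```

Planning host capacity against the expanded set, rather than only the services you picked, avoids discovering the extra roles after the hosts are sized.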
3 - 10 Worker Hosts without High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
YARN ResourceManager
JobHistory Server
ZooKeeper
Kudu master
Spark History Server
HBase master
Schema Registry
One host for all Utility and Gateway roles:
Secondary NameNode
Cloudera Manager
Cloudera Manager Management Service
Cruise Control
Hive Metastore
HiveServer2
Impala Catalog Server
Impala StateStore
Hue
Oozie
Gateway configuration
HBase backup master
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Streams Messaging Manager
Streams Replication Manager Service
ZooKeeper
Knox: One KnoxGateway service on utility or gateway hosts.
3 - 10 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server
Kafka Broker
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search)
Streams Replication Manager Driver
ZooKeeper (Recommend 3 servers total)
3 - 20 Worker Hosts with High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
JobHistory Server
Kudu master
HBase master
Schema Registry
Master Host 2:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
Kudu master
HBase master
Schema Registry
Master Host 3:
Kudu master (Kudu requires an odd number of masters for HA.)
Spark History Server
JournalNode (requires dedicated disk)
ZooKeeper
Utility Host 1:
Cloudera Manager
Cloudera Manager Management Service
Cruise Control
Hive Metastore
Impala Catalog Server
Impala StateStore
Oozie
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Streams Messaging Manager
Streams Replication Manager Service
Knox: One KnoxGateway service for HA. You can place it on a gateway host instead of a utility host.
Utility Host 2:
Hive Metastore
Ranger Admin server
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Knox: One KnoxGateway service for HA. You can place it on a gateway host instead of a utility host.
One or more Gateway Hosts:
Hue
HiveServer2
Gateway configuration
3 - 20 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server
Kafka Broker (Recommend 3 brokers minimum)
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search, recommend 3 servers minimum)
Streams Replication Manager Driver
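The counts called out in this layout (an odd number of Kudu masters for HA, and at least three Kafka brokers and three Cloudera Search Solr servers) can be checked mechanically. The sketch below is a minimal illustration under assumptions: the host names `master1` to `master3` and the function names are hypothetical, and the role lists are abbreviated to the roles the checks need.

```python
from collections import Counter

# Abbreviated copy of the master-host column above (hypothetical host names).
MASTER_HOSTS = {
    "master1": ["NameNode", "JournalNode", "ZooKeeper", "Kudu master"],
    "master2": ["NameNode", "JournalNode", "ZooKeeper", "Kudu master"],
    "master3": ["JournalNode", "ZooKeeper", "Kudu master"],
}


def role_counts(hosts):
    """Count how many hosts carry each role."""
    return Counter(role for roles in hosts.values() for role in roles)


def check_layout(master_hosts, kafka_brokers, search_solr_servers):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    kudu_masters = role_counts(master_hosts)["Kudu master"]
    if kudu_masters and kudu_masters % 2 == 0:
        problems.append(f"Kudu has {kudu_masters} masters; HA needs an odd number")
    if kafka_brokers < 3:
        problems.append(f"Only {kafka_brokers} Kafka brokers; 3 minimum recommended")
    if search_solr_servers < 3:
        problems.append(f"Only {search_solr_servers} Solr servers; 3 minimum recommended")
    return problems


print(check_layout(MASTER_HOSTS, kafka_brokers=3, search_solr_servers=3))
```

Returning a list of messages instead of raising on the first failure lets a planner see every deviation from the recommendations at once.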
20 - 80 Worker Hosts with High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
Kudu master
HBase master
Schema Registry
Master Host 2:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
Kudu master
HBase master
Schema Registry
Master Host 3:
ZooKeeper
JournalNode
JobHistory Server
Spark History Server
Kudu master
HBase master
Utility Host 1:
Cloudera Manager
Cruise Control
Hive Metastore
Ranger Admin server
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Streams Messaging Manager
Streams Replication Manager Service
Utility Host 2:
Cloudera Manager Management Service
Hive Metastore
Impala Catalog Server
Impala StateStore
Oozie
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
One or more Gateway Hosts:
Hue
HiveServer2
Gateway configuration
Two or more Gateway Hosts:
Knox: Two KnoxGateway services, one on each of the first two Gateway Hosts for HA.
20 - 80 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server
Kafka Broker (Recommend 3 brokers minimum)
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search, recommend 3 servers minimum)
Streams Replication Manager Driver
80 - 200 Worker Hosts with High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
Kudu master
HBase master
Schema Registry
Master Host 2:
NameNode
JournalNode
FailoverController
YARN ResourceManager
ZooKeeper
Kudu master
HBase master
Schema Registry
Master Host 3:
ZooKeeper
JournalNode
JobHistory Server
Spark History Server
Kudu master
HBase master
Utility Host 1:
Cloudera Manager
Cruise Control
Streams Messaging Manager
Streams Replication Manager Service
Utility Host 2:
Hive Metastore
Impala Catalog Server
Impala StateStore
Oozie
Utility Host 3:
Host Monitor
Utility Host 4:
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server
Utility Host 5:
Hive Metastore
Ranger Admin server
Atlas server
Solr server
Utility Host 6:
Reports Manager
Utility Host 7:
Service Monitor
One or more Gateway Hosts:
Hue
HiveServer2
Gateway configuration
Two or more Gateway Hosts:
Knox: Two KnoxGateway services, one on each of the first two Gateway Hosts for HA.
80 - 200 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server (Recommend 100 tablet servers maximum)
Kafka Broker (Recommend 3 brokers minimum)
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search, recommend 3 servers minimum)
Streams Replication Manager Driver
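Once a cluster has more than 100 workers, the Kudu tablet server recommendation means that not every worker carries that role. The following sketch shows one way to express the cap. It assumes the first 100 hosts in the list receive the role, which is an arbitrary choice for illustration; the host names and the function `assign_worker_roles` are hypothetical, and only a few worker roles are listed.

```python
# "Recommend 100 tablet servers maximum" from the worker list above.
KUDU_TABLET_SERVER_LIMIT = 100


def assign_worker_roles(worker_hosts, tablet_server_limit=KUDU_TABLET_SERVER_LIMIT):
    """Map each worker host to its roles, capping Kudu tablet servers."""
    assignments = {}
    for index, host in enumerate(worker_hosts):
        roles = ["DataNode", "NodeManager", "Impalad"]
        # Assumption: the first hosts in the list get the tablet server role.
        if index < tablet_server_limit:
            roles.append("Kudu tablet server")
        assignments[host] = roles
    return assignments


# Hypothetical host names for a 150-worker cluster.
workers = [f"worker{n:03d}.example.com" for n in range(1, 151)]
layout = assign_worker_roles(workers)
tablet_servers = sum("Kudu tablet server" in roles for roles in layout.values())
print(tablet_servers)  # 100
```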
200 - 500 Worker Hosts with High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
JournalNode
FailoverController
ZooKeeper
Kudu master
HBase master
Master Host 2:
NameNode
JournalNode
FailoverController
ZooKeeper
Kudu master
HBase master
Master Host 3:
YARN ResourceManager
ZooKeeper
JournalNode
Kudu master
HBase master
Schema Registry
Master Host 4:
YARN ResourceManager
ZooKeeper
JournalNode
Schema Registry
Master Host 5:
JobHistory Server
Spark History Server
ZooKeeper
JournalNode
We recommend no more than three masters for Kudu and HBase.
Utility Host 1:
Cloudera Manager
Cruise Control
Streams Messaging Manager
Streams Replication Manager Service
Utility Host 2:
Hive Metastore
Impala Catalog Server
Impala StateStore
Oozie
Utility Host 3:
Host Monitor
Utility Host 4:
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 5:
Hive Metastore
Ranger Admin server
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 6:
Reports Manager
Utility Host 7:
Service Monitor
One or more Gateway Hosts:
Hue
HiveServer2
Gateway configuration
Two or more Gateway Hosts:
Knox: Two KnoxGateway services, one on each of the first two Gateway Hosts for HA.
200 - 500 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server (Recommend 100 tablet servers maximum)
Kafka Broker (Recommend 3 brokers minimum)
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search, recommend 3 servers minimum)
Streams Replication Manager Driver
500 - 1000 Worker Hosts with High Availability
Master Hosts
Utility Hosts
Gateway Hosts
Worker Hosts
Master Host 1:
NameNode
JournalNode
FailoverController
ZooKeeper
Kudu master
HBase master
Master Host 2:
NameNode
JournalNode
FailoverController
ZooKeeper
Kudu master
HBase master
Master Host 3:
YARN ResourceManager
ZooKeeper
JournalNode
Kudu master
HBase master
Schema Registry
Master Host 4:
YARN ResourceManager
ZooKeeper
JournalNode
Schema Registry
Master Host 5:
JobHistory Server
Spark History Server
ZooKeeper
JournalNode
We recommend no more than three masters for Kudu and HBase.
Utility Host 1:
Cloudera Manager
Cruise Control
Streams Messaging Manager
Streams Replication Manager Service
Utility Host 2:
Hive Metastore
Impala Catalog Server
Impala StateStore
Oozie
Utility Host 3:
Host Monitor
Utility Host 4:
Ranger Admin, Tagsync, Usersync servers
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 5:
Hive Metastore
Ranger Admin server
Atlas server
Solr server (CDP-INFRA-SOLR instance to support Atlas)
Utility Host 6:
Reports Manager
Utility Host 7:
Service Monitor
One or more Gateway Hosts:
Hue
HiveServer2
Gateway configuration
Two or more Gateway Hosts:
Knox: Two KnoxGateway services, one on each of the first two Gateway Hosts for HA.
500 - 1000 Worker Hosts:
DataNode
NodeManager
Impalad
Kudu tablet server (Recommend 100 tablet servers maximum)
Kafka Broker (Recommend 3 brokers minimum)
Kafka Connect
HBase RegionServer
Solr server (For Cloudera Search, recommend 3 servers minimum)
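To find the table that applies to a planned cluster, the size ranges from the headings above can be put into a lookup. This is a sketch under assumptions: the headings share boundary values (for example, 20 appears in both "3 - 20" and "20 - 80"), and this code arbitrarily assigns a boundary value to the smaller range. The names `TIERS` and `recommended_layout` are hypothetical.

```python
# (low, high, high_availability, heading); ranges copied from the table headings.
TIERS = [
    (3, 10, False, "3 - 10 Worker Hosts without High Availability"),
    (3, 20, True, "3 - 20 Worker Hosts with High Availability"),
    (20, 80, True, "20 - 80 Worker Hosts with High Availability"),
    (80, 200, True, "80 - 200 Worker Hosts with High Availability"),
    (200, 500, True, "200 - 500 Worker Hosts with High Availability"),
    (500, 1000, True, "500 - 1000 Worker Hosts with High Availability"),
]


def recommended_layout(worker_hosts, high_availability=True):
    """Return the heading of the table that covers the given cluster size."""
    for low, high, ha, heading in TIERS:
        # Assumption: a shared boundary value matches the smaller range first.
        if ha == high_availability and low <= worker_hosts <= high:
            return heading
    raise ValueError(
        f"No table covers {worker_hosts} worker hosts "
        f"(high_availability={high_availability})"
    )


print(recommended_layout(150))
print(recommended_layout(5, high_availability=False))
```

A request outside the documented ranges, such as a cluster without high availability and more than 10 workers, raises `ValueError` rather than guessing, because the tables give no recommendation for it.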