Data Hub HA

The Cloudera Data Hub service allows you to create workload clusters that run components such as Spark, Kafka, HBase, Impala, Hive, and NiFi.

You can create a cluster from a predefined or a custom cluster template. A cluster template is a declarative definition of a cluster: it defines the cluster topology, including the cluster host groups and all the cluster services and their components that run on them. Data Hub provides default cluster templates and also allows you to upload your own cluster templates (called custom cluster templates).
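To make the idea of a declarative template concrete, the following is a minimal Python sketch of a template as host groups plus the service roles placed on them. The class names, field names, template name, and node counts are hypothetical and chosen for illustration only; they are not the actual Data Hub template schema. The role names follow the `<service>-<ROLE>` pattern that appears in the table later on this page.

```python
from dataclasses import dataclass, field


@dataclass
class HostGroup:
    """A group of identically configured hosts (hypothetical model)."""
    name: str
    node_count: int
    roles: list[str] = field(default_factory=list)  # roles placed on every host in the group


@dataclass
class ClusterTemplate:
    """Declarative description of a cluster: host groups and what runs on them."""
    name: str
    host_groups: list[HostGroup] = field(default_factory=list)

    def services(self) -> set[str]:
        # "hdfs-NAMENODE" -> "hdfs": the service is the part before the first hyphen.
        return {
            role.split("-", 1)[0]
            for group in self.host_groups
            for role in group.roles
        }


# Illustrative values only; not taken from a real template.
template = ClusterTemplate(
    name="example-ha-template",
    host_groups=[
        HostGroup("master", 2, ["hdfs-NAMENODE", "hdfs-FAILOVERCONTROLLER", "yarn-RESOURCEMANAGER"]),
        HostGroup("worker", 3, ["hdfs-DATANODE", "yarn-NODEMANAGER"]),
    ],
)

print(sorted(template.services()))
```

The point of the sketch is that the template only states what the cluster should look like; nothing in it describes how the cluster is provisioned.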

In production setups, Cloudera recommends that you use templates that are marked High Availability. You can identify these templates by the term HA in the template name.
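As a small illustration of the naming rule, the sketch below filters a list of template names for the standalone term HA. Also matching the spelled-out phrase "High Availability" is an assumption made here because one template in the table on this page uses that form. The function name `is_ha_template` is hypothetical and not part of any Cloudera tool.

```python
import re

# Matches "HA" as a standalone word, or the spelled-out phrase (the latter is an assumption).
_HA_PATTERN = re.compile(r"\bHA\b|\bHigh Availability\b")


def is_ha_template(name: str) -> bool:
    """Return True if the template name is marked as High Availability."""
    return _HA_PATTERN.search(name) is not None


# Template names taken from the table on this page.
names = [
    "Data Engineering HA",
    "Data Engineering - Spark3 HA",
    "Streams Messaging High Availability with Apache Kafka, Schema Registry, "
    "Streams Messaging Manager, Streams Replication Manager, and Cruise Control",
    "Data Mart with Apache Impala and Hue",
]

ha_names = [n for n in names if is_ha_template(n)]
for n in ha_names:
    print(n)
```

The match is case-sensitive and word-bounded on purpose, so the lowercase "ha" inside a word such as "Apache" does not count as a match.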

Some Data Hubs and all the HA templates use external databases. Multi-AZ templates and custom templates create databases in an HA multi-AZ setup.

The following table lists the Data Hub templates with their corresponding components and services that support HA:

Data Hub template / Component / Services

Enterprise SDX template with Atlas, HMS, Ranger, and the other services they depend on. Services such as HDFS, HBase, Ranger, and HMS have HA.
  Hive
    • hive-HIVEMETASTORE
  Ranger
    • ranger-RANGER_ADMIN
    • RAZ
  HDFS
    • hdfs-NAMENODE
    • hdfs-FAILOVERCONTROLLER
    • hdfs-DATANODE
    • hdfs-JOURNALNODE
  Atlas
    • atlas-ATLAS_SERVER
  ZooKeeper
    • zookeeper-SERVER
  Knox
    • knox-KNOX_GATEWAY
    • knox-IDBROKER
  Kafka
    • kafka-KAFKA_BROKER
  Solr
    • solr-SOLR_SERVER
  HBase
    • hbase-REGIONSERVER
    • hbase-MASTER

Data Engineering HA
  HDFS
    • hdfs-DATANODE
    • hdfs-FAILOVERCONTROLLER
    • hdfs-HTTPFS
    • hdfs-JOURNALNODE
    • hdfs-NAMENODE
  ZooKeeper
    • zookeeper-SERVER
  Hive
    • hive-HIVESERVER2
    • hms-HIVEMETASTORE
    • hue-HUE_LOAD_BALANCER
    • hue-HUE_SERVER
  Spark
    • livy-LIVY_SERVER
    • spark_on_yarn-SPARK_YARN_HISTORY_SERVER
  YARN
    • yarn-NODEMANAGER
    • yarn-NODEMANAGER-COMPUTE
    • yarn-RESOURCEMANAGER
  Oozie
    • oozie-OOZIE_SERVER
  Knox
    • knox-KNOX

Data Engineering - Spark3 HA
  HDFS
    • hdfs-DATANODE
    • hdfs-FAILOVERCONTROLLER
    • hdfs-HTTPFS
    • hdfs-JOURNALNODE
    • hdfs-NAMENODE
  ZooKeeper
    • zookeeper-SERVER
  Hive
    • hive-HIVESERVER2
    • hms-HIVEMETASTORE
    • hue-HUE_LOAD_BALANCER
    • hue-HUE_SERVER
  Spark
    • livy-LIVY_SERVER
    • spark_on_yarn-SPARK_YARN_HISTORY_SERVER
  YARN
    • yarn-NODEMANAGER
    • yarn-NODEMANAGER-COMPUTE
    • yarn-RESOURCEMANAGER
  Oozie
    • oozie-OOZIE_SERVER
  Knox
    • knox-KNOX
  Spark
    • livy_for_spark3-LIVY_SERVER
    • spark3_on_yarn-SPARK_YARN_HISTORY_SERVER

Streams Messaging High Availability with Apache Kafka, Schema Registry, Streams Messaging Manager, Streams Replication Manager, and Cruise Control
  Streaming
    • schemaregistry-SCHEMA_REGISTRY_SERVER (used for topic schema management for governance)
    • kafka-KAFKA_BROKER
    • streams_replication_manager-STREAMS_REPLICATION_MANAGER_SERVICE
    • streams_replication_manager-STREAMS_REPLICATION_MANAGER_DRIVER
    • kafka-KAFKA_CONNECT (connector for external database/NiFi to Kafka)
    • kafka-KAFKA_KRAFT (in technical preview; replaces ZooKeeper for Kafka operation)
  ZooKeeper
    • zookeeper-SERVER
  Knox
    • knox-KNOX (used for Schema Registry and Replication Manager)

Real-time Data Mart - Apache Impala, Hue, Apache Kudu, and Apache Spark
  -
    • kudu-MASTER
    • yarn-NODEMANAGER
    • impala-IMPALAD-EXECUTOR
    • kudu-TSERVER

Flow Management Heavy Duty with Apache NiFi, Apache NiFi Registry, and Schema Registry
  NiFi
    • nifi-NIFI_NODE

Streaming Analytics Heavy Duty with Apache Flink
  -
    • zookeeper-SERVER
    • hdfs-FAILOVERCONTROLLER
    • hdfs-JOURNALNODE
    • hdfs-NAMENODE
    • kafka-KAFKA_BROKER
    • yarn-RESOURCEMANAGER
    • hdfs-DATANODE
    • yarn-NODEMANAGER

Data Mart with Apache Impala and Hue
  Impala
    • impala-IMPALAD-EXECUTOR
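
For scripted checks, the table above can be held as a nested mapping from template to component to services. The sketch below transcribes only part of the table (three templates, and only two components of Data Engineering HA) to keep it short. The names `HA_SERVICES` and `templates_with_service` are hypothetical helpers, not part of any Cloudera API.

```python
# Partial transcription of the table: template -> component -> HA services.
HA_SERVICES: dict[str, dict[str, list[str]]] = {
    "Data Engineering HA": {
        "HDFS": [
            "hdfs-DATANODE",
            "hdfs-FAILOVERCONTROLLER",
            "hdfs-HTTPFS",
            "hdfs-JOURNALNODE",
            "hdfs-NAMENODE",
        ],
        "Oozie": ["oozie-OOZIE_SERVER"],
    },
    "Flow Management Heavy Duty with Apache NiFi, Apache NiFi Registry, and Schema Registry": {
        "NiFi": ["nifi-NIFI_NODE"],
    },
    "Data Mart with Apache Impala and Hue": {
        "Impala": ["impala-IMPALAD-EXECUTOR"],
    },
}


def templates_with_service(service: str) -> list[str]:
    """Return the templates (from the transcribed subset) that list the service as HA."""
    return [
        template
        for template, components in HA_SERVICES.items()
        if any(service in services for services in components.values())
    ]


print(templates_with_service("hdfs-NAMENODE"))
print(templates_with_service("nifi-NIFI_NODE"))
```

Because the mapping is a subset, an empty result only means the service is not in the transcribed rows; extend `HA_SERVICES` with the remaining rows before relying on it.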