Adding a Compute Cluster and Data Context

How to create a Compute Cluster and Data Context

Minimum Required Role: Cluster Administrator (also provided by Full Administrator) This feature is not available when using Cloudera Manager to manage Data Hub clusters.

To create a Compute cluster, you must have a Regular cluster that will be designated as the Base cluster. This cluster hosts the data services to be used by a Compute cluster and can also host services for other workloads that do not require access to data services defined in the Data Context.

To create a Compute cluster:

On the Cloudera Manager home page, click Clusters > Add Cluster
The Add Cluster Welcome page displays.
Click Continue. .
The Cluster Basics page displays
Select Compute cluster.
If you already have a Data Context defined, select it from the drop-down list.
To create a new Data Context:
1. Select Create Data Context from the drop-down list.
  The Create Data Context dialog box displays.
2. Enter a unique name for the Data Context.
3. Select the Base cluster from the drop-down list.
4. Select the Data Services, Metadata Services and Security Services you want to expose in the Data Context. You can choose the following:
  - HDFS (required)
  - Hive Metadata Service
  - Atlas
  - Ranger
5. Click Create.
  The Cluster Basics page displays your selections.
6. Click Continue.
Continue with the next steps in the Add Cluster Wizard to specify hosts and credentials, and install the Agent and CDH software.
The Select Repository screen examines the CDH version of the base cluster and recommend a supported version. Cloudera recommends that your Base and Compute clusters each run the same version of CDH. The Add Cluster Wizard offers the option to choose other versions, but these combinations have not been tested and are not supported for production use.
On the Select Services screen, choose any of the pre-configured combinations of services listed on this page, or you can select Custom Services and choose the services you want to install.
Service combinations for Compute Clusters:

Data Engineering

Process develop, and serve predictive models.

Services included: Spark, Oozie, Hive on Tez, Data Analytics Studio, HDFS, YARN, and YARN Queue Manager

Spark

Spark for Compute

Services included: Core Configuration, Spark, Oozie, YARN, and YARN Queue Manager

Streams Messaging (Simple)

Simple Kafka cluster for streams messaging

Services included: Kafka, Schema Registry, and Zookeeper

Streams Messaging (Full)

Advanced Kafka cluster with monitoring and replication services for streams messaging

Services included: Kafka, Schema Registry, Streams Messaging Manager, Streams Replication Manager, Cruise Control, and Zookeeper

Custom Services

Choose your own services. Services required by chosen services will automatically be included.

The following services can be installed on a Compute cluster:
- Hive Execution Service (This service supplies the HiveServer2 role only.)
- Hue
- Kafka
- Spark
- Oozie (only when Hue is available, and is a requirement for Hue)
- YARN
- HDFS
If you have enabled Kerberos authentication on the Base cluster, you must also enable Kerberos on the Compute cluster.
If you don't select a service from Step 7 but if you want to use integrations related to these services (such as Hive, Spark, Kafka, etc.) on the Compute cluster, then you need to add all the necessary gateway roles to use the connections properly.