Default cluster configurations

Data Hub includes a set of prescriptive cluster configurations. These configurations include cluster templates for common data analytics and data engineering use cases, and cloud-provider-specific cluster definitions that reference these cluster templates.

The following cluster definitions are available:
Each entry below gives the cluster definition names, the name of the cluster template they reference, the Runtime version, the main services included in that cluster template, and a description.

Cluster definition names: Data Engineering for AWS; Data Engineering for Azure
Cluster template name: CDP - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
Runtime version: 7.2.0
Main services: Data Analytics Studio (DAS), HDFS, Hive, Hue, Livy, Oozie, Spark, YARN, Zeppelin, ZooKeeper
Description: Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data.

This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.

Cluster definition names: Data Engineering HA for AWS (Preview); Data Engineering HA for Azure (Preview)
Cluster template name: CDP - Data Engineering HA: Apache Spark, Apache Hive, Hue, Apache Oozie
Runtime version: 7.2.0
Main services: Data Analytics Studio (DAS), HDFS, Hive, Hue, Livy, Oozie, Spark, YARN, Zeppelin, ZooKeeper
Description: Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data.

This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.

Cluster definition names: Data Mart for AWS; Data Mart for Azure
Cluster template name: CDP - Data Mart: Apache Impala, Hue
Runtime version: 7.2.0
Main services: HDFS, Hue, Impala
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and lets users store, browse, query, and explore their data interactively.

The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Once deployed, it can serve as a standalone Data Mart to which users point their BI dashboards through JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera’s web-based SQL query editor, and run them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster definition names: Real-time Data Mart for AWS; Real-time Data Mart for Azure
Cluster template name: CDP - Real-time Data Mart: Apache Impala, Hue, Apache Kudu, Apache Spark
Runtime version: 7.2.0
Main services: HDFS, Hue, Impala, Kudu, Spark, YARN
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and lets users store, browse, query, and explore their data interactively.

The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Once deployed, it can serve as a standalone Data Mart to which users point their BI dashboards through JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera’s web-based SQL query editor, and run them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster definition names: Operational Database with SQL for AWS; Operational Database with SQL for Azure
Cluster template name: CDP - Operational Database: Apache HBase
Runtime version: 7.2.0
Main services: HDFS, HBase, Knox, ZooKeeper
Description: Operational Database is a NoSQL database powered by Apache HBase, designed to support custom OLTP applications that need to work at big data scale. Apache HBase is a scale-out NoSQL database that can scale to petabytes and store tables with millions of columns and billions of rows.

This template provides a fully capable, standalone deployment of Apache HBase and its supporting packages (HDFS, ZooKeeper). You can use it as a standalone DBMS to which you point an application, or as a disaster recovery (DR) instance for an on-prem HBase cluster.

Cluster definition names: Streams Messaging Heavy Duty for AWS (Preview); Streams Messaging Light Duty for AWS (Preview); Streams Messaging Heavy Duty for Azure (Preview); Streams Messaging Light Duty for Azure (Preview)
Cluster template names: CDP - Streams Messaging Heavy Duty; CDP - Streams Messaging Light Duty
Runtime version: 7.2.0
Main services: Kafka, Schema Registry, Streams Messaging Manager (SMM)
Description: Streams Messaging provides advanced messaging and real-time processing on streaming data using Apache Kafka, centralized schema management using Schema Registry, and management and monitoring capabilities powered by Streams Messaging Manager.

This template sets up a fault-tolerant, standalone deployment of Apache Kafka and supporting Cloudera components (Schema Registry and Streams Messaging Manager), which can be used for production Kafka workloads in the cloud or as a disaster recovery instance for on-prem Kafka clusters.

Cluster definition names: Flow Management: Light Duty for AWS; Flow Management: Heavy Duty for AWS; Flow Management: Light Duty for Azure; Flow Management: Heavy Duty for Azure
Cluster template name: CDP - Flow Management
Runtime version: 7.2.0
Main services: NiFi, NiFi Registry
Description: Flow Management delivers high-scale data ingestion, transformation, and management to enterprises across any-to-any environments. It addresses key enterprise use cases such as data movement, continuous data ingestion, log data ingestion, and acquisition of all types of streaming data, including social, mobile, clickstream, and IoT data.

The Flow Management template includes a no-code data ingestion and management solution powered by Apache NiFi. With NiFi’s intuitive graphical interface and more than 300 processors, Flow Management enables easy data ingestion and movement between CDP services as well as third-party cloud services. NiFi Registry is set up automatically and provides a central place to manage versioned data flows.

You can access the default cluster definitions from Management Console > Environments > Shared Resources > Cluster Definitions, and the default cluster templates from Management Console > Environments > Shared Resources > Cluster Templates. To view the details of a cluster definition or cluster template, click its name. For each cluster definition, you can access a raw JSON file. For each cluster template, you can access a graphical representation ("list view") and a raw JSON file ("raw view") of all cluster host groups and their components.
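The list view summarizes the same host-group-to-component mapping that the raw view stores as JSON. The following is a minimal Python sketch of deriving that mapping from a raw template. It assumes the raw view follows the Cloudera Manager cluster-template layout, in which a top-level `hostTemplates` array names each host group under `refName` and lists its components under `roleConfigGroupsRefNames`. The sample JSON, its host group names, and its role names are invented and heavily trimmed for illustration; they are not copied from any default Data Hub template, and `host_group_components` is a hypothetical helper.

```python
from __future__ import annotations

import json

# Invented, heavily trimmed sample in the ASSUMED Cloudera Manager
# cluster-template layout; not taken from any default Data Hub template.
SAMPLE_TEMPLATE = """
{
  "cdhVersion": "7.2.0",
  "services": [
    {"refName": "hdfs", "serviceType": "HDFS"},
    {"refName": "zookeeper", "serviceType": "ZOOKEEPER"}
  ],
  "hostTemplates": [
    {"refName": "master", "cardinality": 1,
     "roleConfigGroupsRefNames": ["hdfs-NAMENODE-BASE", "zookeeper-SERVER-BASE"]},
    {"refName": "worker", "cardinality": 3,
     "roleConfigGroupsRefNames": ["hdfs-DATANODE-BASE"]}
  ]
}
"""


def host_group_components(template_json: str) -> dict[str, list[str]]:
    """Map each host group name to the components (role config groups) it runs.

    Hypothetical helper: assumes the 'hostTemplates', 'refName', and
    'roleConfigGroupsRefNames' keys described above.
    """
    template = json.loads(template_json)
    groups: dict[str, list[str]] = {}
    for host_template in template.get("hostTemplates", []):
        groups[host_template["refName"]] = list(
            host_template.get("roleConfigGroupsRefNames", [])
        )
    return groups


if __name__ == "__main__":
    # Print one line per host group, similar in spirit to the list view.
    for group, components in host_group_components(SAMPLE_TEMPLATE).items():
        print(f"{group}: {', '.join(components)}")
```

To try this against a real template, you could paste the contents of its raw view into a string and pass it to the function; if the actual keys differ from the assumed layout, adjust the lookups accordingly.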