Default cluster configurations

Data Hub includes a set of prescriptive cluster configurations. These configurations consist of cluster templates for common data analytics and data engineering use cases, and cloud-provider-specific cluster definitions that reference these cluster templates.

The following cluster definitions are available:
Cluster definition name: Data Engineering for AWS
Cluster template name: CDP 1.2 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
Runtime version: 7.0.2
Main services included in the attached cluster template: Data Analytics Studio (DAS), HDFS, Hive, Hue, Livy, Oozie, Spark, YARN, Zeppelin, ZooKeeper
Description: Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data. This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.

Cluster definition name: Data Engineering Hue HA for AWS
Cluster template name: CDP 1.2 - Data Engineering Hue HA: Apache Spark, Apache Hive, Hue, Apache Oozie
Runtime version: 7.0.2
Main services included in the attached cluster template: Data Analytics Studio (DAS), HDFS, Hive, Hue, Livy, Oozie, Spark, YARN, Zeppelin, ZooKeeper
Description: Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data. This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.

Cluster definition name: Data Mart for AWS
Cluster template name: CDP 1.2 - Data Mart: Apache Impala, Hue
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, Hue, Impala
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and allows users to store, browse, query, and explore their data interactively. The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Upon deployment, it can be used as a standalone Data Mart to which users point their BI dashboards using JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera's web-based SQL query editor, and execute them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster definition name: Real-time Data Mart for AWS
Cluster template name: CDP 1.2 - Real-time Data Mart: Apache Impala, Hue, Apache Kudu, Apache Spark
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, Hue, Impala, Kudu, Spark, YARN
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and allows users to store, browse, query, and explore their data interactively. The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Upon deployment, it can be used as a standalone Data Mart to which users point their BI dashboards using JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera's web-based SQL query editor, and execute them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster definition name: Operational Database for AWS
Cluster template name: CDP 1.2 - Operational Database: Apache HBase
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, HBase, ZooKeeper
Description: Operational Database is a NoSQL database powered by Apache HBase, designed to support custom OLTP applications that leverage big data. Apache HBase is a scale-out NoSQL database that can scale to petabytes and store tables with millions of columns and billions of rows. This template provides a fully capable standalone deployment of Apache HBase and supporting packages (HDFS, ZooKeeper). It can be used as a standalone DBMS to which you can point an application, or as a disaster recovery instance for an on-premises HBase cluster.

Cluster definition names: Streams Messaging Heavy Duty for AWS, Streams Messaging Light Duty for AWS
Cluster template names: CDP 1.2 - Streams Messaging Heavy Duty, CDP 1.2 - Streams Messaging Light Duty
Runtime version: 7.0.2
Main services included in the attached cluster template: Kafka, Schema Registry, Streams Messaging Manager, ZooKeeper
Description: Streams Messaging provides advanced messaging and real-time processing on streaming data using Apache Kafka, centralized schema management using Schema Registry, and management and monitoring capabilities powered by Streams Messaging Manager. This template sets up a fault-tolerant standalone deployment of Apache Kafka and supporting Cloudera components (Schema Registry and Streams Messaging Manager), which can be used for production Kafka workloads in the cloud or as a disaster recovery instance for on-premises Kafka clusters.
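The relationship between cluster definitions, cluster templates, and services listed above can be sketched as plain data. The following Python snippet is only an illustration: the `ClusterDefinition` type, its field names, and the `definitions_with` helper are hypothetical and are not part of any Data Hub API. The values are transcribed from the list above.

```python
from typing import List, NamedTuple, Tuple


class ClusterDefinition(NamedTuple):
    """One default cluster definition (hypothetical record type, for illustration)."""
    definition: str            # cluster definition name
    template: str              # cluster template the definition references
    runtime: str               # Runtime version
    services: Tuple[str, ...]  # main services in the attached template


# Services shared by both Data Engineering templates.
_DE_SERVICES = ("Data Analytics Studio (DAS)", "HDFS", "Hive", "Hue", "Livy",
                "Oozie", "Spark", "YARN", "Zeppelin", "ZooKeeper")
# Services shared by both Streams Messaging templates.
_SM_SERVICES = ("Kafka", "Schema Registry", "Streams Messaging Manager", "ZooKeeper")

DEFAULTS: List[ClusterDefinition] = [
    ClusterDefinition("Data Engineering for AWS",
                      "CDP 1.2 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie",
                      "7.0.2", _DE_SERVICES),
    ClusterDefinition("Data Engineering Hue HA for AWS",
                      "CDP 1.2 - Data Engineering Hue HA: Apache Spark, Apache Hive, Hue, Apache Oozie",
                      "7.0.2", _DE_SERVICES),
    ClusterDefinition("Data Mart for AWS",
                      "CDP 1.2 - Data Mart: Apache Impala, Hue",
                      "7.0.2", ("HDFS", "Hue", "Impala")),
    ClusterDefinition("Real-time Data Mart for AWS",
                      "CDP 1.2 - Real-time Data Mart: Apache Impala, Hue, Apache Kudu, Apache Spark",
                      "7.0.2", ("HDFS", "Hue", "Impala", "Kudu", "Spark", "YARN")),
    ClusterDefinition("Operational Database for AWS",
                      "CDP 1.2 - Operational Database: Apache HBase",
                      "7.0.2", ("HDFS", "HBase", "ZooKeeper")),
    ClusterDefinition("Streams Messaging Heavy Duty for AWS",
                      "CDP 1.2 - Streams Messaging Heavy Duty",
                      "7.0.2", _SM_SERVICES),
    ClusterDefinition("Streams Messaging Light Duty for AWS",
                      "CDP 1.2 - Streams Messaging Light Duty",
                      "7.0.2", _SM_SERVICES),
]


def definitions_with(service: str) -> List[str]:
    """Return the names of default definitions whose template includes `service`.

    Matching is case-insensitive and exact on the service name.
    """
    wanted = service.lower()
    return [d.definition for d in DEFAULTS
            if any(s.lower() == wanted for s in d.services)]
```

For example, `definitions_with("Kudu")` returns only the Real-time Data Mart definition, which is one way to narrow down a starting point when a workload depends on a specific service.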

The default cluster definitions can be accessed from Management Console > Environments > Shared Resources > Cluster Definitions. The default cluster templates can be accessed from Management Console > Environments > Shared Resources > Cluster Templates. To view the details of a cluster definition or cluster template, click its name. For each cluster definition, you can access a raw JSON file. For each cluster template, you can access a graphical representation ("list view") and a raw JSON file ("raw view") of all cluster host groups and their components.
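Once you have copied a template's raw JSON, you can summarize its host groups and components with a few lines of Python. The sketch below is hedged: the key names `hostTemplates`, `refName`, and `roleConfigGroupsRefNames` are assumptions about the JSON layout, and the sample document is made up for illustration. Check the raw view of an actual template and adjust the keys to match.

```python
import json
from typing import Dict, List

# Made-up sample that mimics an ASSUMED template layout; not copied from a real template.
SAMPLE_TEMPLATE = """
{
  "hostTemplates": [
    {
      "refName": "master",
      "roleConfigGroupsRefNames": ["hdfs-NAMENODE-BASE", "zookeeper-SERVER-BASE"]
    },
    {
      "refName": "worker",
      "roleConfigGroupsRefNames": ["hdfs-DATANODE-BASE"]
    }
  ]
}
"""


def host_groups(template_json: str) -> Dict[str, List[str]]:
    """Map each host group name to the components assigned to it.

    Assumes host groups live under "hostTemplates" and that each entry
    names itself with "refName" and lists its components under
    "roleConfigGroupsRefNames". Missing keys yield empty results.
    """
    doc = json.loads(template_json)
    return {
        group["refName"]: list(group.get("roleConfigGroupsRefNames", []))
        for group in doc.get("hostTemplates", [])
    }


if __name__ == "__main__":
    # Print one line per host group, similar in spirit to the list view.
    for name, components in host_groups(SAMPLE_TEMPLATE).items():
        print(f"{name}: {', '.join(components)}")
```

The function returns a plain dictionary so that the result can be compared across templates or fed into other tooling without depending on the JSON layout again.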