Default cluster configurations

Data Hub includes a set of prescriptive cluster configurations. These configurations include cluster templates for common data analytics and data engineering use cases, and cloud-provider-specific cluster definitions, which reference these cluster templates.

The following cluster definitions are available:
Cluster template name: CDP 1.2 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
Cluster definition name: CDP 1.2 - Data Engineering template
Runtime version: 7.0.2
Main services included in the attached cluster template: Data Analytics Studio (DAS), HDFS, Hive, Hue, Livy, Oozie, Spark, YARN, Zeppelin, ZooKeeper
Description: Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data.

This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.

Cluster template name: CDP 1.2 - Data Mart: Apache Impala, Hue
Cluster definition name: CDP 1.2 - Data Mart template
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, Hue, Impala
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and allows users to store, browse, query, and explore their data interactively.

The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Upon deployment, it can be used as a standalone Data Mart to which users point their BI dashboards using JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera’s web-based SQL query editor, and execute them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster template name: CDP 1.2 - Operations Data Mart: Apache Impala, Hue, Apache Kudu, Apache Spark
Cluster definition name: CDP 1.2 - Operations Data Mart template
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, Hue, Impala, Kudu, Spark, YARN
Description: Data Mart is an MPP SQL database powered by Apache Impala, designed to support custom Data Mart applications at big data scale. Impala scales to petabytes of data, processes tables with trillions of rows, and allows users to store, browse, query, and explore their data interactively.

The Data Mart template provides a ready-to-use, fully capable, standalone deployment of Impala. Upon deployment, it can be used as a standalone Data Mart to which users point their BI dashboards using JDBC/ODBC endpoints. Users can also author SQL queries in Hue, Cloudera’s web-based SQL query editor, and execute them with Impala for an interactive, end-user-focused SQL/BI experience.

Cluster template name: CDP 1.2 - Operational Database: Apache HBase (Technical Preview)
Cluster definition name: CDP 1.2 - Operational Database template
Runtime version: 7.0.2
Main services included in the attached cluster template: HDFS, HBase, ZooKeeper
Description: Operational DB is a NoSQL database powered by Apache HBase, designed to support custom OLTP applications that want to leverage the power of big data. Apache HBase is a NoSQL, scale-out database that can scale to petabytes and store tables with millions of columns and billions of rows.

This template provides a fully capable standalone deployment of Apache HBase and supporting packages (HDFS, ZooKeeper). It can be used as a standalone DBMS to which you can point an application, or as a disaster recovery (DR) instance for an on-premises HBase cluster.

The default cluster definitions can be accessed from Management Console > Environments > Shared Resources > Cluster Definitions. The default cluster templates can be accessed from Management Console > Environments > Shared Resources > Cluster Templates.
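
When choosing among the default configurations listed in this section, it can help to look them up by the services they include. The following is a minimal sketch in Python that transcribes the names, runtime versions, and service lists given above into a small in-memory catalog. The `ClusterDefinition` class and the `definitions_with` helper are hypothetical names introduced only for this illustration. They are not part of Data Hub or of any Cloudera API, and the sketch does not contact Management Console.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterDefinition:
    """Hypothetical record for one default configuration listed in this section."""
    template_name: str
    definition_name: str
    runtime_version: str
    services: Tuple[str, ...]


# Values transcribed from the list of default cluster definitions above.
DEFAULT_DEFINITIONS = (
    ClusterDefinition(
        "CDP 1.2 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie",
        "CDP 1.2 - Data Engineering template",
        "7.0.2",
        ("Data Analytics Studio (DAS)", "HDFS", "Hive", "Hue", "Livy",
         "Oozie", "Spark", "YARN", "Zeppelin", "ZooKeeper"),
    ),
    ClusterDefinition(
        "CDP 1.2 - Data Mart: Apache Impala, Hue",
        "CDP 1.2 - Data Mart template",
        "7.0.2",
        ("HDFS", "Hue", "Impala"),
    ),
    ClusterDefinition(
        "CDP 1.2 - Operations Data Mart: Apache Impala, Hue, Apache Kudu, Apache Spark",
        "CDP 1.2 - Operations Data Mart template",
        "7.0.2",
        ("HDFS", "Hue", "Impala", "Kudu", "Spark", "YARN"),
    ),
    ClusterDefinition(
        "CDP 1.2 - Operational Database: Apache HBase (Technical Preview)",
        "CDP 1.2 - Operational Database template",
        "7.0.2",
        ("HDFS", "HBase", "ZooKeeper"),
    ),
)


def definitions_with(*required: str) -> List[str]:
    """Return the definition names whose service list contains every required service.

    Matching is case-insensitive and uses the exact service names from the list above.
    """
    wanted = {name.lower() for name in required}
    return [
        d.definition_name
        for d in DEFAULT_DEFINITIONS
        if wanted <= {s.lower() for s in d.services}
    ]


if __name__ == "__main__":
    # Which default configurations include both Impala and Kudu?
    print(definitions_with("Impala", "Kudu"))
```

Calling `definitions_with("Impala")` returns both Data Mart definitions, while adding `"Kudu"` narrows the result to the Operations Data Mart definition. This reflects the only difference between the two service lists: Kudu, Spark, and YARN.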