Data Engineering clusters

Learn about the default Data Engineering clusters, including cluster definition and template names, included services, and compatible Runtime versions.

Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data.

Data Engineering cluster definition

This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.
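As an illustration of what remote job submission through Livy can look like, the following minimal sketch builds a batch submission request for the Livy REST API (`POST /batches`). The endpoint URL, the application path, and the helper name `build_livy_batch_request` are hypothetical placeholders, not values defined by this template. The request is only constructed, not sent, and authentication is omitted; a real cluster would typically require both.

```python
import json
from urllib.request import Request


def build_livy_batch_request(livy_url, app_file, class_name=None, args=None):
    """Build (but do not send) a POST /batches request for the Livy REST API.

    livy_url and app_file are supplied by the caller; nothing here is
    specific to a particular cluster.
    """
    payload = {"file": app_file}  # "file" is the application to run
    if class_name:
        payload["className"] = class_name  # main class for Java/Scala jobs
    if args:
        payload["args"] = list(args)  # command-line arguments for the job
    return Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and application path, for illustration only.
req = build_livy_batch_request(
    "https://livy.example.invalid:8998",
    "s3a://example-bucket/jobs/etl.py",
    args=["--date", "2021-01-01"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually submit the job, the request would be passed to `urllib.request.urlopen` together with whatever credentials the cluster's gateway expects.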

Cluster definition names
  • Data Engineering for AWS
  • Data Engineering for Azure
  • Data Engineering for GCP
  • Data Engineering HA for AWS (Preview)
  • Data Engineering HA for Azure (Preview)
  • Data Engineering HA for GCP (Preview)
  • Data Engineering Spark3 for AWS
  • Data Engineering Spark3 for Azure
  • Data Engineering Spark3 for GCP
Cluster template names
  • CDP - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
  • CDP - Data Engineering HA: Apache Spark, Apache Hive, Hue, Apache Oozie
  • CDP - Data Engineering: Apache Spark3
Included services
  • Data Analytics Studio (DAS)
  • HDFS
  • Hive
  • Hue
  • Livy
  • Oozie
  • Spark
  • YARN
  • Zeppelin
  • ZooKeeper
Compatible Runtime versions
7.1.0, 7.2.0, 7.2.1, 7.2.2, 7.2.6, 7.2.7, 7.2.8
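
A quick way to guard automation against an unsupported Runtime is to check the requested version against the list above. The sketch below is illustrative only: the constant and function names are hypothetical, and it does exact string matching against the versions listed on this page rather than inferring ranges.

```python
# Versions copied from the "Compatible Runtime versions" list above.
COMPATIBLE_RUNTIME_VERSIONS = (
    "7.1.0", "7.2.0", "7.2.1", "7.2.2", "7.2.6", "7.2.7", "7.2.8",
)


def is_compatible_runtime(version):
    """Return True only if the version is explicitly listed as compatible."""
    return version.strip() in COMPATIBLE_RUNTIME_VERSIONS


print(is_compatible_runtime("7.2.6"))
print(is_compatible_runtime("7.2.3"))
```

Exact matching is deliberate: a version that falls between two listed versions, such as 7.2.3, is not on the list and is treated as unsupported.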