Data Engineering clusters
Learn about the default Data Engineering clusters, including cluster definition and template names, included services, and compatible Runtime versions.
Data Engineering provides a complete data processing solution, powered by Apache Spark and Apache Hive. Spark and Hive enable fast, scalable, fault-tolerant data engineering and analytics over petabytes of data.
Data Engineering cluster definition
This Data Engineering template includes a standalone deployment of Spark and Hive, as well as Apache Oozie for job scheduling and orchestration, Apache Livy for remote job submission, and Hue and Apache Zeppelin for job authoring and interactive analysis.
- Cluster definition names
  - Data Engineering for AWS
  - Data Engineering for Azure
  - Data Engineering for GCP
  - Data Engineering HA for AWS (Preview)
  - Data Engineering HA for Azure (Preview)
  - Data Engineering HA for GCP (Preview)
  - Data Engineering Spark3 for AWS
  - Data Engineering Spark3 for Azure
  - Data Engineering Spark3 for GCP
- Cluster template name
  - CDP - Data Engineering: Apache Spark, Apache Hive, Apache Oozie
  - CDP - Data Engineering HA: Apache Spark, Apache Hive, Hue, Apache Oozie
  - CDP - Data Engineering: Apache Spark3
- Included services
  - Data Analytics Studio (DAS)
- Compatible runtime version
  - 7.1.0, 7.2.0, 7.2.1, 7.2.2, 7.2.6, 7.2.7, 7.2.8
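
The naming pattern in the lists above can be captured in a small lookup, which may be handy when scripting against these names. The following is a minimal sketch, not part of any Cloudera tool: the helper names (`definition_name`, `template_for`, `is_listed_runtime`) are hypothetical, and the pairing of each cluster definition variant with a template is an assumption inferred from the matching names. The strings themselves are copied from the lists above.

```python
# Illustrative lookup built from the names listed above.
# Assumption: each definition variant pairs with the template of the same name.

PROVIDERS = ("AWS", "Azure", "GCP")

# Variant -> cluster template name (pairing inferred from the names).
TEMPLATES = {
    "Data Engineering": "CDP - Data Engineering: Apache Spark, Apache Hive, Apache Oozie",
    "Data Engineering HA": "CDP - Data Engineering HA: Apache Spark, Apache Hive, Hue, Apache Oozie",
    "Data Engineering Spark3": "CDP - Data Engineering: Apache Spark3",
}

# Variants whose definition names carry the "(Preview)" suffix in the list.
PREVIEW_VARIANTS = {"Data Engineering HA"}

# Runtime versions exactly as listed; no per-variant breakdown is given.
RUNTIME_VERSIONS = ("7.1.0", "7.2.0", "7.2.1", "7.2.2", "7.2.6", "7.2.7", "7.2.8")


def definition_name(variant: str, provider: str) -> str:
    """Build a cluster definition name such as 'Data Engineering for AWS'."""
    if variant not in TEMPLATES:
        raise ValueError(f"unknown variant: {variant!r}")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    name = f"{variant} for {provider}"
    if variant in PREVIEW_VARIANTS:
        name += " (Preview)"
    return name


def template_for(variant: str) -> str:
    """Return the template name assumed to back the given variant."""
    return TEMPLATES[variant]


def is_listed_runtime(version: str) -> bool:
    """Check whether a Runtime version appears in the compatibility list."""
    return version in RUNTIME_VERSIONS


if __name__ == "__main__":
    for variant in TEMPLATES:
        for provider in PROVIDERS:
            print(definition_name(variant, provider), "->", template_for(variant))
```

Generating the names from variant and provider keeps the nine definition names consistent with one another; if a deployment uses different or version-prefixed names, the tables would need to be adjusted accordingly.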