Cloudera Data Engineering service

Cloudera Data Engineering is a serverless service for Cloudera that allows you to submit batch jobs to auto-scaling virtual clusters. Cloudera Data Engineering enables you to spend more time on your applications, and less time on infrastructure.

Cloudera Data Engineering allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters. With Cloudera Data Engineering, you define virtual clusters with a range of CPU and memory resources, and the cluster scales up and down as needed to run your Spark workloads, helping to control your cloud costs.
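As a rough illustration of the scaling behavior described above, the following sketch keeps an allocation inside a configured range. The ResourceRange class, its clamp method, and the numeric bounds are all hypothetical and are not part of any Cloudera API; the sketch only shows the idea of a cluster that never drops below its minimum or grows past its maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRange:
    """Hypothetical min/max bounds for one resource of a virtual cluster."""
    minimum: int
    maximum: int

    def clamp(self, demand: int) -> int:
        # Keep the allocation at or above the floor and at or below the ceiling.
        return max(self.minimum, min(demand, self.maximum))


# Invented example bounds: 4 to 20 CPU cores, 16 to 80 GB of memory.
cpu_cores = ResourceRange(minimum=4, maximum=20)
memory_gb = ResourceRange(minimum=16, maximum=80)

for demand in (0, 12, 64):
    # Low demand is raised to the minimum; high demand is capped at the maximum.
    print(demand, "->", cpu_cores.clamp(demand))
```

The upper bound is what limits cloud spend: demand beyond it is not served by adding more capacity.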

The Cloudera Data Engineering service involves several components:

Environment: A logical subset of your cloud provider account including a specific virtual network. For more information, see Environments.
Cloudera Data Engineering Service: The long-running Kubernetes cluster and services that manage the virtual clusters. The Cloudera Data Engineering service must be enabled on an environment before you can create any virtual clusters.
Virtual Cluster: An individual auto-scaling cluster with defined CPU and memory ranges. Virtual clusters in Cloudera Data Engineering can be created and deleted on demand. Each job is associated with a virtual cluster.
Job: Application code along with defined configurations and resources. Jobs can be run on demand or scheduled. An individual job execution is called a job run.
Resource: A defined collection of files such as a Python file or application JAR, dependencies, and any other reference files required for a job.
Job run: A single execution of a job, whether started on demand or by a schedule.
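The relationships between these components can be sketched as a small data model. This is only an illustrative outline: every class, field, and value below is a hypothetical name chosen for the example and does not correspond to the actual Cloudera Data Engineering API or CLI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A named collection of files that a job needs (hypothetical model)."""
    name: str
    files: List[str]


@dataclass
class JobRun:
    """One execution of a job."""
    job_name: str
    run_id: int


@dataclass
class Job:
    """Application code plus the resources it depends on."""
    name: str
    application_file: str
    resources: List[Resource] = field(default_factory=list)
    runs: List[JobRun] = field(default_factory=list)

    def run(self) -> JobRun:
        # Each call records a new job run with the next sequential ID.
        job_run = JobRun(job_name=self.name, run_id=len(self.runs) + 1)
        self.runs.append(job_run)
        return job_run


@dataclass
class VirtualCluster:
    """An auto-scaling cluster that jobs are associated with."""
    name: str
    jobs: List[Job] = field(default_factory=list)


@dataclass
class Service:
    """The long-running service enabled on one environment."""
    environment: str
    virtual_clusters: List[VirtualCluster] = field(default_factory=list)


# Invented example values, for illustration only.
deps = Resource(name="etl-files", files=["etl.py", "helpers.zip"])
job = Job(name="nightly-etl", application_file="etl.py", resources=[deps])
cluster = VirtualCluster(name="vc-analytics", jobs=[job])
service = Service(environment="dev-env", virtual_clusters=[cluster])

first = job.run()
second = job.run()
print(first.run_id, second.run_id, len(job.runs))
```

The nesting mirrors the order of the list: a service is enabled on an environment, virtual clusters live under the service, jobs belong to a virtual cluster, and each job accumulates job runs.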

The Cloudera Data Engineering service differs from a Cloudera Data Engineering Data Hub cluster in several ways, including the following:

Table 1. Cloudera Data Engineering Service vs. Cloudera Data Engineering Data Hub
Feature	Cloudera Data Engineering	Data Hub DE Template
Cloud providers	Amazon AWS, Microsoft Azure	Amazon AWS, Microsoft Azure
Compute engines	Apache Spark	Apache Spark, Apache Hive
Deployment	Kubernetes	Virtual machines (cloud provider)
Resource management	Apache YuniKorn, Kubernetes	YARN
Troubleshooting	Cloudera Data Engineering deep analysis, Spark History Server	Spark History Server
Portability	Cloud and on premises	Cloud only
Job submission	Managed API	Apache Livy

Browser Requirements

Supported browsers:

Chrome
Safari

Unsupported browsers:

Firefox
