Cloudera Data Engineering service
Cloudera Data Engineering (CDE) is a service for CDP Private Cloud Data Services that allows you to submit jobs to auto-scaling virtual clusters.
Cloudera Data Engineering allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters. With Cloudera Data Engineering, you define virtual clusters with a range of CPU and memory resources, and the virtual cluster scales up and down as needed to run your Spark workloads.
The CDE service involves several components:
- Environment
- A logical subset of your private cloud deployment, including a Data Lake and multiple compute resources. For more information, see Environments.
- CDE Service
- A logical subset of the long-running Kubernetes cluster and services that manage the virtual clusters. The CDE service must be enabled on an environment before you can create any virtual clusters.
- Virtual Cluster
- An individual auto-scaling cluster with defined CPU and memory ranges. Virtual clusters in CDE can be created and deleted on demand. Each job is associated with a virtual cluster.
- Job
- Application code along with defined configurations and resources. Jobs can be run on demand or scheduled. An individual job execution is called a job run.
- Resource
- A defined collection of files, such as a Python file or application JAR, dependencies, and any other reference files required for a job.
- Job run
- An individual execution of a job, whether started on demand or by a schedule.
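
The relationships between these components can be sketched as a small data model. The following Python example is illustrative only and is not the CDE API or CLI: every class, field, and value name (CdeService, VirtualCluster, cores_for, "example-vc", and so on) is hypothetical, and the minimum and maximum CPU and memory fields are an assumption about how a resource range might be represented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobRun:
    """One individual execution of a job."""
    run_id: int


@dataclass
class Job:
    """Application code plus its configuration and named resources."""
    name: str
    application_file: str
    resources: List[str] = field(default_factory=list)
    runs: List[JobRun] = field(default_factory=list)

    def run(self) -> JobRun:
        # Each execution, on demand or scheduled, is recorded as a new job run.
        job_run = JobRun(run_id=len(self.runs) + 1)
        self.runs.append(job_run)
        return job_run


@dataclass
class VirtualCluster:
    """An auto-scaling cluster bounded by CPU and memory ranges (assumed fields)."""
    name: str
    min_cpu_cores: int
    max_cpu_cores: int
    min_memory_gb: int
    max_memory_gb: int
    jobs: List[Job] = field(default_factory=list)

    def cores_for(self, demand: int) -> int:
        # Scaling stays inside the defined range: clamp the demand to the bounds.
        return max(self.min_cpu_cores, min(demand, self.max_cpu_cores))


@dataclass
class CdeService:
    """A service enabled on one environment that manages virtual clusters."""
    environment: str
    virtual_clusters: List[VirtualCluster] = field(default_factory=list)


# Hypothetical usage: one service, one virtual cluster, one job run twice.
service = CdeService(environment="example-env")
vc = VirtualCluster("example-vc", min_cpu_cores=0, max_cpu_cores=20,
                    min_memory_gb=0, max_memory_gb=80)
service.virtual_clusters.append(vc)

job = Job("example-job", "app.py", resources=["example-resource"])
vc.jobs.append(job)
first = job.run()
second = job.run()
```

The nesting mirrors the list above: a service belongs to an environment, a virtual cluster belongs to a service, a job belongs to a virtual cluster, and each job accumulates its own job runs.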