Cloudera Data Engineering auto-scaling

Cloudera Data Engineering (CDE) auto-scales at the job level as well as the service and virtual cluster level. Service and virtual cluster autoscaling uses Apache Yunikorn (Incubating) for resource scheduling to improve efficiency and utilization, reducing cloud provider costs. Job auto-scaling is managed by Apache Spark dynamic allocation. CDE scales resources up and down as needed for running jobs.

Service and virtual cluster auto-scaling

Service and virtual cluster autoscaling uses Apache Yunikorn (Incubating) for resource scheduling to improve efficiency and utilization

When you create a Cloudera Data Engineering (CDE) service, you specify an instance type (size) and auto-scale range of instances. Virtual clusters associated with the CDE service use CPU and memory resources as needed to run jobs. When more resources are required, virtual machines of the specified instance type are started. When resources are no longer required, instances are terminated.

Virtual clusters also have auto-scaling controls, specified as maximum CPU cores and memory (in gigabytes).

CDE takes advantage of Yunikorn resource scheduling and sorting policies, such as gang scheduling and bin packing, to optimize resource utilization and improve cost efficiency.

For more information on gang scheduling, see the Cloudera blog post Spark on Kubernetes – Gang Scheduling with YuniKorn.

Job auto-scaling

Job auto-scaling is controlled by Apache Spark dynamic allocation.

Dynamic allocation scales job executors up and down as needed for running jobs. This can provide large performance benefits by allocating as many resources as needed by the running job, and by returning resources when they are not needed so that concurrent jobs can potentially run faster.

Resources are limited by the job configuration (executor range) as well as the virtual cluster auto-scaling parameters. By default, the executor range is set to match the range of CPU cores configured for the virtual cluster. This improves resource utilization and efficiency by allowing jobs to scale up to the maximum virtual cluster resources available, without manually tuning and optimizing the number of executors per job.