Cloudera Data Engineering auto-scaling

Cloudera Data Engineering (CDE) auto-scales at the job level as well as at the service and virtual cluster level. Service and virtual cluster auto-scaling uses Apache YuniKorn (Incubating) for resource scheduling to improve efficiency and utilization, reducing cloud provider costs. Job auto-scaling is managed by Apache Spark dynamic allocation: CDE scales resources up and down as needed by running jobs.

Service and virtual cluster auto-scaling

Service and virtual cluster auto-scaling uses Apache YuniKorn (Incubating) for resource scheduling to improve efficiency and utilization.

When you create a Cloudera Data Engineering (CDE) service, you specify an instance type (size) and an auto-scale range of instances. Virtual clusters associated with the CDE service use CPU and memory resources as needed to run jobs. When more resources are required, virtual machines of the specified instance type are started. When resources are no longer required, instances are terminated.

Virtual clusters also have auto-scaling controls, specified as maximum CPU cores and memory (in gigabytes).
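
To see how the two levels of limits interact, here is a minimal sketch in Python. It is an illustrative model only, not CDE's actual scheduler logic: the Service and VirtualCluster classes, their field names, and the numbers are all hypothetical. Job demand is first capped by the virtual cluster limits, and the service then scales the instance count within its configured range.

    import math
    from dataclasses import dataclass

    # Illustrative model only; not CDE's internal implementation.

    @dataclass
    class VirtualCluster:
        max_cores: int        # virtual cluster auto-scaling cap (CPU cores)
        max_memory_gb: int    # virtual cluster auto-scaling cap (gigabytes)

    @dataclass
    class Service:
        cores_per_instance: int      # capacity of one VM of the chosen instance type
        memory_gb_per_instance: int
        min_instances: int           # service auto-scale range
        max_instances: int

    def instances_for_demand(svc: Service, vc: VirtualCluster,
                             requested_cores: int, requested_memory_gb: int) -> int:
        # Jobs in a virtual cluster can never use more than the cluster's caps.
        cores = min(requested_cores, vc.max_cores)
        memory_gb = min(requested_memory_gb, vc.max_memory_gb)
        # The service starts enough instances to cover the capped demand,
        # but stays within its configured auto-scale range.
        needed = max(math.ceil(cores / svc.cores_per_instance),
                     math.ceil(memory_gb / svc.memory_gb_per_instance))
        return min(max(needed, svc.min_instances), svc.max_instances)

    # Example: 8-core / 32 GB instances, service range 1-10, virtual cluster
    # capped at 40 cores and 160 GB. A job asking for 60 cores and 100 GB is
    # capped to 40 cores, which requires 5 instances.
    svc = Service(cores_per_instance=8, memory_gb_per_instance=32,
                  min_instances=1, max_instances=10)
    vc = VirtualCluster(max_cores=40, max_memory_gb=160)
    print(instances_for_demand(svc, vc, requested_cores=60, requested_memory_gb=100))  # 5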

CDE takes advantage of YuniKorn resource scheduling and sorting policies, such as gang scheduling and bin packing, to optimize resource utilization and improve cost efficiency.

By default, CDE uses the bin packing policy to allocate resources: the scheduler sorts the list of nodes in ascending order by available resources and selects the node with the lowest amount of available resources. Sorting considers only the absolute numbers of available memory (in MB) and available CPU (in milli-CPUs, mCPU); it disregards the resource type and the percentage of total resources that are available. For example, consider three nodes with the following amounts of available resources:

Node     Available memory (MB)    Available CPU (mCPU)
nodeA    100,000                  4,000
nodeB    4,000                    10,000
nodeC    10,000                   10,000

YuniKorn sorts the nodes in ascending order by the lowest available value on each node, regardless of which resource that value belongs to:

1. nodeB: 4,000 (available memory in MB)
2. nodeA: 4,000 (available CPU in mCPU)
3. nodeC: 10,000 (available memory in MB and available CPU in mCPU)

If bin packing does not provide optimal results for the applications you run, consider selecting one of the other available sorting policies.
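
The sort in the example can be reproduced in a few lines of Python. This is a sketch of the documented behavior only (ascending by the lowest absolute available value, ignoring resource type), not YuniKorn's implementation; the tie-break between nodeA and nodeB is an assumption made so the output matches the order shown above.

    # Sketch of the node sort described above; not YuniKorn's actual code.
    nodes = {
        "nodeA": {"memory_mb": 100_000, "cpu_mcpu": 4_000},
        "nodeB": {"memory_mb": 4_000, "cpu_mcpu": 10_000},
        "nodeC": {"memory_mb": 10_000, "cpu_mcpu": 10_000},
    }

    def sort_key(item):
        _name, available = item
        low, high = sorted((available["memory_mb"], available["cpu_mcpu"]))
        # Primary key: the smallest absolute available value, regardless of
        # resource type. Secondary key (an assumption, used only so the output
        # matches the order shown above): the remaining value.
        return (low, high)

    for name, available in sorted(nodes.items(), key=sort_key):
        print(name, available)
    # nodeB {'memory_mb': 4000, 'cpu_mcpu': 10000}
    # nodeA {'memory_mb': 100000, 'cpu_mcpu': 4000}
    # nodeC {'memory_mb': 10000, 'cpu_mcpu': 10000}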

For more information on gang scheduling, see the Cloudera blog post Spark on Kubernetes – Gang Scheduling with YuniKorn.

Job auto-scaling

Job auto-scaling is controlled by Apache Spark dynamic allocation.

Dynamic allocation scales the number of executors for a job up and down as needed. This can provide significant performance benefits by allocating as many resources as the running job needs, and by returning resources when they are no longer needed so that concurrent jobs can run faster.

Resources are limited by the job configuration (executor range) as well as the virtual cluster auto-scaling parameters. By default, the executor range is set to match the range of CPU cores configured for the virtual cluster. This improves resource utilization and efficiency by allowing jobs to scale up to the maximum virtual cluster resources available, without manually tuning and optimizing the number of executors per job.
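
Although CDE derives the default executor range from the virtual cluster configuration, the range can also be set per job through standard Apache Spark dynamic allocation properties. The following is a minimal PySpark sketch: the property keys are standard Spark configuration, while the values shown are placeholders rather than CDE defaults.

    from pyspark.sql import SparkSession

    # Standard Spark dynamic allocation properties; values are placeholders.
    spark = (
        SparkSession.builder
        .appName("dynamic-allocation-example")
        .config("spark.dynamicAllocation.enabled", "true")
        # Executor range: the job scales between these bounds, subject to the
        # virtual cluster's CPU and memory caps.
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "20")
        # Release executors that have been idle for this long.
        .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
        # On Kubernetes, shuffle tracking lets dynamic allocation work
        # without an external shuffle service.
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .getOrCreate()
    )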