Running Apache Spark ApplicationsPDF version

Dynamic allocation

Dynamic allocation allows Spark to dynamically scale the cluster resources allocated to your application based on the workload. When dynamic allocation is enabled and a Spark application has a backlog of pending tasks, it can request executors. When the application becomes idle, its executors are released and can be acquired by other applications.

In Cloudera Data Platform (CDP), dynamic allocation is enabled by default. The table below describes properties to control dynamic allocation.

To disable dynamic allocation, set spark.dynamicAllocation.enabled to false. If you use the --num-executors command-line argument or set the spark.executor.instances property when running a Spark application, the number of initial executors is the greater of spark.executor.instances or spark.dynamicAllocation.initialExecutors.

For more information on how dynamic allocation works, see resource allocation policy.

When Spark dynamic resource allocation is enabled, all resources are allocated to the first submitted job available causing subsequent applications to be queued up. To allow applications to acquire resources in parallel, allocate resources to pools and run the applications in those pools and enable applications running in pools to be preempted.

If you are using Spark Streaming, see the recommendation in Spark Streaming and Dynamic Allocation.

Table 1. Dynamic Allocation Properties
Property Description
spark.dynamicAllocation.executorIdleTimeout The length of time executor must be idle before it is removed.

Default: 60 s.

spark.dynamicAllocation.enabled Whether dynamic allocation is enabled.

Default: true.

spark.dynamicAllocation.initialExecutors The initial number of executors for a Spark application when dynamic allocation is enabled. If spark.executor.instances (or its equivalent command-line argument, --num-executors) is set to a higher number, that number is used instead.

Default: 1.

spark.dynamicAllocation.minExecutors The lower bound for the number of executors.

Default: 0.

spark.dynamicAllocation.maxExecutors The upper bound for the number of executors.

Default: Integer.MAX_VALUE.

spark.dynamicAllocation.schedulerBacklogTimeout The length of time pending tasks must be backlogged before new executors are requested.

Default: 1 s.