Configuring Cluster Dynamic Resource Allocation Manually
To configure a cluster to run Spark applications with dynamic resource allocation:
Add the following properties to the spark-defaults.conf file associated with your Spark installation. (For general Spark applications, this file typically resides at $SPARK_HOME/conf/spark-defaults.conf.)
Set spark.dynamicAllocation.enabled to true.
Set spark.shuffle.service.enabled to true.
(Optional) The following properties specify a starting point and range for the number of executors. Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors:
spark.dynamicAllocation.initialExecutors
spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors
For a description of each property, see Dynamic Resource Allocation Properties.
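Taken together, the settings above can be sketched in spark-defaults.conf as follows. The executor counts are illustrative assumptions, not recommended values; tune them for your workload and cluster size.

```
spark.dynamicAllocation.enabled          true
spark.shuffle.service.enabled            true

# Optional: starting point and range for the number of executors.
# Must satisfy minExecutors <= initialExecutors <= maxExecutors.
# The counts below are illustrative only.
spark.dynamicAllocation.minExecutors     1
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors     20
```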
Start the shuffle service on each worker node in the cluster. (The shuffle service runs as an auxiliary service of the NodeManager.)
In the yarn-site.xml file on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService.
Review and, if necessary, edit the spark.shuffle.service.* configuration settings. For more information, see the Apache Spark Shuffle Behavior documentation.
Restart all NodeManagers in your cluster.
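The yarn-site.xml additions described above might look like the fragment below. This is a sketch: if yarn.nodemanager.aux-services already lists other auxiliary services (for example, mapreduce_shuffle on many Hadoop clusters), keep them in the comma-separated value alongside spark_shuffle rather than replacing them.

```xml
<!-- Register the Spark shuffle service as a NodeManager auxiliary service. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```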