Configuring Cluster Dynamic Resource Allocation Manually
To configure a cluster to run Spark jobs with dynamic resource allocation, complete the following steps:
1. Add the following properties to the spark-defaults.conf file associated with your Spark installation (typically in the $SPARK_HOME/conf directory):

   - Set spark.dynamicAllocation.enabled to true.
   - Set spark.shuffle.service.enabled to true.
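   For example, the two required entries in spark-defaults.conf would look like this (property name and value are separated by whitespace):

       spark.dynamicAllocation.enabled   true
       spark.shuffle.service.enabled     true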
   (Optional) To specify a starting point and range for the number of executors, use the following properties (an illustrative configuration follows this list):

   - spark.dynamicAllocation.initialExecutors
   - spark.dynamicAllocation.minExecutors
   - spark.dynamicAllocation.maxExecutors

   Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors. For a description of each property, see Dynamic Resource Allocation Properties.
2. Start the shuffle service on each worker node in the cluster:

   - In the yarn-site.xml file on each node, add spark_shuffle to yarn.nodemanager.aux-services, and then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService (see the sketch after this list).
   - Review and, if necessary, edit the spark.shuffle.service.* configuration settings. For more information, see the Apache Spark Shuffle Behavior documentation.
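   A minimal sketch of the corresponding yarn-site.xml entries, assuming mapreduce_shuffle is the only auxiliary service already configured (adjust the value list to match your cluster):

       <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle,spark_shuffle</value>
       </property>
       <property>
         <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
         <value>org.apache.spark.network.yarn.YarnShuffleService</value>
       </property>

   Note that the Spark YARN shuffle service JAR must also be on each NodeManager's classpath for the class above to load.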
3. Restart all NodeManagers in your cluster.
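   How you restart the NodeManagers depends on your Hadoop version and management tooling. For example, on a plain Apache Hadoop installation, the daemon can be bounced on each worker node with:

       yarn --daemon stop nodemanager
       yarn --daemon start nodemanager

   (On Hadoop 2.x, the equivalent is $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager followed by yarn-daemon.sh start nodemanager.)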