Apache Spark Component Guide

Configuring Cluster Dynamic Resource Allocation Manually

To configure a cluster to run Spark jobs with dynamic resource allocation, complete the following steps:

  1. Add the following properties to the spark-defaults.conf file associated with your Spark installation (typically in the $SPARK_HOME/conf directory):

    • Set spark.dynamicAllocation.enabled to true.

    • Set spark.shuffle.service.enabled to true.

  2. (Optional) To specify a starting point and range for the number of executors, use the following properties:

    • spark.dynamicAllocation.initialExecutors

    • spark.dynamicAllocation.minExecutors

    • spark.dynamicAllocation.maxExecutors

    Note that the value of spark.dynamicAllocation.initialExecutors must be greater than or equal to the value of minExecutors, and less than or equal to the value of maxExecutors.

    For a description of each property, see Dynamic Resource Allocation Properties.

  3. Start the shuffle service on each worker node in the cluster:

    1. In the yarn-site.xml file on each node, add spark_shuffle to the comma-separated list of services in yarn.nodemanager.aux-services, and then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService.

    2. Review and, if necessary, edit spark.shuffle.service.* configuration settings.

      For more information, see the Apache Spark Shuffle Behavior documentation.

    3. Restart all NodeManagers in your cluster.
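
The settings described in the steps above can be sketched as two configuration fragments. These are minimal illustrations, not a recommended configuration: the executor counts (1, 2, and 10) are placeholder values chosen only to satisfy the minExecutors <= initialExecutors <= maxExecutors constraint, and the mapreduce_shuffle entry is an assumption about what the aux-services list might already contain on your cluster.

A possible spark-defaults.conf fragment for steps 1 and 2:

```
# Step 1: enable dynamic allocation and the external shuffle service.
spark.dynamicAllocation.enabled            true
spark.shuffle.service.enabled              true

# Step 2 (optional): starting point and range for the number of executors.
# The numbers below are illustrative placeholders; tune them for your workload.
spark.dynamicAllocation.minExecutors       1
spark.dynamicAllocation.initialExecutors   2
spark.dynamicAllocation.maxExecutors       10
```

A possible yarn-site.xml fragment for step 3, substep 1:

```
<!-- Append spark_shuffle to any services already listed.
     mapreduce_shuffle is shown here only as an assumed existing entry. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

Neither file is reloaded automatically: the yarn-site.xml change takes effect only after the NodeManagers are restarted, and the spark-defaults.conf settings apply to Spark applications submitted after the file is edited.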