Configuring Apache Spark

Manually configure dynamic resource allocation

Use the following steps to manually configure dynamic resource allocation settings.

  1. Add the following properties to the spark-defaults.conf file associated with your Spark installation (typically in the $SPARK_HOME/conf directory):
    • Set spark.dynamicAllocation.enabled to true.

    • Set spark.shuffle.service.enabled to true.

  2. (Optional) To specify a starting point and range for the number of executors, use the following properties:
    • spark.dynamicAllocation.initialExecutors

    • spark.dynamicAllocation.minExecutors

    • spark.dynamicAllocation.maxExecutors

    Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors.

    For a description of each property, see "Dynamic Resource Allocation Properties" in this guide.
  3. Start the shuffle service on each worker node in the cluster:
    1. In the yarn-site.xml file on each node, add spark_shuffle to the comma-separated list in yarn.nodemanager.aux-services, and then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService.
    2. Review and, if necessary, edit spark.shuffle.service.* configuration settings.
    3. Restart all NodeManagers in your cluster.
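
The settings from steps 1 through 3 can be sketched as a small Python script that renders the text to place in spark-defaults.conf and yarn-site.xml. This is an illustrative sketch, not part of Spark or YARN: the function names are hypothetical, the executor counts (2, 1, 10) are placeholders, and mapreduce_shuffle is only an assumed example of an auxiliary service that may already be configured on your NodeManagers. The class name is the YARN shuffle service class that ships with Spark.

```python
# Sketch: render the settings from steps 1-3 as text for
# spark-defaults.conf and yarn-site.xml. Function names and the
# executor counts used below are illustrative assumptions.
import xml.etree.ElementTree as ET


def spark_defaults_lines(initial, minimum, maximum):
    """Return spark-defaults.conf lines that enable dynamic allocation."""
    # Step 2 requires minExecutors <= initialExecutors <= maxExecutors.
    if not minimum <= initial <= maximum:
        raise ValueError(
            "need minExecutors <= initialExecutors <= maxExecutors, "
            f"got min={minimum}, initial={initial}, max={maximum}"
        )
    settings = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.initialExecutors": str(initial),
        "spark.dynamicAllocation.minExecutors": str(minimum),
        "spark.dynamicAllocation.maxExecutors": str(maximum),
    }
    # spark-defaults.conf holds one whitespace-separated key/value pair per line.
    return [f"{key} {value}" for key, value in settings.items()]


def yarn_site_properties(existing_aux_services):
    """Return yarn-site.xml <property> elements that register spark_shuffle."""
    # Keep whatever services are already listed and append spark_shuffle once.
    services = [s.strip() for s in existing_aux_services.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        services.append("spark_shuffle")
    wanted = {
        "yarn.nodemanager.aux-services": ",".join(services),
        "yarn.nodemanager.aux-services.spark_shuffle.class":
            "org.apache.spark.network.yarn.YarnShuffleService",
    }
    fragments = []
    for name, value in wanted.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        fragments.append(ET.tostring(prop, encoding="unicode"))
    return fragments


if __name__ == "__main__":
    print("\n".join(spark_defaults_lines(initial=2, minimum=1, maximum=10)))
    print("\n".join(yarn_site_properties("mapreduce_shuffle")))
```

The script only prints text; it does not modify any files. Paste the output into the relevant configuration files yourself, then restart the NodeManagers as described in step 3. The bounds check fails early if the initial executor count falls outside the configured range, which is the constraint noted in step 2.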