Configuring Spark on YARN Applications

In addition to spark-submit Options, options for running Spark applications on YARN are listed in spark-submit on YARN Options.

Table 1. spark-submit on YARN Options
Option	Description
`archives`	Comma-separated list of archives to be extracted into the working directory of each executor. For the client deployment mode, the path must point to a local file. For the cluster deployment mode, the path can be either a local file or a URL globally visible inside your cluster; see Advanced Dependency Management.
`executor-cores`	Number of processor cores to allocate on each executor. Alternatively, you can use the `spark.executor.cores` property.
`executor-memory`	Maximum heap size to allocate to each executor. Alternatively, you can use the `spark.executor.memory` property.
`num-executors`	Total number of YARN containers to allocate for this application. Alternatively, you can use the `spark.executor.instances` property. If dynamic allocation is enabled, the initial number of executors is the greater of this value or the `spark.dynamicAllocation.initialExecutors` value.
`queue`	YARN queue to submit to. For more information, see Assigning Applications and Queries to Resource Pools. Default: `default`.

During initial installation, Cloudera Manager tunes properties according to your cluster environment.

In addition to the command-line options, the following properties are available:


Property	Description
`spark.yarn.driver.memoryOverhead`	Amount of extra off-heap memory that can be requested from YARN per driver. Combined with `spark.driver.memory`, this is the total memory that YARN can use to create a JVM for a driver process.
`spark.yarn.executor.memoryOverhead`	Amount of extra off-heap memory that can be requested from YARN, per executor process. Combined with `spark.executor.memory`, this is the total memory YARN can use to create a JVM for an executor process.