Configuring Spark on YARN Applications
In addition to spark-submit Options, options for running Spark applications on YARN are listed in spark-submit on YARN Options.
Option | Description |
---|---|
archives
|
Comma-separated list of archives to be extracted into the working directory of each executor. For the client deployment mode, the path must point to a local file. For the cluster deployment mode, the path can be either a local file or a URL globally visible inside your cluster; see Advanced Dependency Management. |
executor-cores
|
Number of processor cores to allocate on each executor.
Alternatively, you can use the
spark.executor.cores property.
|
executor-memory
|
Maximum heap size to allocate to each executor. Alternatively,
you can use the spark.executor.memory property.
|
num-executors
|
Total number of YARN containers to allocate for this
application. Alternatively, you can use the
spark.executor.instances property. If dynamic
allocation is enabled, the initial number of executors is the
greater of this value or the
spark.dynamicAllocation.initialExecutors value. |
queue
|
YARN queue to submit to. For more information, see
Assigning Applications and Queries to Resource Pools. Default:
|
During initial installation, Cloudera Manager tunes properties according to your cluster environment.
In addition to the command-line options, the following properties are available:
Property | Description |
---|---|
spark.yarn.driver.memoryOverhead
|
Amount of extra off-heap memory that can be requested from YARN
per driver. Combined with spark.driver.memory ,
this is the total memory that YARN can use to create a JVM for a
driver process.
|
spark.yarn.executor.memoryOverhead
|
Amount of extra off-heap memory that can be requested from YARN,
per executor process. Combined with
spark.executor.memory , this is the total memory
YARN can use to create a JVM for an executor process.
|