spark-submit
command options
You specify spark-submit
options using the form
--option
value
instead of
--option=value
. (Use a space instead of an equals sign.)
Option | Description |
---|---|
class
|
For Java and Scala applications, the fully qualified
classname of the class containing the main method of the
application. For example,
org.apache.spark.examples.SparkPi . |
conf
|
Spark configuration property in key=value format. For values that contain spaces, surround "key=value" with quotes (as shown). |
deploy-mode
|
Deployment mode: cluster and
client. In cluster mode, the driver runs on worker
hosts. In client mode, the driver runs locally as an external
client. Use cluster mode with production jobs; client mode is more
appropriate for interactive and debugging uses, where you want to
see your application output immediately. To see the effect of the
deployment mode when running on YARN, see Deployment Modes.
Default: |
driver-class-path
|
Configuration and classpath entries to pass to the driver. JARs added with --jars are automatically included in the classpath. |
driver-cores
|
Number of cores used by the driver in cluster mode. Default: 1. |
driver-memory
|
Maximum heap size (represented as a JVM string; for example
1024m, 2g, and so on) to allocate to the driver. Alternatively,
you can use the spark.driver.memory property.
|
files
|
Comma-separated list of files to be placed in the working directory of each executor. For the client deployment mode, the path must point to a local file. For the cluster deployment mode, the path can be either a local file or a URL globally visible inside your cluster; see Advanced Dependency Management. |
jars
|
Additional JARs to be loaded in the classpath of drivers and executors in cluster mode or in the executor classpath in client mode. For the client deployment mode, the path must point to a local file. For the cluster deployment mode, the path can be either a local file or a URL globally visible inside your cluster; see Advanced Dependency Management. |
master
|
The location to run the application. |
packages
|
Comma-separated list of Maven coordinates of JARs to include
on the driver and executor classpaths. The local Maven, Maven
central, and remote repositories specified in
repositories are searched in that order. The
format for the coordinates is
groupId:artifactId:version . |
py-files
|
Comma-separated list of .zip, .egg, or .py files to place on
PYTHONPATH . For the client
deployment mode, the path must point to a local file. For the
cluster deployment mode, the path can be either a local
file or a URL globally visible inside your cluster; see Advanced Dependency
Management.
|
repositories
|
Comma-separated list of remote repositories to search for
the Maven coordinates specified in packages .
|
Master | Description |
---|---|
local | Run Spark locally with one worker thread (that is, no parallelism). |
local[K] | Run Spark locally with K worker threads. (Ideally, set this to the number of cores on your host.) |
local[*] | Run Spark locally with as many worker threads as logical cores on your host. |
yarn | Run using a YARN cluster manager. The cluster location is determined by HADOOP_CONF_DIR or YARN_CONF_DIR. See Configuring the Environment. |