Running a Spark job using the CLI
The following example demonstrates how to run a Cloudera Data Engineering (CDE) Spark job using the command line interface (CLI).
Make sure that the Spark job has been created and all necessary resources have been created and uploaded.
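For reference, a minimal preparation sequence might look like the following sketch; the resource name, application file, and job name (my-resource, my-app.py, my-spark-job) are hypothetical placeholders:

# Create a file resource and upload the Spark application to it
cde resource create --name my-resource
cde resource upload --name my-resource --local-path my-app.py

# Create the job definition, mounting the resource that holds the application
cde job create --name my-spark-job --type spark --mount-1-resource my-resource --application-file my-app.py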
Using the cde job run command requires more preparation in the target environment than the cde spark submit command. Whereas cde spark submit is a quick and efficient way of testing a Spark job during development, cde job run is suited for production environments where a job is run multiple times; removing resources and job definitions after every job run is therefore neither necessary nor viable.
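For a quick development test, cde spark submit takes a local application file directly, with no pre-created job or resources; the file name below is a hypothetical placeholder:

cde spark submit my-app.py

The cde job run command, by contrast, executes a job that has already been defined: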
cde job run --name <job name> [Spark flags...] [--wait] [--variable name=value...]
- With [Spark flags...] you can override the corresponding job values. Spark flags that can be repeated replace the original list, except for --conf, which only adds or replaces values for the given keys.
- With [--variable] flags you can replace strings in job values (see the example following this list). Currently the supported fields are:
  - Spark application name
  - Spark arguments
  - Spark configurations
  For each --variable name=value flag, any substring {{{name}}} in the value of a supported field is replaced with value.
- A custom runtime Docker image can be specified for the job using the --runtime-image-resource-name flag, which must refer to the name of a custom image resource that has already been created.
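As a sketch of how these options combine, the following run overrides a single Spark configuration value and substitutes a variable; the job name, configuration key, and variable name are hypothetical:

cde job run --name my-spark-job --conf spark.sql.shuffle.partitions=200 --variable env=production

Here, any occurrence of {{{env}}} in the job's Spark application name, arguments, or configurations is replaced with production before the run starts.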
By default, the command returns the job run ID as soon as the job has been submitted. Optionally, you can use the --wait flag to block until the job run ends; the command then returns a non-zero exit code if the job run was not successful.
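Because --wait propagates the job outcome through the exit code, it is convenient in automation; a minimal shell sketch, assuming a job named my-spark-job:

# Block until the run ends; the exit code reflects the run's outcome
if cde job run --name my-spark-job --wait; then
  echo "Job run succeeded"
else
  echo "Job run failed"
fi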