Specifying Which Version of Spark to Run
More than one version of Spark can run on a node. If your cluster runs Spark 1, you can install Spark 2 and test jobs on Spark 2 in parallel with your working Spark 1 environment. After verifying that all scripts and jobs run successfully with Spark 2 (including any changes required for backward compatibility), you can transition jobs from Spark 1 to Spark 2 incrementally. For more information about installing a second version of Spark, see Installing Spark.
Use the following guidelines for determining which version of Spark runs a job by default, and for specifying an alternate version if desired.
- By default, if only one version of Spark is installed on a node, your job runs with the installed version.
- By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package. In HDP 2.6, the default is Spark 1.6.
- If you want to run jobs on the non-default version of Spark, use one of the following approaches:
  - If you use full paths in your scripts, change spark-client to spark2-client; for example, change /usr/hdp/current/spark-client/bin/spark-submit to /usr/hdp/current/spark2-client/bin/spark-submit.
  - If you do not use full paths, but instead launch jobs from the PATH, set the SPARK_MAJOR_VERSION environment variable to the desired version of Spark before you launch the job. For example, if Spark 1.6.3 and Spark 2.0 are both installed on a node and you want to run your job with Spark 2.0, set SPARK_MAJOR_VERSION=2. You can set SPARK_MAJOR_VERSION in automation scripts that use Spark, or in your manual settings after logging on to the shell. (Both approaches are sketched after the note below.)

Note: The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user session.
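As a quick sanity check, you can confirm which version a given launch method will use. The following is a minimal sketch of both approaches, assuming the HDP installation paths shown above; spark-submit --version prints the version banner of whichever client is invoked:

    # Approach 1: invoke the Spark 2 client explicitly by full path.
    /usr/hdp/current/spark2-client/bin/spark-submit --version

    # Approach 2: launch from the PATH and select the major version
    # for this shell session with SPARK_MAJOR_VERSION.
    export SPARK_MAJOR_VERSION=2
    spark-submit --version    # should now report a 2.x version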
The following example submits a SparkPi job to Spark 2, using spark-submit from the Spark 2 client directory:
1. Navigate to a host where Spark 2.0 is installed.

2. Change to the Spark 2 client directory:

    cd /usr/hdp/current/spark2-client/

3. Set the SPARK_MAJOR_VERSION environment variable to 2:

    export SPARK_MAJOR_VERSION=2

4. Run the Spark Pi example:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        --num-executors 1 \
        --driver-memory 512m \
        --executor-memory 512m \
        --executor-cores 1 \
        examples/jars/spark-examples*.jar 10
Note that the path to spark-examples*.jar is different from the path used for Spark 1.x.
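For comparison, a Spark 1.x invocation of the same example typically references the jar under lib/ rather than examples/jars/. The layout below is an assumption based on typical Spark 1.x packaging and may vary by build, so adjust the jar path to match your installation:

    # Run from the Spark 1 client directory (assumed typical layout;
    # the exact jar name and location may differ in your build).
    cd /usr/hdp/current/spark-client/
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        lib/spark-examples*.jar 10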
To change the setting later, either remove the SPARK_MAJOR_VERSION environment variable or set it to the desired version.
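For example, to revert a shell session to the Spark 1 default, either of the following works:

    unset SPARK_MAJOR_VERSION        # fall back to the package default
    export SPARK_MAJOR_VERSION=1     # or select Spark 1 explicitly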