Specifying Which Version of Spark to Use
You can install more than one version of Spark on a node. Here are the guidelines for determining which version runs your job:
By default, if only one version of Spark is installed on a node, your job runs with the installed version.
By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.
The default version for HDP 2.5.5 is Spark 1.6.2.
If more than one version of Spark is installed on a node, you can select which version runs your job. To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job. For example, if Spark 1.6.2 and the Spark 2.0 technical preview are both installed on a node and you want to run your job with Spark 2.0, set SPARK_MAJOR_VERSION to 2.0.
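For a job-by-job choice, the variable can also be set for a single command rather than exported for the whole session. A minimal sketch, assuming a bash-compatible shell and that the commands under /usr/bin are wrapper scripts that honor the variable, as described above:

# Set the variable for this one invocation only (bash prefix-assignment
# syntax); it does not affect the rest of the session.
SPARK_MAJOR_VERSION=2 spark-submit --version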
The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user session.
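Because the scope is local to the session, the setting disappears at logout. A user who always wants the same version could persist it in a shell startup file; a minimal sketch, assuming a bash login shell:

# Persist the setting for future sessions, then apply it to the current one.
echo 'export SPARK_MAJOR_VERSION=2' >> ~/.bashrc
source ~/.bashrc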
Here is an example for a user who submits jobs using spark-submit under /usr/bin:
1. Navigate to a host where Spark 2.0 is installed.

2. Change to the Spark2 client directory:

   cd /usr/hdp/current/spark2-client/

3. Set the SPARK_MAJOR_VERSION environment variable to 2:

   export SPARK_MAJOR_VERSION=2

4. Run the Spark Pi example:

   ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10
Note that the path to spark-examples*.jar is different from the path used for Spark 1.x.
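Before submitting real work, a quick way to confirm which version the variable selects is to ask spark-submit to report its version. A sketch, assuming both versions are installed and that /usr/bin/spark-submit honors SPARK_MAJOR_VERSION as described above:

export SPARK_MAJOR_VERSION=2
spark-submit --version    # expected to report Spark 2.0.x
export SPARK_MAJOR_VERSION=1
spark-submit --version    # expected to report Spark 1.6.2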
To change the setting later, either remove the environment variable or set it to the new desired version.
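For example, in a bash session:

unset SPARK_MAJOR_VERSION      # revert to the default version for the HDP package
export SPARK_MAJOR_VERSION=1   # ...or select a different installed version explicitly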