Apache Spark Component Guide

Specifying Which Version of Spark to Use

You can install more than one version of Spark on a node. Here are the guidelines for determining which version runs your job:

  • By default, if only one version of Spark is installed on a node, your job runs with the installed version.

  • By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.

    The default version for HDP 2.5.0 is Spark 1.6.2.

  • If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.

    To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job.

    For example, if Spark 1.6.2 and the Spark 2.0 technical preview are both installed on a node and you want to run your job with Spark 2.0, set SPARK_MAJOR_VERSION to 2 (the major version number, not 2.0); a minimal verification sketch follows this list.
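
For instance, here is a minimal sketch of selecting and verifying a version for the current session. It assumes both Spark clients are installed and that the spark-submit wrapper under /usr/bin dispatches according to SPARK_MAJOR_VERSION, as described below:

    # Select Spark 2 for this session; valid values are the major version numbers (1 or 2)
    export SPARK_MAJOR_VERSION=2

    # Print the version banner to confirm which Spark the wrapper dispatches to
    spark-submit --version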

The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user session.
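
Because the setting is session-local, you can also scope it to a single command using standard shell per-command assignment, without changing the session default:

    # Applies only to this invocation; other commands in the session are unaffected
    SPARK_MAJOR_VERSION=2 spark-submit --version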

Here is an example for a user who submits jobs using spark-submit under /usr/bin:

  1. Navigate to a host where Spark 2.0 is installed.

  2. Change to the Spark2 client directory:

    cd /usr/hdp/current/spark2-client/

  3. Set the SPARK_MAJOR_VERSION environment variable to 2:

    export SPARK_MAJOR_VERSION=2

  4. Run the Spark Pi example:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 1 \
        --driver-memory 512m \
        --executor-memory 512m \
        --executor-cores 1 \
        examples/jars/spark-examples*.jar 10

    Note that the path to spark-examples*.jar differs from the path used for Spark 1.x, which ships the examples JAR under lib/ rather than examples/jars/. Because the job runs in yarn-cluster mode, the driver output appears in the YARN application logs rather than on your console; see the sketch after these steps.
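
In yarn-cluster mode the driver runs inside the YARN cluster, so SparkPi's "Pi is roughly ..." line lands in the application logs. A minimal sketch for retrieving it, assuming YARN log aggregation is enabled (the application ID is a placeholder; spark-submit prints the real one when the job is accepted):

    # List finished applications to find the application ID
    yarn application -list -appStates FINISHED

    # Fetch the aggregated logs for the job; replace <application_id> with the real ID
    yarn logs -applicationId <application_id>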

To change the setting later, either unset the environment variable or export it again with the version you want, as shown below.
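
For example, using the same session-local shell commands:

    # Revert this session to the node's default Spark version
    unset SPARK_MAJOR_VERSION

    # Or explicitly select Spark 1 for subsequent jobs in this session
    export SPARK_MAJOR_VERSION=1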