Command Line Upgrade

Configuring and Upgrading Apache Spark

Instructions in this section apply to HDP 2.3.4 and later. For earlier versions of HDP, refer to the version-specific documentation.

  1. Stop your current version of Apache Spark:

    su - spark -c "/usr/hdp/current/spark-client/sbin/"

  2. Remove your current version of Spark: yum erase "spark*".

  3. When you install Spark, two directories are created:

    • /usr/hdp/current/spark-client for submitting Spark jobs

    • /usr/hdp/current/spark-history for launching Spark master processes, such as the Spark history server

    Search for Spark in the HDP repo:

    • For RHEL or CentOS:

      yum search spark

    • For SLES:

      zypper search spark

    • For Ubuntu and Debian:

      apt-cache search spark

    This lists all available versions of Spark. For example:

    spark_2_3_4_0_3371-master.noarch : Server for Spark master
    spark_2_3_4_0_3371-python.noarch : Python client for Spark
    spark_2_3_4_0_3371-worker.noarch : Server for Spark worker
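Picking the right search command for a given distribution can be scripted. A minimal sketch, assuming the usual `/etc/os-release` ID values; the `spark_search_cmd` helper name is illustrative, not part of HDP:

```shell
#!/bin/sh
# Map a distro ID (as found in /etc/os-release) to the matching
# repo-search command. Prints the command rather than running it.
spark_search_cmd() {
    case "$1" in
        rhel|centos)    echo "yum search spark" ;;
        sles)           echo "zypper search spark" ;;
        ubuntu|debian)  echo "apt-cache search spark" ;;
        *)              echo "unsupported distro: $1" >&2; return 1 ;;
    esac
}

spark_search_cmd centos    # prints: yum search spark
```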
  4. On the node where you want the Spark 1.5.2 History Server to run, install the version corresponding to the HDP version you currently have installed:

    1. su - root

    2. yum install spark_2_3_4_0_2950-master -y

    3. To use Python: yum install spark_2_3_4_0_2950-python

    4. conf-select create-conf-dir --package spark --stack-version spark_2_3_4_0_2950 --conf-version 0

    5. Copy the configuration files from your previous Spark version into the new configuration directory (replace <previous-version> with the name of your old configuration directory): cp /etc/spark/<previous-version>/0/* /etc/spark/spark_2_3_4_0_2950/0/

    6. conf-select set-conf-dir --package spark --stack-version spark_2_3_4_0_2950 --conf-version 0

    7. hdp-select set spark-client spark_2_3_4_0_2950

    8. hdp-select set spark-historyserver spark_2_3_4_0_2950
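The sub-steps above can be collected into one script. This is a sketch, not the official procedure: the `run` helper only echoes each command while DRY_RUN is set (the default here), so nothing executes until you clear it as root on the target node, and the package and stack-version strings are copied from the steps above and may differ on your cluster.

```shell
#!/bin/sh
# Sketch of the install/configure sequence. With DRY_RUN=1 (the default)
# each command is only printed; set DRY_RUN=0 as root to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

run yum install spark_2_3_4_0_2950-master -y
run yum install spark_2_3_4_0_2950-python    # optional: Python support
run conf-select create-conf-dir --package spark --stack-version spark_2_3_4_0_2950 --conf-version 0
# (copy your previous configuration files into the new directory here)
run conf-select set-conf-dir --package spark --stack-version spark_2_3_4_0_2950 --conf-version 0
run hdp-select set spark-client spark_2_3_4_0_2950
run hdp-select set spark-historyserver spark_2_3_4_0_2950
```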

  5. Validate the Spark installation. As user spark, run the SparkPi example:

    1. su - spark

    2. cd /usr/hdp/current/spark-client

    3. ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
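On success, the SparkPi example prints a line beginning "Pi is roughly", so the submit output can be checked mechanically. A small sketch; the `sparkpi_ok` helper and the `sparkpi.log` filename are illustrative, not part of HDP:

```shell
#!/bin/sh
# Check captured spark-submit output for the SparkPi success line.
sparkpi_ok() {
    grep -q "Pi is roughly"
}

# On a real cluster:
#   ./bin/spark-submit ... 2>&1 | tee sparkpi.log
#   sparkpi_ok < sparkpi.log && echo "SparkPi validated"
```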

  6. Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:

    • yarn-cluster mode: /usr/hdp/current/spark-client/bin/spark-submit --class <your-class> --master yarn-cluster [options] [app options]

    • yarn-client mode: /usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
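As a concrete instance of the yarn-cluster template, the command can be filled in with the bundled SparkPi class used for validation in step 5. This sketch builds and prints the command line first so it can be reviewed before submission; the executor options are illustrative:

```shell
#!/bin/sh
# Build the yarn-cluster submit command for the bundled SparkPi example.
SPARK_HOME=/usr/hdp/current/spark-client
CMD="$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster --num-executors 3 --driver-memory 512m \
$SPARK_HOME/lib/spark-examples*.jar 10"

echo "$CMD"      # review the full command line
# eval "$CMD"    # uncomment to submit on a real HDP node
```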