Command Line Upgrade

Configuring and Upgrading Apache Spark

Instructions in this section are specific to HDP 2.3.4 and later. For earlier versions of HDP, refer to the version-specific documentation.

  1. On the node where you want the Apache Spark 1.5.2 History Server to run, install the Spark packages that correspond to the HDP version you currently have installed:

    1. su - root

    2. yum install spark_2_3_4_0_3371-master -y

    3. To use Python: yum install spark_2_3_4_0_3371-python

    4. conf-select create-conf-dir --package spark --stack-version 2.3.4.0-3371 --conf-version 0

    5. Copy your existing Spark configuration from the previous version's configuration directory: cp /etc/spark/<previous-version>/0/* /etc/spark/2.3.4.0-3371/0/

    6. conf-select set-conf-dir --package spark --stack-version 2.3.4.0-3371 --conf-version 0

    7. hdp-select set spark-client 2.3.4.0-3371

    8. hdp-select set spark-historyserver 2.3.4.0-3371
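
    After switching versions, a quick sanity check (a sketch; it assumes hdp-select and conf-select are on root's PATH) confirms that both Spark components and the configuration symlink now point at the new stack version:

    ```
    # Show which stack version each Spark component resolves to.
    hdp-select status spark-client
    hdp-select status spark-historyserver

    # The conf symlink should resolve to the new versioned configuration directory.
    ls -l /etc/spark/conf
    ```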

  2. In HDP 2.4.0, the Spark History Server runs on top of HDFS rather than the YARN Application Timeline Server (ATS), as it did in previous versions. Modify the Spark configuration files as follows:

    1. As the hdfs service user, create an HDFS directory called /spark-history, owned by user spark and group hadoop, with permissions 777:

      hdfs dfs -mkdir /spark-history
      hdfs dfs -chown -R spark:hadoop /spark-history
      hdfs dfs -chmod -R 777 /spark-history
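
      To confirm the directory was created with the intended owner, group, and mode, you can list the HDFS root (a quick check; the exact column layout depends on your Hadoop version):

      ```
      hdfs dfs -ls / | grep spark-history
      # Expect a line similar to:
      # drwxrwxrwx   - spark hadoop          0 ... /spark-history
      ```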
    2. Edit the spark-defaults.conf file.

      • Add the following properties and values:

        spark.eventLog.dir to hdfs:///spark-history
        spark.eventLog.enabled to true
        spark.history.fs.logDirectory to hdfs:///spark-history
      • Delete the spark.yarn.services property.

    3. Edit the spark-thrift-sparkconf.conf file.

      • Add the following properties and values:

        spark.eventLog.dir to hdfs:///spark-history
        spark.eventLog.enabled to true
        spark.history.fs.logDirectory to hdfs:///spark-history
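
    In both files, the properties above appear as plain key/value lines separated by whitespace. A sketch of the resulting entries (pointing at the /spark-history directory created in the previous step):

    ```properties
    # Event logs are written to, and the History Server reads from, the HDFS directory created above.
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs:///spark-history
    spark.history.fs.logDirectory hdfs:///spark-history
    ```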
  3. Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:

    • yarn-cluster mode: /usr/hdp/current/spark-client/bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] [app options]

    • yarn-client mode: /usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
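
    As a concrete illustration of the yarn-cluster form (a sketch; the examples jar name and its location under lib/, as well as sensible executor sizing, vary with your Spark build and cluster), the bundled SparkPi class can be submitted like this:

    ```
    cd /usr/hdp/current/spark-client
    ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 1 \
        --executor-memory 512m \
        lib/spark-examples*.jar 10
    ```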

  4. Validate the Spark installation. As user spark, run the SparkPi example:

    1. sudo su spark

    2. cd /usr/hdp/current/spark-client

    3. ./bin/run-example SparkPi 10