Upgrading HDP Manually

Configuring and Upgrading Apache Spark

Note

Instructions in this section are specific to HDP-2.3.4 and later. For earlier versions of HDP, refer to the documentation that corresponds to your version.

For information about Spark version support, see the Spark - HDP Version Support table in the Spark Guide Introduction.

  1. Install Spark 1.5.2 (and, optionally, Python) on the node where you want the Spark History Server to run:

    1. su - root

    2. yum install spark_2_3_4_0_3371-master -y

    3. To use Python: yum install spark_2_3_4_0_3371-python -y

    4. conf-select create-conf-dir --package spark --stack-version 2.3.4.0-3371 --conf-version 0

    5. Copy your existing Spark configuration files into the new configuration directory, where <existing_spark_conf_dir> is the directory that holds your current Spark configuration: cp <existing_spark_conf_dir>/* /etc/spark/2.3.4.0-3371/0/

    6. conf-select set-conf-dir --package spark --stack-version 2.3.4.0-3371 --conf-version 0

    7. hdp-select set spark-client 2.3.4.0-3371

    8. hdp-select set spark-historyserver 2.3.4.0-3371
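
After the hdp-select commands above, it can be useful to confirm that both Spark components point at the version you installed. The following is a minimal sketch, not part of the official procedure: component_version is a hypothetical helper, and it assumes that hdp-select status prints one "name - version" line per component.

```shell
#!/usr/bin/env bash
# component_version COMPONENT
# Hypothetical helper: reads `hdp-select status` output on stdin and prints
# the version recorded for COMPONENT.
# Assumption: each input line has the form "<component> - <version>".
component_version() {
  awk -v c="$1" '$1 == c && $2 == "-" { print $3 }'
}
```

For example, `hdp-select status spark-client | component_version spark-client` should print the version string that you passed to hdp-select set. Empty output suggests that the component is not registered under that name.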

  2. Stop the Spark History Server. If you are using the Spark Thrift Server, stop it as well:

    su - spark -c "$SPARK_HOME/sbin/stop-history-server.sh"
    su - spark -c "$SPARK_HOME/sbin/stop-thriftserver.sh"
  3. It is recommended that you run the Spark History Server on top of HDFS rather than the YARN Application Timeline Server (ATS). To do so, modify the Spark configuration as follows:

    1. As the hdfs service user, create an HDFS directory named /spark-history, owned by user spark and group hadoop, with permissions set to 777:

      hdfs dfs -mkdir /spark-history
      hdfs dfs -chown -R spark:hadoop /spark-history
      hdfs dfs -chmod -R 777 /spark-history
    2. Edit the spark-defaults.conf file.

      Add the following properties and values:

      spark.eventLog.dir hdfs:///spark-history
      spark.eventLog.enabled true
      spark.history.fs.logDirectory hdfs:///spark-history
    3. Edit the spark-thrift-sparkconf.conf file.

      Add the following properties and values:

      spark.eventLog.dir hdfs:///spark-history
      spark.eventLog.enabled true
      spark.history.fs.logDirectory hdfs:///spark-history
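
The same three properties go into both configuration files, so the edits can be scripted. The sketch below is one possible approach under stated assumptions: set_prop and apply_history_props are hypothetical helper names, and the file is assumed to use whitespace-separated "key value" lines (lines written as key=value are not recognized and would be left in place).

```shell
#!/usr/bin/env bash
# set_prop FILE KEY VALUE
# Hypothetical helper: sets KEY to VALUE in a spark-defaults.conf style file.
# Any existing line whose first field is exactly KEY is dropped, then the new
# "KEY VALUE" line is appended, so running it twice does not add duplicates.
set_prop() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  if [ -f "$file" ]; then
    # Keep comments, blank lines, and every other property unchanged.
    awk -v k="$key" '$1 != k' "$file" > "$tmp"
  fi
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  # Write back through redirection so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# apply_history_props FILE
# Applies the three history-related properties from this step to FILE.
apply_history_props() {
  local file="$1"
  set_prop "$file" spark.eventLog.dir hdfs:///spark-history
  set_prop "$file" spark.eventLog.enabled true
  set_prop "$file" spark.history.fs.logDirectory hdfs:///spark-history
}
```

You would call apply_history_props once for spark-defaults.conf and once for spark-thrift-sparkconf.conf, passing the full path of each file in your Spark configuration directory. Back up both files first.
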
  4. Restart the Spark History Server:

    su - spark -c "/usr/hdp/current/spark-historyserver/sbin/start-history-server.sh"
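
Once the History Server is running again, it may be worth confirming that the /spark-history directory from step 3 has the expected owner, group, and permissions, because a mismatch can prevent event logs from being written or read. The sketch below is illustrative only: check_history_dir is a hypothetical helper that inspects a listing line in the usual `hdfs dfs -ls` layout (permissions, replication, owner, group, size, date, time, path).

```shell
#!/usr/bin/env bash
# check_history_dir
# Hypothetical helper: reads `hdfs dfs -ls -d /spark-history` output on stdin.
# Returns 0 only if the entry is a directory with mode 777 owned by spark:hadoop.
# Assumption: no ACLs or sticky bit are set, so the mode prints as drwxrwxrwx.
check_history_dir() {
  awk '
    $NF == "/spark-history" {
      found = 1
      if ($1 != "drwxrwxrwx") { print "unexpected permissions: " $1; bad = 1 }
      if ($3 != "spark")      { print "unexpected owner: " $3;       bad = 1 }
      if ($4 != "hadoop")     { print "unexpected group: " $4;       bad = 1 }
    }
    END {
      if (!found) { print "no /spark-history entry in input"; exit 1 }
      exit bad + 0
    }
  '
}
```

A typical invocation would be `hdfs dfs -ls -d /spark-history | check_history_dir`, run as a user that can read the HDFS root. The function prints nothing and returns 0 when everything matches.
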
  5. If you are using the Spark Thrift Server, restart it. See (Optional) Starting the Spark Thrift Server.

  6. Validate the Spark installation. As user spark, run the Spark Pi example in the Spark Guide.
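
The Spark Pi example prints a line beginning with "Pi is roughly" among the driver log output, so a small filter can make the validation result easier to spot. This is a sketch only: pi_estimate is a hypothetical helper, and the spark-submit invocation shown in the comment (client path, master setting, and examples jar location) is an assumption that you should check against the Spark Guide for your version.

```shell
#!/usr/bin/env bash
# pi_estimate
# Hypothetical helper: reads SparkPi driver output on stdin, prints the
# estimate from the "Pi is roughly <value>" line, and returns non-zero
# if no such line appears.
pi_estimate() {
  awk '/Pi is roughly/ { print $NF; found = 1 } END { exit !found }'
}

# Assumed usage, run as user spark from the Spark client directory:
#   cd /usr/hdp/current/spark-client
#   ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
#       --master yarn-client lib/spark-examples*.jar 10 2>&1 | pi_estimate
```

A non-zero return status from pi_estimate suggests that the job did not complete. In that case, rerun the example without the filter and inspect the full output.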