Configuring and Upgrading Apache Spark
Before you can upgrade Apache Spark, you must first upgrade your HDP components to the latest version (in this case, 2.4.0). This section assumes that you have already upgraded your components to HDP 2.4.0. If you have not completed those steps, return to Getting Ready to Upgrade and Upgrade 2.3 Components for instructions on how to upgrade your HDP components to 2.4.0.
To upgrade Spark, start the service and update configurations.
Replace the HDP version in $SPARK_HOME/conf/spark-defaults.conf and $SPARK_HOME/conf/java-opts with the current HDP version.
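If you prefer to script this step, a sed one-liner can swap the version string in both files. This is only a sketch: <old-version> and <new-version> are placeholders for your previous and current HDP build strings (for example, 2.3.x.y-<build> and 2.4.0.0-<build>).
sed -i 's/<old-version>/<new-version>/g' $SPARK_HOME/conf/spark-defaults.conf $SPARK_HOME/conf/java-opts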
su - spark -c "$SPARK_HOME/sbin/start-history-server.sh"
In HDP 2.4.0, the Spark History Server runs on top of HDFS rather than the YARN Application Timeline Server (ATS), as it did in previous versions. Modify the Spark configuration files as follows:
As the hdfs service user, create an HDFS directory called spark-history with owner spark, group hadoop, and permissions 777:
hdfs dfs -mkdir /spark-history
hdfs dfs -chown -R spark:hadoop /spark-history
hdfs dfs -chmod -R 777 /spark-history
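To confirm that the directory was created with the expected owner, group, and permissions, you can list the HDFS root (a quick sanity check; not part of the original procedure):
hdfs dfs -ls / | grep spark-history
The entry should show permissions drwxrwxrwx, owner spark, and group hadoop.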
Edit the spark-defaults.conf file. Add the following properties and values:
spark.eventLog.dir to hdfs:///spark-history
spark.eventLog.enabled to true
spark.history.fs.logDirectory to hdfs:///spark-history
Delete the spark.yarn.services property.
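After these edits, the relevant portion of spark-defaults.conf should look roughly like the following (a sketch; your file will contain other properties as well, and spark.yarn.services should no longer appear):
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history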
Edit the spark-thrift-sparkconf.conf file. Add the following properties and values:
spark.eventLog.dir to hdfs:///spark-history
spark.eventLog.enabled to true
spark.history.fs.logDirectory to hdfs:///spark-history
Restart the history server:
su - spark -c "usr/hdp/current/spark-historyserver/sbin/start-history-server.sh"
If you will be running Spark in yarn-client mode, update the following property in /etc/hadoop/conf/mapred-site.xml by substituting ${hdp.version} with the actual HDP version (2.4.0.0-$BUILD):
<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*,
    $PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*,
    /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar,
    /etc/hadoop/conf/secure</value>
</property>
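If you are not sure which build string to use for the substitution, hdp-select lists the HDP versions installed on the node (a quick check; the edit to mapred-site.xml itself is done by hand):
hdp-select versions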
Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:
yarn-cluster mode:
/usr/hdp/current/spark-client/bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
yarn-client mode:
/usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
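As a concrete illustration of the yarn-cluster form above, you can submit the bundled SparkPi example (a sketch; the examples jar name includes the Spark and HDP build versions, hence the wildcard):
/usr/hdp/current/spark-client/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /usr/hdp/current/spark-client/lib/spark-examples*.jar 10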
Validate the Spark installation. As user spark, run the SparkPi example:
sudo su spark
cd /usr/hdp/current/spark-client
./bin/run-example SparkPi 10
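If the installation is working, the job completes and prints a line of the form "Pi is roughly 3.14..." near the end of the output (the exact value varies from run to run).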
To enable Spark to work with LzoCodec when running in client mode, add the following properties to /etc/spark/conf/spark-defaults.conf:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.4.0.0-$BUILD.jar
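A quick way to verify the LZO configuration is to count the lines of an LZO-compressed file from the Spark shell (a sketch; /tmp/data.lzo is a hypothetical path standing in for any .lzo file already in HDFS):
/usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
scala> sc.textFile("/tmp/data.lzo").count()   // hypothetical path; any LZO-compressed file works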
For additional configuration information, see the Spark Guide.