Configuring and Upgrading Apache Spark
Before you can upgrade Apache Spark, you must have first upgraded your HDP components to the latest version (in this case, 2.4.0). This section assumes that you have already upgraded your components for HDP 2.4.0. If you have not already completed these steps, return to Getting Ready to Upgrade and Upgrade 2.2 Components for instructions on how to upgrade your HDP components to 2.4.0.
Instructions in this section are specific to HDP 2.4.0 and later. For earlier versions of HDP, refer to the version-specific documentation.
Add the node where you want the Spark 1.6 History Server to run, and install the version corresponding to the HDP version you currently have installed:
su - root
yum install spark_2_4_0_0_$BUILD-master -y
To use Python:
yum install spark_2_4_0_0_$BUILD-python
conf-select create-conf-dir --package spark --stack-version spark_2_4_0_0_$BUILD --conf-version 0
Copy the configuration files from the previous version's configuration directory into the new one:
cp /etc/spark/<previous-version>/0/* /etc/spark/spark_2_4_0_0_$BUILD/0/
conf-select set-conf-dir --package spark --stack-version spark_2_4_0_0_$BUILD --conf-version 0
hdp-select set spark-client spark_2_4_0_0_$BUILD
hdp-select set spark-historyserver spark_2_4_0_0_$BUILD
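Conceptually, hdp-select repoints the symlinks under /usr/hdp/current at the selected stack version, which is why paths such as /usr/hdp/current/spark-client always resolve to the active install. A minimal local sketch of that mechanism, using an illustrative temp tree rather than a real HDP layout:

```shell
# Illustrative sketch of the symlink switch hdp-select performs.
# All paths below are a stand-in temp tree, not a real HDP install.
root=$(mktemp -d)
mkdir -p "$root/2.2.9.0/spark" "$root/2.4.0.0/spark" "$root/current"

# Before the upgrade, "current" points at the old stack version.
ln -sfn "$root/2.2.9.0/spark" "$root/current/spark-client"

# "hdp-select set spark-client spark_2_4_0_0_<BUILD>" effectively does:
ln -sfn "$root/2.4.0.0/spark" "$root/current/spark-client"

readlink "$root/current/spark-client"
```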
In HDP 2.4.0, the Spark History Server runs on top of HDFS rather than the YARN Application Timeline Server (ATS) used in previous versions. Modify the Spark configuration files as follows:
As the hdfs service user, create an HDFS directory called /spark-history owned by user spark, group hadoop, with permissions 777:
hdfs dfs -mkdir /spark-history
hdfs dfs -chown -R spark:hadoop /spark-history
hdfs dfs -chmod -R 777 /spark-history
Edit the spark-defaults.conf file. Add the following properties and values:
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history
Delete the spark.yarn.services property.
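The spark-defaults.conf edit above can also be scripted. The sketch below works on a throwaway copy purely for illustration (on a real node the file lives under the active Spark configuration directory), and the starting spark.yarn.services value is a placeholder:

```shell
# Sketch: remove spark.yarn.services and append the HDFS-backed
# event-log properties. Operates on a temp copy for illustration.
conf="$(mktemp -d)/spark-defaults.conf"
echo 'spark.yarn.services some.placeholder.YarnHistoryService' > "$conf"

# Delete the obsolete spark.yarn.services property.
sed -i '/^spark\.yarn\.services/d' "$conf"

# Append the three history-server properties.
cat >> "$conf" <<'EOF'
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history
EOF

grep -c spark-history "$conf"   # prints 2
```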
Edit the spark-thrift-sparkconf.conf file. Add the following properties and values:
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history
Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:
yarn-cluster mode:
/usr/hdp/current/spark-client/bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] [app options]
yarn-client mode:
/usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
Validate the Spark installation. As user spark, run the SparkPi example; a successful run prints a line similar to "Pi is roughly 3.14...":
sudo su spark
cd /usr/hdp/current/spark-client
./bin/run-example SparkPi 10
For additional configuration information, see the Spark Guide.