Configuring and Upgrading Apache Spark
Before you can upgrade Apache Spark, you must have first upgraded your HDP components to the latest version (in this case, 2.4.2). This section assumes that you have already upgraded your components for HDP 2.4.2. If you have not already completed these steps, return to Getting Ready to Upgrade and Upgrade 2.3 Components for instructions on how to upgrade your HDP components to 2.4.2.
To upgrade Spark, stop the service, update its configuration, and then restart the service.
Stop the Spark history-server. If you are using the Spark thrift-server, stop the thrift-server.
su - spark -c "$SPARK_HOME/sbin/stop-history-server.sh"
su - spark -c "$SPARK_HOME/sbin/stop-thriftserver.sh"
Remove any reference to hdp.version from the Spark configuration files.
In HDP 2.4.2, the Spark History Server runs on top of HDFS, not YARN ATS, as in previous versions. Modify Spark configuration files as follows:
As the hdfs service user, create an HDFS directory called spark-history, owned by user spark and group hadoop, with permissions set to 777:
hdfs dfs -mkdir /spark-history
hdfs dfs -chown -R spark:hadoop /spark-history
hdfs dfs -chmod -R 777 /spark-history
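After creating the directory, it may be worth confirming that the owner, group, and mode took effect. The following is a minimal sketch, not part of the documented procedure: the function name check_history_dir is hypothetical, and it assumes the standard `hdfs dfs -ls` column order (permissions, replication, owner, group, size, date, time, path).

```shell
#!/bin/sh
# Hypothetical helper: check one line of `hdfs dfs -ls -d` output against the
# mode, owner, and group expected for /spark-history.
# Assumed columns: permissions, replication, owner, group, size, date, time, path.
check_history_dir() {
    printf '%s\n' "$1" | awk '
        # Mark success only when mode, owner, and group all match.
        $1 == "drwxrwxrwx" && $3 == "spark" && $4 == "hadoop" { ok = 1 }
        END { exit(ok ? 0 : 1) }'
}

# Example use on a cluster node, as the hdfs service user:
#   check_history_dir "$(hdfs dfs -ls -d /spark-history)" && echo "spark-history looks correct"
```

The helper takes the listing line as an argument rather than calling hdfs itself, so it can be tried against a pasted listing without cluster access.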
Edit the spark-defaults.conf file. Add the following properties and values:
spark.eventLog.dir to hdfs:///spark-history
spark.eventLog.enabled to true
spark.history.fs.logDirectory to hdfs:///spark-history
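If you prefer to script this edit rather than make it by hand, the sketch below shows one possible approach. The function name set_spark_prop and the file path in the usage comment are assumptions for illustration, and the sketch assumes the file uses whitespace-separated "key value" lines; entries written as key=value are not matched.

```shell
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a Spark properties file.
# Any existing line for KEY is dropped and the new setting is appended,
# so running it twice leaves a single entry.
set_spark_prop() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || : > "$file"            # create the file if it is missing
    tmp=$(mktemp) || return 1
    # Keep every line whose first field is not the key (comments and blanks stay).
    awk -v k="$key" '$1 != k' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s %s\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file's owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example use (the path is an assumption; adjust it to your installation):
#   conf=/etc/spark/conf/spark-defaults.conf
#   set_spark_prop "$conf" spark.eventLog.dir hdfs:///spark-history
#   set_spark_prop "$conf" spark.eventLog.enabled true
#   set_spark_prop "$conf" spark.history.fs.logDirectory hdfs:///spark-history
```

Replacing rather than blindly appending avoids duplicate keys when the upgrade steps are repeated.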
Edit the spark-thrift-sparkconf.conf file. Add the following properties and values:
spark.eventLog.dir to hdfs:///spark-history
spark.eventLog.enabled to true
spark.history.fs.logDirectory to hdfs:///spark-history
spark.hadoop.cacheConf to false
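Before restarting the services, a quick check of the edited files can catch a leftover hdp.version reference or a missing property. The following is a rough sketch under stated assumptions: check_spark_conf is a hypothetical name, the file paths in the usage comment are placeholders, and properties are assumed to be written as whitespace-separated "key value" lines.

```shell
#!/bin/sh
# Hypothetical helper: report problems in a Spark properties file.
# Usage: check_spark_conf FILE [EXTRA_KEY...]
# Prints one line per problem and returns non-zero if any problem is found.
check_spark_conf() {
    file=$1
    shift
    status=0
    # Any remaining mention of hdp.version counts as a problem.
    if grep -q 'hdp\.version' "$file"; then
        echo "found hdp.version reference"
        status=1
    fi
    # The three event log properties are always required; extra keys are optional.
    for key in spark.eventLog.dir spark.eventLog.enabled \
               spark.history.fs.logDirectory "$@"; do
        if ! awk -v k="$key" '$1 == k { found = 1 } END { exit(found ? 0 : 1) }' "$file"; then
            echo "missing $key"
            status=1
        fi
    done
    return "$status"
}

# Example use (paths are assumptions; adjust them to your installation):
#   check_spark_conf /etc/spark/conf/spark-defaults.conf
#   check_spark_conf /etc/spark/conf/spark-thrift-sparkconf.conf spark.hadoop.cacheConf
```

The check only confirms that each key is present; it does not validate the values.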
Restart the history-server:
su - spark -c "/usr/hdp/current/spark-historyserver/sbin/start-history-server.sh"
If you are using the Spark thrift-server, restart the thrift-server. See (Optional) Starting the Spark Thrift Server.
Validate the Spark installation. As user spark, run the Spark Pi example in the Spark Guide.
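When the validation runs from a script, the result can be checked automatically. The sketch below is an illustration only: the helper name spark_pi_succeeded is hypothetical, it relies on the "Pi is roughly" line that the SparkPi example prints on success, and the spark-submit invocation in the comment is an assumed form, so take the exact options from the Spark Guide.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on standard input contains the
# result line that the SparkPi example prints, such as "Pi is roughly 3.14".
spark_pi_succeeded() {
    grep -q '^Pi is roughly [0-9]'
}

# Example use as user spark (directory, master, and jar path are assumptions):
#   cd /usr/hdp/current/spark-client
#   ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
#       --master yarn-client lib/spark-examples*.jar 10 \
#     | spark_pi_succeeded && echo "Spark Pi validation passed"
```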
For additional configuration information, see the Spark Guide.