Configuring and Upgrading Apache Spark
Instructions in this section are specific to HDP-2.3.4 and later. For earlier versions of HDP, refer to the version-specific documentation.
On the node where you want the Apache Spark 1.5.2 History Server to run, install the Spark version corresponding to the HDP version you currently have installed:
su - root
yum install spark_2_3_4_0_3371-master -y
To use Python:
yum install spark_2_3_4_0_3371-python
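To confirm that the expected Spark packages were installed for this HDP version, an optional check (not part of the original procedure) is to list the installed packages:
yum list installed | grep spark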
conf-select create-conf-dir --package spark --stack-version spark_2_3_4_0_3371 --conf-version 0
Copy your existing Spark configuration files from the previous version's configuration directory into the new one (substitute your previous configuration version for <previous-version>):
cp /etc/spark/<previous-version>/0/* /etc/spark/spark_2_3_4_0_3371/0/
conf-select set-conf-dir --package spark --stack-version spark_2_3_4_0_3371 --conf-version 0
hdp-select set spark-client spark_2_3_4_0_3371
hdp-select set spark-historyserver spark_2_3_4_0_3371
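As an optional sanity check, you can confirm that the symlinks now point at the new version; hdp-select status reports the version registered for each component:
hdp-select status | grep spark
ls -l /usr/hdp/current/spark-client /usr/hdp/current/spark-historyserver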
In HDP 2.4.0, the Spark History Server runs on top of HDFS instead of the YARN Application Timeline Server (ATS) used in previous versions. Modify the Spark configuration files as follows:
As the hdfs service user, create an HDFS directory named /spark-history with owner spark, group hadoop, and permissions 777:
hdfs dfs -mkdir /spark-history
hdfs dfs -chown -R spark:hadoop /spark-history
hdfs dfs -chmod -R 777 /spark-history
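Optionally, verify the new directory before continuing; the listing should show permissions drwxrwxrwx with owner spark and group hadoop:
hdfs dfs -ls / | grep spark-history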
Edit the spark-defaults.conf file. Add the following properties and values:
Set spark.eventLog.dir to hdfs:///spark-history
Set spark.eventLog.enabled to true
Set spark.history.fs.logDirectory to hdfs:///spark-history
Delete the spark.yarn.services property.
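Within the file itself, each property and its value go on one line separated by whitespace (standard Spark properties-file syntax); the same three entries are added to spark-thrift-sparkconf.conf in the next step. For example:
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history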
Edit the spark-thrift-sparkconf.conf file. Add the following properties and values:
Set spark.eventLog.dir to hdfs:///spark-history
Set spark.eventLog.enabled to true
Set spark.history.fs.logDirectory to hdfs:///spark-history
Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:
yarn-cluster mode:
/usr/hdp/current/spark-client/bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] [app options]
yarn-client mode:
/usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
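As a concrete illustration of the yarn-cluster form, the following submits the bundled SparkPi example; the examples jar path and the resource settings shown here are assumptions and may differ in your environment:
/usr/hdp/current/spark-client/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --executor-memory 512m \
  /usr/hdp/current/spark-client/lib/spark-examples*.jar 10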
Validate the Spark installation. As user spark, run the SparkPi example:
sudo su spark
cd /usr/hdp/current/spark-client
./bin/run-example SparkPi 10
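With event logging enabled as configured above, the completed SparkPi application should write its logs under /spark-history and appear in the Spark History Server UI (port 18080 by default). As an optional check:
hdfs dfs -ls /spark-history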