Configuring and Upgrading Apache Spark
Instructions in this section are specific to HDP-2.3.4 and later. For earlier versions of HDP, refer to the version-specific documentation.
Stop your current version of Apache Spark:
su - spark -c "/usr/hdp/current/spark-client/sbin/stop-history-server.sh"
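To confirm that the History Server process has actually exited, you can check the process list; this is an optional sanity check, not part of the documented procedure:
ps -ef | grep -i [h]istoryserver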
Remove your current version of Spark:
yum erase "spark*"
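To verify that no Spark packages remain, you can query the package database (optional; shown for RHEL/CentOS, where an empty result means the erase succeeded):
rpm -qa | grep spark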
When you install Spark, two directories are created:
/usr/hdp/current/spark-client, for submitting Spark jobs
/usr/hdp/current/spark-history, for launching Spark master processes, such as the Spark history server
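Both paths are symlinks under /usr/hdp/current that hdp-select maintains, pointing into the versioned installation directory. You can inspect where they currently point (optional):
ls -l /usr/hdp/current/spark-client /usr/hdp/current/spark-history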
Search for Spark in the HDP repo:
For RHEL or CentOS:
yum search spark
For SLES:
zypper search spark
For Ubuntu and Debian:
apt-cache search spark
This shows all available versions of Spark. For example:
spark_2_3_4_0_3371-master.noarch : Server for Spark master
spark_2_3_4_0_3371-python.noarch : Python client for Spark
spark_2_3_4_0_3371-worker.noarch : Server for Spark worker
On the node where you want the Spark 1.5.2 History Server to run, install the Spark version that corresponds to the HDP version you currently have installed:
su - root
yum install spark_2_3_4_0_2950-master -y
To use Python:
yum install spark_2_3_4_0_2950-python -y
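To confirm which HDP stack versions are now present on the node, you can list them with hdp-select (optional):
hdp-select versions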
Create a configuration directory for the new version, copy your existing Spark configuration into it, then point the spark-client and spark-historyserver symlinks at the new version. In the cp command, replace <previous-version> with the configuration directory of the version you are upgrading from:
conf-select create-conf-dir --package spark --stack-version 2.3.4.0-2950 --conf-version 0
cp /etc/spark/<previous-version>/0/* /etc/spark/2.3.4.0-2950/0/
conf-select set-conf-dir --package spark --stack-version 2.3.4.0-2950 --conf-version 0
hdp-select set spark-client 2.3.4.0-2950
hdp-select set spark-historyserver 2.3.4.0-2950
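As an optional check that both symlinks now point at the new version, filter the hdp-select status output (the version shown should match the one you just installed):
hdp-select status | grep spark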
Validate the Spark installation. As the spark user, run the SparkPi example:
su - spark
cd /usr/hdp/current/spark-client
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
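A successful run prints an approximation of pi to the driver's stdout; expect a line similar to the following (the digits vary from run to run):
Pi is roughly 3.140576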
Restart Spark on YARN in either yarn-cluster mode or yarn-client mode:
yarn-cluster mode:
/usr/hdp/current/spark-client/bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
yarn-client mode:
/usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
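As a concrete instance of the yarn-cluster template above, you can resubmit the SparkPi example from the validation step in cluster mode. Run it from /usr/hdp/current/spark-client so the relative jar path resolves; note that in yarn-cluster mode the "Pi is roughly" line appears in the YARN application logs rather than on the local console:
/usr/hdp/current/spark-client/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10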