Chapter 2. Installing Spark
Before installing Spark, ensure that your cluster meets the following prerequisites:
HDP cluster stack version 2.6.0 or later
(Optional) Ambari version 2.5.0 or later
HDFS and YARN deployed on the cluster
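Before proceeding, you can sanity-check that HDFS and YARN are running from any cluster node. This is a minimal sketch, assuming the standard Hadoop client commands are on the PATH:

    # Confirm HDFS is up and DataNodes are reporting
    hdfs dfsadmin -report

    # Confirm YARN NodeManagers are registered
    yarn node -list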
You can choose to install Spark version 1, Spark version 2, or both. (To specify which version of Spark runs a job, see Specifying Which Version of Spark to Run.)
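For example, on HDP clusters the SPARK_MAJOR_VERSION environment variable selects which installed version handles a given job; a minimal sketch:

    # Select Spark 2 for this shell session only; the cluster default is unchanged
    export SPARK_MAJOR_VERSION=2
    spark-submit --version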
Additionally, note the following requirements and recommendations for optional Spark services and features:
The Spark Thrift server requires Hive to be deployed on the cluster.
SparkR requires R binaries installed on all nodes; a quick verification check is sketched at the end of this section.
SparkR is not currently supported on SLES.
Spark access through Livy requires the Livy server installed on the cluster.
For clusters managed by Ambari, see Installing Spark Using Ambari.
For clusters not managed by Ambari, see "Installing and Configuring Livy" in the Spark or Spark 2 chapter of the Command Line Installation Guide, depending on the version of Spark installed on your cluster.
PySpark and its associated libraries require Python version 2.7 or later, or Python version 3.4 or later, installed on all nodes; a version check is sketched at the end of this section.
For optimal performance with MLlib, consider installing the netlib-java library; an example of pulling it into a job appears below.
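One way to make netlib-java available to an MLlib job is to pull the artifact in at submit time. This sketch assumes the cluster can reach Maven Central; the class name and JAR are placeholders for your own application:

    # 'com.example.MyMLJob' and 'my-ml-job.jar' are hypothetical placeholders
    spark-submit --packages com.github.fommil.netlib:all:1.1.2 \
        --class com.example.MyMLJob my-ml-job.jar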
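To verify the SparkR prerequisite, confirm that the R binaries are present on every node, for example via a parallel shell or your configuration management tool:

    # Both commands should succeed on every node that runs Spark executors
    which R
    R --version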
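Similarly, for PySpark you can confirm the interpreter version on each node. This assumes the interpreter Spark will use is the one named python on the PATH:

    # Expect version 2.7 or later, or 3.4 or later
    python --version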