Chapter 2. Installing Spark
Before installing Spark, ensure that your cluster meets the following prerequisites:
- HDP cluster stack version 2.5.0 or later
- (Optional) Ambari version 2.4.0 or later
- HDFS and YARN deployed on the cluster (a basic per-node client check is sketched after this list)
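One way to spot-check these prerequisites on a given node is sketched below. This is an illustrative sketch, not part of the product: it assumes the HDP client commands hdfs, yarn, and hdp-select are on the PATH of the node where it runs, and it checks only that node, so repeat it on each host (or use Ambari) for a cluster-wide view.

```python
#!/usr/bin/env python
# Illustrative prerequisite check for a single node. Assumes the HDP
# client binaries (hdfs, yarn, hdp-select) are installed and on PATH.
import subprocess

def first_line(cmd):
    """Run a command and return the first line of its combined output."""
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    return out.decode("utf-8", "replace").splitlines()[0]

# The HDFS and YARN clients both support a "version" subcommand.
print("HDFS client:  %s" % first_line(["hdfs", "version"]))
print("YARN client:  %s" % first_line(["yarn", "version"]))
# hdp-select reports the HDP stack version(s) installed on this host.
print("HDP stack(s): %s" % first_line(["hdp-select", "versions"]))
```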
Additionally, note the following requirements and recommendations for optional features:
- The Spark Thrift server requires Hive to be deployed on the cluster.
- PySpark requires Python to be installed on all nodes (one way to verify this is sketched after this list).
- SparkR requires R binaries to be installed on all nodes. SparkR is not currently supported on SLES.
- Accessing Spark from Zeppelin through Livy requires the Livy server to be installed on the cluster (described in Installing Spark Using Ambari).
- For optimal MLlib performance, consider installing the netlib-java library; the sketch after this list shows one way to check which BLAS implementation MLlib is using.
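The following sketch, run from the pyspark shell (where sc is the preconfigured SparkContext), offers one hedged way to verify two of the items above: it collects the Python version reported by the executors, confirming that Python is present on the worker nodes, and asks the JVM which BLAS implementation netlib-java resolved for MLlib. Note that sc._jvm is an internal py4j gateway rather than a documented Spark API, so treat that part as illustrative only.

```python
# Run inside the pyspark shell, where `sc` (SparkContext) is predefined.
import platform

def worker_python_version(_):
    # Imported on the executor; if Python is missing on a node,
    # tasks scheduled there will fail outright instead.
    import platform
    return platform.python_version()

# Spread a trivial job across the default parallelism and collect the
# distinct Python versions seen on the executors.
versions = (sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
              .map(worker_python_version)
              .distinct()
              .collect())
print("Driver Python:    %s" % platform.python_version())
print("Executor Python:  %s" % ", ".join(sorted(versions)))

# Ask the JVM which BLAS implementation netlib-java resolved.
# NOTE: sc._jvm is an internal py4j gateway, not a public API.
blas = sc._jvm.com.github.fommil.netlib.BLAS.getInstance().getClass().getName()
print("MLlib BLAS implementation: %s" % blas)
# F2jBLAS means the pure-Java fallback is in use; NativeSystemBLAS or
# NativeRefBLAS indicates netlib-java found native libraries.
```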
Although you can install Spark on a cluster not managed by Ambari (see Installing and Configuring Apache Spark in the Non-Ambari Cluster Installation Guide), this chapter describes how to install Spark on an Ambari-managed cluster.