Spark prerequisites
Before installing Spark, ensure that your cluster meets the following prerequisites.
- HDP cluster stack version 3.0 or later
- (Optional) Ambari version 2.7.0 or later
- HDFS and YARN deployed on the cluster
Only Spark version 2 is supported.
Additionally, note the following requirements and recommendations for optional Spark services and features:
- The Spark Thrift server requires Hive deployed on the cluster.
- SparkR requires R binaries installed on all nodes.
- Spark access through Livy requires the Livy server installed on the cluster.
- PySpark and associated libraries require Python version 2.7 or later, or Python version 3.4 or later, installed on all nodes.
- For optimal performance with MLlib, consider installing the netlib-java library.
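The Python requirement for PySpark can be verified on each node before installation. The sketch below is illustrative (only the version bounds come from the list above); run it with the Python interpreter that Spark will use:

```python
import sys

# PySpark needs Python 2.7+ (on the Python 2 line) or 3.4+ (on the Python 3 line).
ok = sys.version_info[:2] >= (3, 4) or (2, 7) <= sys.version_info[:2] < (3, 0)

if ok:
    print("Python %d.%d meets the PySpark minimum" % sys.version_info[:2])
else:
    print("Python %d.%d is too old for PySpark" % sys.version_info[:2])
```

Repeat the check on every node, since PySpark requires a suitable interpreter cluster-wide.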