Apache Spark Component Guide

Chapter 2. Installing Spark

Before installing Spark, ensure that your cluster meets the following prerequisites:

  • HDP cluster stack version 2.5.0 or later

  • (Optional) Ambari version 2.4.0 or later

  • HDFS and YARN deployed on the cluster (a quick verification sketch follows this list)
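
The following sketch is illustrative only and is not part of the Ambari installation workflow: it spot-checks the HDFS and YARN prerequisites from a node that has the Hadoop client commands on its PATH. The command choices (hdfs dfs -ls / and yarn node -list) are assumptions; any client command that contacts the service would do.

# Illustrative prerequisite check (not a documented procedure); assumes the
# hdfs and yarn client commands are on the PATH of the node running it.
import subprocess

def service_responds(command):
    """Return True if the given client command exits successfully."""
    try:
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

# Both commands fail quickly if the corresponding service is unreachable.
print("HDFS reachable:", service_responds(["hdfs", "dfs", "-ls", "/"]))
print("YARN reachable:", service_responds(["yarn", "node", "-list"]))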

Additionally, note the following requirements and recommendations for optional features:

  • The Spark Thrift server requires Hive to be deployed on the cluster.

  • PySpark requires Python to be installed on all nodes (see the sketch after this list for a quick check).

  • SparkR requires R binaries installed on all nodes.

  • SparkR is not currently supported on SLES.

  • Accessing Spark from Zeppelin through Livy requires the Livy server to be installed on the cluster, as described in Installing Spark Using Ambari.

  • For optimal performance with MLlib, consider installing the netlib-java library.
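
As a quick check of the Python requirement, the following minimal sketch can be run after Spark is installed. It is an assumption rather than a documented step: submitted with spark-submit --master yarn, it fans a small job out across the cluster and reports the Python version seen on the executors; the application name and partition counts are arbitrary.

# Illustrative post-installation check (not a documented procedure).
# Submit with: spark-submit --master yarn python_check.py
import platform
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("python-prereq-check")
sc = SparkContext(conf=conf)

# Use many partitions so that tasks run on multiple worker nodes.
versions = (sc.parallelize(range(1000), 50)
              .map(lambda _: platform.python_version())
              .distinct()
              .collect())
print("Python versions seen on executors: %s" % versions)
sc.stop()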

Although you can install Spark on a cluster not managed by Ambari (see Installing and Configuring Apache Spark in the Non-Ambari Cluster Installation Guide), this chapter describes how to install Spark on an Ambari-managed cluster.