Spark Guide

Chapter 2. Prerequisites

Before installing Spark, make sure your cluster meets the following prerequisites.

Table 2.1. Prerequisites for Running Spark 1.5.2

Prerequisite                Description
-------------------------   ----------------------------------------------------------

HDP Cluster Stack Version   2.3.4 or later

(Optional) Ambari Version   2.2 or later

Software dependencies       • Spark requires HDFS and YARN.

                            • PySpark requires Python to be installed on all nodes
                              (a quick verification sketch follows this table).

                            • (Optional) The Spark Thrift Server requires Hive to be
                              deployed on your cluster.

                            • (Optional) For optimal performance with MLlib, consider
                              installing the netlib-java library.

                            • SparkR (tech preview) requires R binaries to be
                              installed on all nodes.
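
One quick way to verify the Python dependency above is to run a trivial PySpark job that reports the Python version each executor sees. The following is a minimal sketch, not part of HDP; it assumes PySpark 1.5.2 is installed and that the job is submitted to a running YARN cluster:

    # check_python_versions.py -- minimal sketch (not part of HDP) that reports
    # the Python version visible on each executor.
    import platform

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("python-version-check")
    sc = SparkContext(conf=conf)

    # Run small tasks across 10 partitions; each records the Python version
    # it executed under, and distinct() collapses the results.
    versions = (sc.parallelize(range(100), 10)
                  .map(lambda _: platform.python_version())
                  .distinct()
                  .collect())

    print("Python versions seen on executors: %s" % versions)
    sc.stop()

If the job fails to start Python workers on some nodes, or the collected list contains unexpected versions, install or align Python on those nodes before deploying PySpark applications. You can submit the script with, for example, spark-submit --master yarn-client check_python_versions.py.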


Note

When you upgrade your cluster to HDP 2.3.4, Spark is automatically upgraded to 1.5.2. If you wish to use a previous version of Spark, follow the Spark Manual Downgrade procedure in the Release Notes.
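
To confirm which Spark version your jobs actually run against after the upgrade, you can read the version from a SparkContext. A minimal sketch, assuming PySpark is available on the node where you run it:

    # Minimal sketch: print the running Spark version (expect 1.5.2 after
    # upgrading the cluster to HDP 2.3.4).
    from pyspark import SparkContext

    sc = SparkContext(appName="spark-version-check")
    print("Spark version: %s" % sc.version)
    sc.stop()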