Spark Guide

Chapter 2. Prerequisites

Before installing Spark, make sure your cluster meets the following prerequisites.

Table 2.1. Prerequisites for Running Spark

HDP Cluster Stack Version
  • 2.3.6 or later

(Optional) Ambari Version
  • 2.2 or later

Software dependencies
  • Spark requires HDFS and YARN

  • PySpark requires Python to be installed on all nodes

  • (Optional) The Spark Thrift Server requires Hive to be deployed on your cluster

  • (Optional) For optimal performance with MLlib, consider installing the netlib-java library

  • SparkR (tech preview) requires R binaries to be installed on all nodes
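
The software dependencies above can be spot-checked on a node before installation. The following is a minimal sketch, not part of the official installation procedure. It assumes that each dependency exposes its usual command-line client (`hdfs`, `yarn`, `python`, `hive`, `R`) on the node's PATH. The mapping of prerequisites to commands and the `missing_tools` helper are hypothetical names introduced for illustration.

```python
import shutil

# Hypothetical mapping from each prerequisite to the command that usually
# indicates it is installed on a node. This is an assumption for
# illustration; the guide does not specify these command names.
REQUIRED = {"HDFS": "hdfs", "YARN": "yarn", "Python (PySpark)": "python"}
OPTIONAL = {"Hive (Spark Thrift Server)": "hive", "R (SparkR)": "R"}


def missing_tools(tools, which=shutil.which):
    """Return the prerequisite names whose command is not found on PATH."""
    return [name for name, cmd in tools.items() if which(cmd) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(tools)
        print(f"Missing {label} tools: {', '.join(missing) or 'none'}")
```

The script inspects only the node it runs on, so it would need to be run on every node to confirm the "all nodes" requirements for PySpark and SparkR. On systems where the interpreter is installed as `python3`, adjust the command name accordingly.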


When you upgrade your cluster to HDP 2.3.6, Spark is automatically upgraded to version 1.5.2. To use an earlier version of Spark, follow the Spark Manual Downgrade procedure in the Release Notes.