Installing Apache Spark

Spark prerequisites

Before installing Spark, ensure that your cluster meets the following prerequisites.

  • HDP cluster stack version 3.0 or later

  • (Optional) Ambari version 2.7.0 or later

  • HDFS and YARN deployed on the cluster (a quick verification sketch follows below)

Only Spark version 2 is supported.
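
As a quick way to confirm the HDFS and YARN prerequisite, the following minimal Python sketch shells out to the standard hdfs and yarn client commands. It assumes both clients are on the PATH of the node where it runs and that the chosen commands succeed only when the services are reachable; adapt it to your environment.

  import subprocess
  import sys

  def responds(cmd):
      # Return True if the command exits with status 0.
      try:
          subprocess.check_output(cmd, stderr=subprocess.STDOUT)
          return True
      except (OSError, subprocess.CalledProcessError):
          return False

  # Commands assumed available on this node's PATH via the HDP client packages.
  checks = {
      "HDFS": ["hdfs", "dfs", "-ls", "/"],   # list the HDFS root directory
      "YARN": ["yarn", "node", "-list"],     # list active NodeManagers
  }

  failed = [name for name, cmd in sorted(checks.items()) if not responds(cmd)]
  if failed:
      sys.exit("Prerequisite check failed for: " + ", ".join(failed))
  print("HDFS and YARN clients respond; core prerequisites look satisfied.")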

Additionally, note the following requirements and recommendations for optional Spark services and features:

  • The Spark Thrift server requires Hive to be deployed on the cluster.

  • SparkR requires R binaries installed on all nodes.

  • Spark access through Livy requires the Livy server installed on the cluster.

  • PySpark and associated libraries require Python version 2.7 or later, or Python version 3.4 or later, installed on all nodes (a version-check sketch follows this list).

  • For optimal performance with MLlib, consider installing the netlib-java library.
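
Because the Python requirement applies to every node, a small sketch like the following can be run under the interpreter that PySpark will use on each node; the 2.7 and 3.4 thresholds come directly from the requirement above.

  import sys

  # PySpark needs Python 2.7+ or 3.4+ (per the prerequisites above).
  v = sys.version_info
  ok = (v.major == 2 and v >= (2, 7)) or (v.major == 3 and v >= (3, 4))

  if ok:
      print("Python %d.%d.%d meets the PySpark requirement." % v[:3])
  else:
      sys.exit("Python %d.%d.%d is too old for PySpark; install 2.7+ or 3.4+."
               % v[:3])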