CDS Powered by Apache Spark Requirements

The following sections describe software requirements for CDS Powered by Apache Spark.

CDH Versions

Supported versions of CDH are described below.

A Hive compatibility issue in CDS 2.0 Release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the CDS 2.0 Release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.

CDS Powered by Apache Spark Version    Supported CDH Versions
2.4 Release 2                          CDH 5.10 and any higher CDH 5.x versions
2.4 Release 1                          CDH 5.10 and any higher CDH 5.x versions
2.3 Release 4                          CDH 5.9 and any higher CDH 5.x versions
2.3 Release 3                          CDH 5.9 and any higher CDH 5.x versions
2.3 Release 2                          CDH 5.9 and any higher CDH 5.x versions
2.3 Release 1                          Not released because of a late-discovered bug. If downloaded, do not use.
2.2 Release 4                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 3                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 2                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 1                          CDH 5.8 - CDH 5.13
2.1 Release 4                          CDH 5.7 and any higher CDH 5.x versions
2.1 Release 3                          CDH 5.7 and any higher CDH 5.x versions
2.1 Release 2                          CDH 5.7 and any higher CDH 5.x versions
2.1 Release 1                          CDH 5.7 - CDH 5.12
2.0 Release 2                          CDH 5.7 - CDH 5.11
2.0 Release 1                          CDH 5.7 up to 5.7.5, CDH 5.8 up to 5.8.4, CDH 5.9 up to 5.9.1, CDH 5.10.0

CDS 2.0 Release 2 is required for any higher maintenance releases of these CDH versions.
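
The maintenance-release limits for CDS 2.0 Release 1 are easy to misread, so a small check can help when auditing a cluster. The following is a minimal sketch, not a Cloudera tool: the function names are hypothetical, and it assumes that a CDH version written without a maintenance number (for example, 5.7) means the .0 release.

```python
# Highest CDH maintenance release in each minor line that CDS 2.0 Release 1
# supports, copied from the table row for 2.0 Release 1.
CDS_2_0_R1_MAX_MAINTENANCE = {
    (5, 7): 5,   # CDH 5.7 up to 5.7.5
    (5, 8): 4,   # CDH 5.8 up to 5.8.4
    (5, 9): 1,   # CDH 5.9 up to 5.9.1
    (5, 10): 0,  # CDH 5.10.0 only
}


def parse_version(text):
    """Turn '5.8.4' into (5, 8, 4). Assumption: a missing component counts as 0."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def cds_2_0_release_1_supports(cdh_version):
    """Return True if the table lists this CDH version for CDS 2.0 Release 1."""
    major, minor, maintenance = parse_version(cdh_version)
    limit = CDS_2_0_R1_MAX_MAINTENANCE.get((major, minor))
    return limit is not None and maintenance <= limit
```

For example, `cds_2_0_release_1_supports("5.8.5")` returns `False`, which corresponds to the case where the CDS 2.0 Release 2 or higher parcel is required.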

A Spark 1.6 service (included in CDH 5.7 and higher) can coexist on the same cluster as Spark 2 (installed as a separate parcel). The two services are configured so that they do not conflict, and both run on the same YARN service. Spark 2 uses the external shuffle service from the CDH installation if Spark 1 is already installed, or installs the shuffle service itself if necessary. Only the external shuffle service classes from the CDH installation can be used.

Although Spark 1 and Spark 2 can coexist in the same CDH cluster, you cannot use multiple Spark 2 versions simultaneously in the same Cloudera Manager instance. All CDH clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark. For example, you cannot use the built-in CDH Spark service, a CDS 2.1 service, and a CDS 2.2 service. You must choose only one CDS 2 Powered by Apache Spark release. Make sure to install or upgrade the CDS 2 service descriptor and parcels across all machines of all clusters at the same time.

Cloudera Manager Versions

Applicable versions of Cloudera Manager for CDS Powered by Apache Spark are described below.

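CDS Powered by Apache Spark Version    Supported Cloudera Manager Versions
2.4 Release 2                          Cloudera Manager 5.11 and any higher Cloudera Manager 5.x versions
2.4 Release 1                          Cloudera Manager 5.11 and any higher Cloudera Manager 5.x versions
2.3 Release 4                          Cloudera Manager 5.11 and any higher Cloudera Manager 5.x versions
2.3 Release 3                          Cloudera Manager 5.11 and any higher Cloudera Manager 5.x versions
2.3 Release 2                          Cloudera Manager 5.11 and any higher Cloudera Manager 5.x versions
2.3 Release 1                          Never officially released; if downloaded, do not use
2.2 Release 4                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.2 Release 3                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.2 Release 2                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.2 Release 1                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.1 Release 4                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.1 Release 3                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.1 Release 2                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.1 Release 1                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.0 Release 2                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions
2.0 Release 1                          Cloudera Manager 5.8.3, 5.9 and any higher Cloudera Manager 5.x versions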
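
A minimum-version rule like the one in this table can be checked mechanically. The sketch below is illustrative only: it covers the two rows whose Cloudera Manager requirement is stated explicitly (2.4 Release 2 and 2.2 Release 4), the names `MINIMUM_CM` and `cm_supports` are hypothetical, and a version written without a maintenance number is assumed to mean the .0 release.

```python
# Minimum Cloudera Manager versions for the two rows stated explicitly.
MINIMUM_CM = {
    "2.4 Release 2": (5, 11, 0),
    "2.2 Release 4": (5, 8, 3),
}


def parse_version(text):
    """Turn '5.8.3' into (5, 8, 3). Assumption: a missing component counts as 0."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def cm_supports(cds_release, cm_version):
    """Return True if cm_version is a 5.x release at or above the listed minimum."""
    cm = parse_version(cm_version)
    # The table lists Cloudera Manager 5.x only, so other major versions are rejected.
    return cm[0] == 5 and cm >= MINIMUM_CM[cds_release]
```

With these definitions, `cm_supports("2.2 Release 4", "5.8.2")` is `False` and `cm_supports("2.2 Release 4", "5.9")` is `True`.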

Scala 2.11 Requirement

Spark 2 does not work with Scala 2.10. Use Scala 2.11 only.

Python Requirement

CDS Powered by Apache Spark requires one of the following Python versions:

  • Python 2.7 or higher, when using Python 2.
  • Python 3.4 or higher, when using Python 3. (CDS 2.0 supports only Python 3.4 and 3.5; CDS 2.1 and higher also support Python 3.6 and higher.)
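
The Python rules above can be expressed as a short check that runs on a gateway or worker host. This is a minimal sketch under stated assumptions: `python_supported` is a hypothetical helper, versions are passed as `(major, minor)` tuples, and no upper bound is applied to "3.6 and higher" because the requirement does not state one.

```python
import sys


def python_supported(cds_version, python_version=None):
    """Return True if python_version satisfies the rules listed above.

    cds_version is a (major, minor) tuple such as (2, 0).
    python_version defaults to the interpreter running this code.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    py = tuple(python_version[:2])

    if py[0] == 2:
        return py >= (2, 7)          # Python 2.7 or higher
    if py[0] == 3:
        if py < (3, 4):
            return False             # Python 3.4 or higher
        if tuple(cds_version[:2]) == (2, 0):
            return py <= (3, 5)      # CDS 2.0: Python 3.4 and 3.5 only
        return True                  # CDS 2.1 and higher: 3.6 and higher also accepted
    return False
```

For example, `python_supported((2, 0), (3, 6))` returns `False`, while `python_supported((2, 1), (3, 6))` returns `True`.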

JDK 8 Requirement

CDS 2.2 and higher require JDK 8 only. If you are using CDS 2.2 or higher, you must remove JDK 7 from all cluster and gateway hosts to ensure proper operation.

Check the supported JDK versions and see Java Development Kit Installation for the installation steps.
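
When preparing hosts for CDS 2.2 or higher, it can help to flag any host that still has JDK 7 installed. The sketch below assumes you have already collected the Java version strings per host into a dictionary; the inventory format and the function names are hypothetical, and how the strings are gathered is left open. It relies on the standard Java version string convention, where JDK 8 and earlier report versions such as 1.8.0_181 and later JDKs report versions such as 11.0.2.

```python
import re


def java_major(version_string):
    """Return the major Java version: '1.7.0_80' -> 7, '1.8.0_181' -> 8, '11.0.2' -> 11."""
    parts = re.split(r"[._+-]", version_string)
    first = int(parts[0])
    # Versions up to JDK 8 use the legacy '1.x' scheme, where x is the major version.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def hosts_with_jdk7(inventory):
    """Return the sorted host names that have at least one JDK 7 installed.

    inventory maps a host name to the list of Java version strings found on it.
    """
    return sorted(
        host
        for host, versions in inventory.items()
        if any(java_major(v) == 7 for v in versions)
    )
```

For example, given `{"gw1": ["1.7.0_80", "1.8.0_181"], "w1": ["1.8.0_181"]}`, `hosts_with_jdk7` returns `["gw1"]`, the host from which JDK 7 still needs to be removed.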