CDS Powered by Apache Spark Requirements
The following sections describe software requirements for CDS Powered by Apache Spark.
CDH Versions
Supported versions of CDH are described below.
A Hive compatibility issue in CDS 2.0 Release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to CDS 2.0 Release 2 or higher to avoid Spark 2 job failures when using Hive functionality.
CDS Powered by Apache Spark Version | CDH Version |
---|---|
2.1 Release 4 | CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions |
2.1 Release 3 | CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions |
2.1 Release 2 | CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions |
2.1 Release 1 | CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12 |
2.0 Release 2 | CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11 |
2.0 Release 1 | CDH 5.7 up to 5.7.5, CDH 5.8 up to 5.8.4, CDH 5.9 up to 5.9.1, CDH 5.10.0. CDS 2.0 Release 2 is required for any higher maintenance release of these CDH versions. |
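
The maintenance-release limits for CDS 2.0 Release 1 are easy to misread, so the following minimal sketch encodes that one table row as a lookup. The function and constant names are hypothetical and are not part of any Cloudera tool; the limits are transcribed from the table above, and a version string without a maintenance component is assumed to mean maintenance release 0.

```python
# Hypothetical helper: encodes the "2.0 Release 1" row of the table above.
# Keys are (major, minor) CDH versions; values are the highest maintenance
# release that CDS 2.0 Release 1 supports for that CDH line.
CDS_20_R1_MAX_MAINTENANCE = {
    (5, 7): 5,   # CDH 5.7 up to 5.7.5
    (5, 8): 4,   # CDH 5.8 up to 5.8.4
    (5, 9): 1,   # CDH 5.9 up to 5.9.1
    (5, 10): 0,  # CDH 5.10.0 only
}


def parse_version(text):
    """Turn a dotted version string such as '5.8.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cds_20_release_1_supported(cdh_version):
    """Return True if the table lists this CDH version for CDS 2.0 Release 1."""
    version = parse_version(cdh_version)
    major_minor = version[:2]
    # Assumption: a missing maintenance component means release 0.
    maintenance = version[2] if len(version) > 2 else 0
    limit = CDS_20_R1_MAX_MAINTENANCE.get(major_minor)
    return limit is not None and maintenance <= limit
```

For example, `cds_20_release_1_supported("5.9.2")` returns `False`, which corresponds to the case where CDS 2.0 Release 2 or higher is needed.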
A Spark 1.6 service (included in CDH 5.7 and higher) can coexist on the same cluster as Spark 2 (installed as a separate parcel). The two services are configured so that they do not conflict, and both run on the same YARN service. Spark 2 uses the external shuffle service from the CDH installation if Spark 1 is already installed; otherwise, it installs the shuffle service itself. Only the external shuffle service classes from the CDH installation can be used.
Although Spark 1 and Spark 2 can coexist in the same CDH cluster, you cannot run multiple Spark 2 versions simultaneously under the same Cloudera Manager instance. All CDH clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark. For example, you cannot run the built-in CDH Spark service, a CDS 2.0 service, and a CDS 2.1 service together, because that would mix two CDS 2 releases. Choose exactly one CDS 2 Powered by Apache Spark release, and install or upgrade the CDS 2 service descriptor and parcels on all machines of all clusters at the same time.
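
As an illustration of the single-release rule, the sketch below checks a hand-maintained inventory for mixed CDS releases. The function name, the cluster names, and the idea of keeping such an inventory as a dictionary are all assumptions made for this example; the sketch does not query Cloudera Manager.

```python
def conflicting_cds_versions(cluster_versions):
    """Return the distinct CDS releases in use if there is more than one.

    cluster_versions maps a cluster name to the CDS release string recorded
    for it (hypothetical inventory data). An empty list means every cluster
    under this Cloudera Manager Server uses the same release.
    """
    distinct = sorted(set(cluster_versions.values()))
    return distinct if len(distinct) > 1 else []


# Hypothetical inventory for one Cloudera Manager Server.
inventory = {
    "cluster-a": "2.1 Release 4",
    "cluster-b": "2.0 Release 2",
}
mixed = conflicting_cds_versions(inventory)
```

Here `mixed` contains both release strings, which signals that the clusters need to be brought to one common CDS 2 release.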
Cloudera Manager Versions
Applicable versions of Cloudera Manager for Spark 2 are described below.
CDS Powered by Apache Spark Version | Cloudera Manager Version |
---|---|
2.1 Release 4 | Cloudera Manager 5.8.3, 5.9 and higher |
2.1 Release 3 | Cloudera Manager 5.8.3, 5.9 and higher |
2.1 Release 2 | Cloudera Manager 5.8.3, 5.9 and higher |
2.1 Release 1 | Cloudera Manager 5.8.3, 5.9 and higher |
2.0 Release 2 | Cloudera Manager 5.8.3, 5.9 and higher |
2.0 Release 1 | Cloudera Manager 5.8.3, 5.9 and higher |
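
Because every row of the table lists the same requirement, the check can be sketched as one function. This is only an illustration with a hypothetical function name. It reads "5.8.3, 5.9 and higher" literally, as "exactly 5.8.3, or any version from 5.9 onward"; the table does not say whether later 5.8.x maintenance releases qualify, so the sketch conservatively rejects them.

```python
def cm_version_supported(cm_version):
    """Return True if a Cloudera Manager version string meets the table above.

    Assumption: "5.8.3, 5.9 and higher" means exactly 5.8.3, or 5.9 and later.
    """
    version = tuple(int(part) for part in cm_version.split("."))
    # Pad short strings such as "5.9" to three components: (5, 9, 0).
    version = version + (0,) * (3 - len(version))
    return version[:3] == (5, 8, 3) or version[:2] >= (5, 9)
```

For example, `cm_version_supported("5.8.2")` returns `False` and `cm_version_supported("5.12.1")` returns `True`.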
Scala 2.11 Requirement
Spark 2 does not work with Scala 2.10. Use Scala 2.11 only.
Python Requirement
CDS Powered by Apache Spark requires one of the following Python versions:
- Python 2.7 or higher, when using Python 2.
- Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 includes support for Python 3.6 and higher.)
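
The Python rules above can be expressed as a small check, sketched here under stated assumptions. The function name and the `cds_line` parameter (either `"2.0"` or `"2.1"`) are hypothetical and exist only for this example; the version limits come from the list above.

```python
import sys


def python_supported(version_info, cds_line):
    """Return True if a Python version satisfies the requirements listed above.

    version_info: a (major, minor, ...) sequence such as sys.version_info.
    cds_line: "2.0" or "2.1" (hypothetical parameter for this sketch).
    """
    major, minor = version_info[0], version_info[1]
    if major == 2:
        # Python 2.7 or higher, when using Python 2.
        return minor >= 7
    if major == 3:
        if cds_line == "2.0":
            # CDS 2.0 supports only Python 3.4 and 3.5.
            return minor in (4, 5)
        # CDS 2.1 supports Python 3.4 and higher, including 3.6.
        return minor >= 4
    return False


# Check the interpreter running this script against CDS 2.1.
current_ok = python_supported(sys.version_info, "2.1")
```

Passing `sys.version_info` checks only the interpreter that runs the script, so on a cluster the check would need to run with the same Python that the Spark executors use.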