Frequently Asked Questions about CDS Powered by Apache Spark

This Frequently Asked Questions (FAQ) page covers general information about CDS Powered by Apache Spark, coexistence with Spark 1, and other questions that are relevant for early adopters of the latest Spark 2 features.

Running Spark 1 and Spark 2 Side-by-Side

If Spark 1 is also installed, the Spark 2 service does not conflict with it. The Spark 2 history server uses a different port from the Spark 1 history server. Spark 2 shares the Spark 1 shuffle service if one is already available, or installs the shuffle service if not.

Although Spark 1 and Spark 2 can coexist in the same CDH cluster, you cannot use multiple Spark 2 versions simultaneously in the same Cloudera Manager instance. All CDH clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark. For example, you can run the built-in CDH Spark service alongside a CDS 2.2 service, but you cannot also add a CDS 2.1 service. You must choose only one CDS 2 Powered by Apache Spark release. Make sure to install or upgrade the CDS 2 service descriptor and parcels across all machines of all clusters at the same time.
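The version rule can be expressed as a simple pre-upgrade check. The sketch below is hypothetical: it is not a Cloudera Manager API, and the inventory format and service labels ("CDH Spark", "CDS 2.1", "CDS 2.2") are illustrative assumptions. It allows the built-in Spark 1 service and flags any deployment that uses more than one CDS 2 release across the clusters managed by one Cloudera Manager Server.

```python
from typing import Dict, List, Set

# Hypothetical inventory: cluster name -> Spark services installed on it.
# "CDH Spark" stands for the built-in Spark 1 service; "CDS 2.x" entries
# stand for CDS Powered by Apache Spark releases. Format is illustrative only.
Inventory = Dict[str, List[str]]


def cds_versions(inventory: Inventory) -> Set[str]:
    """Collect every distinct CDS 2 release used across all clusters."""
    return {
        service
        for services in inventory.values()
        for service in services
        if service.startswith("CDS ")
    }


def is_valid_deployment(inventory: Inventory) -> bool:
    """Return True if at most one CDS 2 release is used across all clusters.

    The built-in CDH Spark (Spark 1) service is ignored because it can
    coexist with Spark 2.
    """
    return len(cds_versions(inventory)) <= 1


ok = {
    "cluster-a": ["CDH Spark", "CDS 2.2"],
    "cluster-b": ["CDS 2.2"],
}
mixed = {
    "cluster-a": ["CDH Spark", "CDS 2.1"],
    "cluster-b": ["CDS 2.2"],
}

print(is_valid_deployment(ok))     # True
print(is_valid_deployment(mixed))  # False
```

Collecting the releases into a set makes the rule a single size comparison, regardless of how many clusters the Cloudera Manager Server manages.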

Why doesn't feature or library XYZ work?

A number of features, components, libraries, and integration points from Spark 1.6 are not supported with CDS Powered by Apache Spark. See CDS Powered by Apache Spark Known Issues for details.