Overview
Apache Spark is a general framework for distributed computing that offers high performance for both batch and interactive processing. It exposes APIs for Java, Python, and Scala. This document describes CDS (Cloudera Distribution of Spark), which enables you to install and evaluate Apache Spark 3.x without upgrading your CDP Private Cloud Base cluster.
CDS is an add-on service for CDP Private Cloud Base. It is distributed as a parcel, and its Cloudera Service Descriptor file is available in Cloudera Manager for CDP.
On CDP Private Cloud Base, a Spark 3 service can coexist with the existing Spark 2 service. The configurations of the two services do not conflict and both services use the same YARN service. The port of the Spark History Server is 18088 for Spark 2 and 18089 for Spark 3.
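Because the two History Servers differ only by port, a small helper can pick the right URL for each Spark version. The following is a minimal sketch, assuming the ports have not been changed from the values given above. The function name, the `http` scheme, and the host `historyserver.example.com` are hypothetical placeholders, not part of CDS.

```python
# Spark History Server ports as stated above (assumed unchanged on the cluster).
HISTORY_SERVER_PORTS = {2: 18088, 3: 18089}


def history_server_url(host: str, spark_major: int, scheme: str = "http") -> str:
    """Build the History Server base URL for a Spark major version (hypothetical helper)."""
    try:
        port = HISTORY_SERVER_PORTS[spark_major]
    except KeyError:
        # Only Spark 2 and Spark 3 are described in this document.
        raise ValueError(f"unsupported Spark major version: {spark_major}") from None
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # "historyserver.example.com" is a placeholder; substitute your own host.
    for major in (2, 3):
        print(history_server_url("historyserver.example.com", major))
```

Substitute the host that runs the History Server role in your cluster, and pass `scheme="https"` if TLS is enabled for it.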
For more information on CDS 3.5, refer to CDS 3.5 Powered by Apache Spark Overview.
For more information on CDS 3.3, refer to CDS 3.3 Powered by Apache Spark Overview.