CDS 3 (Experimental) Powered by Apache Spark Overview

Apache Spark is a general framework for distributed computing that offers high performance for both batch and interactive processing. It exposes APIs for Java, Python, and Scala.

For detailed API information, see the Apache Spark project site.

CDS Powered by Apache Spark is an add-on service for CDP, distributed as a parcel and a custom service descriptor (CSD).

This document describes CDS 3.0 (Experimental) Powered by Apache Spark. This release enables you to install and evaluate the features of Apache Spark 3 without upgrading your CDP Data Center cluster.

On CDP Data Center, a Spark 3 service can coexist with the existing Spark 2 service. The configurations of the two services do not conflict, and both services use the same YARN service. The Spark History Server listens on port 18088 for Spark 2 and on port 18089 for Spark 3.
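Because the two History Servers run side by side on different ports, a client script has to choose the port based on the Spark version it is inspecting. The following is a minimal sketch, assuming plain HTTP with no TLS or Kerberos authentication in front of the History Server. It uses the Spark monitoring REST API path `/api/v1/applications`. The hostname `historyserver.example.com` and the helper names `applications_url` and `list_applications` are hypothetical placeholders, not part of CDS.

```python
import json
import urllib.request

# History Server ports on CDP Data Center, as stated above.
# Assumption: plain HTTP; secured clusters may differ.
HISTORY_SERVER_PORTS = {2: 18088, 3: 18089}


def applications_url(host: str, spark_major: int) -> str:
    """Build the REST URL that lists applications known to a History Server.

    Hypothetical helper; raises KeyError for versions other than 2 or 3.
    """
    port = HISTORY_SERVER_PORTS[spark_major]
    return f"http://{host}:{port}/api/v1/applications"


def list_applications(host: str, spark_major: int, timeout: float = 10.0) -> list:
    """Fetch and parse the application list (requires network access)."""
    url = applications_url(host, spark_major)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "historyserver.example.com" is a placeholder hostname.
    print(applications_url("historyserver.example.com", 3))
```

Keeping the version-to-port mapping in one table makes it easy to query the Spark 2 and Spark 3 History Servers from the same script without hard-coding ports at each call site.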

Unsupported Features:

This experimental release does not support the following features:

  • Hive Warehouse Connector
  • Kudu
  • HBase Connector
  • Oozie
  • Livy
  • Zeppelin