CDS 3.3 Powered by Apache Spark Requirements

The following sections describe software requirements for CDS 3.3 Powered by Apache Spark.

CDP Versions

Supported versions of CDP are described below.

CDS Powered by Apache Spark Version | Supported CDP Versions
3.3.0.3.3.7180.0-274 | CDP Private Cloud Base with Cloudera Runtime 7.1.8

A Spark 2 service (included in CDP) can coexist on the same cluster as Spark 3 (installed as a separate parcel). The two services are configured so that they do not conflict, and both run on the same YARN service. Spark 3 installs and uses its own external shuffle service.

Although Spark 2 and Spark 3 can coexist in the same CDP Private Cloud Base cluster, you cannot use multiple Spark 3 versions simultaneously. All clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark.
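
The single-version rule above can be checked with a short script. The following is a minimal sketch, assuming you already have a mapping of cluster names to installed CDS versions; the cluster names, the second version string, and the helper names are hypothetical and are not part of any Cloudera API.

```python
from collections import defaultdict


def group_by_version(cds_versions):
    """Group cluster names by the CDS version each one reports.

    cds_versions: dict mapping cluster name -> CDS version string
    (a hypothetical inventory that you collect yourself).
    """
    groups = defaultdict(list)
    for cluster, version in cds_versions.items():
        groups[version].append(cluster)
    return {version: sorted(clusters) for version, clusters in groups.items()}


def uses_single_version(cds_versions):
    """Return True when every cluster reports the same CDS version."""
    return len(set(cds_versions.values())) <= 1


# Hypothetical inventory: cluster-c reports a different (placeholder) version.
inventory = {
    "cluster-a": "3.3.0.3.3.7180.0-274",
    "cluster-b": "3.3.0.3.3.7180.0-274",
    "cluster-c": "hypothetical-other-version",
}

if not uses_single_version(inventory):
    for version, clusters in group_by_version(inventory).items():
        print(version, "->", ", ".join(clusters))
```

Grouping by version, rather than only returning a yes or no answer, shows which clusters would need to change to bring the deployment into line.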

Software requirements

Each cluster host must have the following software installed:

Java
JDK 8 or JDK 11. Cloudera recommends using JDK 8, as most testing has been done with JDK 8. Remove other JDK versions from all cluster and gateway hosts to ensure proper operation.
Python
Python 3.7 - 3.10
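
As an illustration of the Java and Python requirements above, the following sketch checks a Python version and a JDK version string against the supported values. The function names are hypothetical, and the JDK version string is assumed to be supplied by you (for example, copied from the output of `java -version`); this is not an official validation tool.

```python
import re
import sys

# Supported values taken from the requirements list above.
SUPPORTED_PYTHON = ((3, 7), (3, 10))  # inclusive (major, minor) range
SUPPORTED_JDK_MAJORS = {8, 11}


def python_supported(version_info=sys.version_info):
    """Return True if (major, minor) lies within 3.7 through 3.10."""
    low, high = SUPPORTED_PYTHON
    return low <= tuple(version_info[:2]) <= high


def jdk_major(version_string):
    """Extract the JDK major version from a string such as
    '1.8.0_372' (legacy scheme) or '11.0.19' (current scheme)."""
    parts = re.split(r"[._\-+]", version_string.strip())
    first = int(parts[0])
    # JDK 8 and earlier report themselves as 1.x, so the second field
    # carries the major version in that case.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def jdk_supported(version_string):
    """Return True if the JDK major version is 8 or 11."""
    return jdk_major(version_string) in SUPPORTED_JDK_MAJORS


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("JDK 1.8.0_372 supported:", jdk_supported("1.8.0_372"))
```

Comparing `(major, minor)` tuples avoids the pitfall of treating versions as decimal numbers, where 3.10 would sort below 3.7.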

Each cluster host with a GPU must have the following software installed:

Java
JDK 8 or JDK 11. Cloudera recommends using JDK 8, as most testing has been done with JDK 8. Remove other JDK versions from all cluster and gateway hosts to ensure proper operation.
Python
Python 3.7 - 3.10
GPU drivers and CUDA toolkit

GPU driver v450.80.02 or higher

CUDA version 11.0 or higher

Download and install the CUDA Toolkit for your operating system. The toolkit installer also provides the option to install the GPU driver.
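
The driver and CUDA minimums above can be compared programmatically. This is a minimal sketch, assuming you pass in version strings that you have collected from each GPU host yourself; the helper names are hypothetical.

```python
# Minimum versions taken from the requirements above.
MIN_DRIVER = "450.80.02"
MIN_CUDA = "11.0"


def parse_version(text):
    """Turn a dotted version such as 'v450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def meets_minimum(found, minimum):
    """Return True if `found` is greater than or equal to `minimum`.

    Shorter versions are padded with zeros so that '11' and '11.0'
    compare as equal.
    """
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Example values only; substitute the strings reported by your hosts.
    print(meets_minimum("515.65.01", MIN_DRIVER))
    print(meets_minimum("10.2", MIN_CUDA))
```

On hosts where `nvidia-smi` is installed, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is one way to obtain the driver string to pass to `meets_minimum`.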

NVIDIA Library
NVIDIA RAPIDS version 22.06. For more information, see NVIDIA Release Notes.
UCX (Optional)

Clusters with InfiniBand or RoCE networking can leverage Unified Communication X (UCX) to enable the RAPIDS Shuffle Manager. For information on UCX native libraries support, see (Optional) Installing UCX native libraries.

Hardware requirements

CDS 3.3 Powered by Apache Spark has no specific hardware requirements on top of what is required for Cloudera Runtime deployments.

CDS 3.3 with GPU Support requires cluster hosts with NVIDIA Pascal™ or better GPUs, with a compute capability rating of 6.0 or higher.

For more information, see Getting Started at the RAPIDS website.
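
To illustrate the compute capability requirement, the sketch below flags hosts whose GPU falls below 6.0. The host names and the inventory mapping are hypothetical; the capability strings are assumed to come from your own hardware records or NVIDIA's published specifications.

```python
# Minimum compute capability from the hardware requirements above (Pascal).
MIN_COMPUTE_CAPABILITY = (6, 0)


def parse_capability(text):
    """Convert a capability string such as '7.5' into (major, minor)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def gpu_supported(capability):
    """Return True if the capability is 6.0 or higher."""
    return parse_capability(capability) >= MIN_COMPUTE_CAPABILITY


def unsupported_hosts(inventory):
    """Return the sorted host names whose GPU is below the minimum.

    inventory: dict mapping host name -> compute capability string.
    """
    return sorted(host for host, cap in inventory.items()
                  if not gpu_supported(cap))


# Hypothetical inventory: 7.0 is a Volta-class rating, 5.2 a Maxwell-class one.
hosts = {"gpu-host-1": "7.0", "gpu-host-2": "5.2", "gpu-host-3": "6.0"}
print(unsupported_hosts(hosts))
```

Parsing the rating into a `(major, minor)` tuple keeps the comparison exact instead of relying on floating-point values.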

Cloudera and NVIDIA recommend using NVIDIA-certified systems. For more information, see NVIDIA-Certified Systems in the NVIDIA GPU Cloud documentation.