CDS 3.3 Powered by Apache Spark Requirements
The following sections describe software requirements for CDS 3.3 Powered by Apache Spark.
CDP Versions
Supported versions of CDP are described below.
CDS Powered by Apache Spark Version | Supported CDP Versions |
---|---|
3.3.0.3.3.7180.0-274 | CDP Private Cloud Base with Cloudera Runtime 7.1.8 |
A Spark 2 service (included in CDP) can co-exist on the same cluster as Spark 3 (installed as a separate parcel). The two services are configured to not conflict, and both run on the same YARN service. Spark 3 installs and uses its own external shuffle service.
Although Spark 2 and Spark 3 can coexist in the same CDP Private Cloud Base cluster, you cannot use multiple Spark 3 versions simultaneously. All clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark.
Software requirements
Each cluster host must have the following software installed:
- Java
- JDK 8 or JDK 11. Cloudera recommends using JDK 8, as most testing has been done with JDK 8. Remove other JDK versions from all cluster and gateway hosts to ensure proper operation.
- Python
- Python 3.7 - 3.10
Each cluster host with a GPU must have the following software installed:
- Java
- JDK 8 or JDK 11. Cloudera recommends using JDK 8, as most testing has been done with JDK 8. Remove other JDK versions from all cluster and gateway hosts to ensure proper operation.
- Python
- Python 3.7 - 3.10
- GPU drivers and CUDA toolkit
-
GPU driver v450.80.02 or higher
CUDA version 11.0 or higher
Download and install the CUDA Toolkit for your operating system. The toolkit installer also provides the option to install the GPU driver.
- NVIDIA Library
- NVIDIA RAPIDS version 22.06. For more information, see NVIDIA Release Notes
- UCX (Optional)
-
Clusters with Infiniband or RoCE networking can leverage Unified Communication X (UCX) to enable the RAPIDS Shuffle Manager. For information on UCX native libraries support, see (Optional) Installing UCX native libraries.
Hardware requirements
CDS 3.3 Powered by Apache Spark has no specific hardware requirements on top of what is required for Cloudera Runtime deployments.
CDS 3.3 with GPU Support requires cluster hosts with NVIDIA Pascal™or better GPUs, with a compute capability rating of 6.0 or higher.
For more information, see Getting Started at the RAPIDS website.
Cloudera and NVIDIA recommend using NVIDIA-certified systems. For more information, see NVIDIA-Certified Systems in the NVIDIA GPU Cloud documentation.