Key Points to Note
- Cloudera Data Science Workbench only supports CUDA-enabled NVIDIA GPU cards.
- Cloudera Data Science Workbench does not support heterogeneous GPU hardware in a single deployment.
- Cloudera Data Science Workbench does not install or configure the NVIDIA drivers on the Cloudera Data Science Workbench worker hosts. These depend on your GPU hardware and will have to be installed by your system administrator. The steps provided in this topic are generic guidelines that will help you evaluate your setup.
- The instructions described in this topic require Internet access. If you have an air-gapped deployment, you will be required to manually download and load the resources onto your hosts.
- Cloudera Data Science Workbench 1.9.0 or later provides two options for supporting GPUs:
  - NVIDIA GPU support was introduced with ML Runtimes 2021.02. Air-gapped environments will have access to ML Runtimes 2021.02 in the upcoming CDSW 1.10 release. See the documentation on the ML Runtimes Nvidia GPU Edition.
  - Cloudera Data Science Workbench still provides technical preview support for CUDA-enabled engines. However, CDSW does not include an engine image that supports NVIDIA libraries. You must create your own custom CUDA-capable engine image using the instructions provided in Create a Custom CUDA-capable Engine Image.
- For a list of known issues associated with this feature, refer to Known Issues - GPU Support and the ML Runtimes Release Notes.
- GPU nodes cannot be split among different workloads such as user sessions, jobs, experiments, or models. For example, a 10-GPU node can support 10 different 1-GPU workloads (assuming it can also meet their CPU and memory requirements). However, each workload must be scheduled entirely on the GPU node; it cannot run on a CPU-only node and then borrow a GPU from a GPU node when needed.
- When scheduling normal workloads, GPU nodes are de-prioritized.
Cloudera Data Science Workbench uses the following order of preference when scheduling non-GPU workloads (session, job, experiment, or model):
Worker Hosts > Master Host > GPU-equipped Hosts | Labeled Auxiliary Hosts
If RESERVE_MASTER is set to true, then the master host is not available for scheduling, so the order of preference becomes:
Worker Hosts > GPU-equipped Hosts | Labeled Auxiliary Hosts
When selecting a host to schedule an engine, Cloudera Data Science Workbench gives first preference to unlabeled Worker hosts. If Workers are unavailable or at capacity, CDSW uses the Master host. Finally, if the Master host is also unavailable or at capacity, CDSW uses any GPU-equipped hosts or labeled auxiliary hosts.
GPU-equipped hosts are labeled auxiliary by default so as to reserve them for GPU-intensive workloads. They do not need to be explicitly configured to be labeled. A GPU-equipped host and a labeled auxiliary host are given equal priority when scheduling workloads.
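The order of preference for non-GPU workloads described above can be sketched roughly as follows. This is only an illustration of the documented ordering, not the actual CDSW scheduler: the Host class, its fields, and the tier and pick_host functions are hypothetical names introduced for this example, and the only setting taken from this topic is RESERVE_MASTER (modeled here as the reserve_master argument).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    # Hypothetical model of a host; these fields are assumptions for illustration.
    name: str
    role: str                        # "worker" or "master"
    has_gpu: bool = False
    labeled_auxiliary: bool = False
    has_capacity: bool = True


def tier(host: Host, reserve_master: bool) -> Optional[int]:
    """Return the preference tier for a non-GPU workload (lower is preferred),
    or None if the host is not available for scheduling."""
    if host.role == "master":
        # With RESERVE_MASTER set to true, the master host is excluded.
        return None if reserve_master else 1
    if host.has_gpu or host.labeled_auxiliary:
        # GPU-equipped hosts and labeled auxiliary hosts share the last tier.
        return 2
    return 0  # unlabeled worker host


def pick_host(hosts: List[Host], reserve_master: bool = False) -> Optional[Host]:
    """Pick the most preferred host that still has capacity, or None."""
    candidates = []
    for host in hosts:
        if not host.has_capacity:
            continue
        host_tier = tier(host, reserve_master)
        if host_tier is not None:
            candidates.append((host_tier, host))
    if not candidates:
        return None
    # min() keeps the first candidate among equal tiers, so ties are broken
    # by list order in this sketch.
    return min(candidates, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    hosts = [
        Host("gpu-1", "worker", has_gpu=True),
        Host("master-1", "master"),
        Host("worker-1", "worker", has_capacity=False),
    ]
    print(pick_host(hosts).name)
    print(pick_host(hosts, reserve_master=True).name)
```

In this sketch, worker-1 is at capacity, so the first call falls back to the master host, and the second call (with the master reserved) falls through to the GPU-equipped host.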