Key Points to Note

Cloudera Data Science Workbench does not support heterogeneous GPU hardware in a single deployment.

Cloudera Data Science Workbench does not install or configure the NVIDIA drivers on the Cloudera Data Science Workbench worker hosts. The drivers depend on your GPU hardware and must be installed by your system administrator. The steps provided in this topic are generic guidelines to help you evaluate your setup.

The instructions in this topic require Internet access. If you have an air-gapped deployment, you must manually download the resources and load them onto your hosts.

Cloudera Data Science Workbench 1.9.0 or later provides two options for supporting GPUs:

Support for NVIDIA GPUs was introduced with ML Runtimes 2021.02. Air-gapped environments will have access to ML Runtimes 2021.02 in the upcoming CDSW 1.10 release. See the documentation on the ML Runtimes NVIDIA GPU Edition.
Cloudera Data Science Workbench still provides technical preview support for CUDA-enabled engines. However, CDSW does not include an engine image that supports NVIDIA libraries. You must create your own custom CUDA-capable engine image using the instructions provided in Create a Custom CUDA-capable Engine Image.
Note: CUDA-enabled engines will be deprecated with CDSW 1.10, and we recommend using ML Runtimes for GPU support.

For a list of known issues associated with this feature, refer to Known Issues - GPU Support and the ML Runtimes Release Notes.

A GPU workload cannot be split across nodes: each user session, job, experiment, or model that uses a GPU must be scheduled completely on a single GPU node. For example, a 10-GPU node can support 10 different 1-GPU workloads (assuming it can also meet their CPU and memory requirements). However, a workload cannot run on a CPU-only node and then borrow a GPU from a GPU node when needed.
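The single-node constraint above can be illustrated with a minimal sketch. This is not CDSW's scheduler; the `Node` and `Workload` types, their field names, and the `place` function are hypothetical names introduced only for this example. It shows that a workload is placed only when one node alone can satisfy its GPU, CPU, and memory requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical description of a host's currently free resources.
    name: str
    free_gpus: int
    free_cpus: float
    free_memory_gb: float


@dataclass
class Workload:
    # Hypothetical resource request for a session, job, experiment, or model.
    gpus: int
    cpus: float
    memory_gb: float


def fits_on_node(node: Node, workload: Workload) -> bool:
    """True only if this one node can satisfy every resource request."""
    return (
        node.free_gpus >= workload.gpus
        and node.free_cpus >= workload.cpus
        and node.free_memory_gb >= workload.memory_gb
    )


def place(nodes: List[Node], workload: Workload) -> Optional[str]:
    """Return the first node that can hold the whole workload, else None.

    Resources are never combined across nodes: CPU from one host cannot
    be paired with a GPU borrowed from another.
    """
    for node in nodes:
        if fits_on_node(node, workload):
            return node.name
    return None


nodes = [
    Node("cpu-1", free_gpus=0, free_cpus=32, free_memory_gb=128),
    Node("gpu-1", free_gpus=10, free_cpus=16, free_memory_gb=64),
]
small = place(nodes, Workload(gpus=1, cpus=2, memory_gb=8))
large = place(nodes, Workload(gpus=1, cpus=24, memory_gb=8))
```

In this sketch the second request is not placed at all: `cpu-1` has enough CPU but no GPU, and `gpu-1` has a GPU but not enough CPU, and the two cannot be combined.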

When scheduling non-GPU workloads, GPU nodes are de-prioritized.

Cloudera Data Science Workbench uses the following order of preference when scheduling non-GPU workloads (session, job, experiment, or model):

Worker Hosts > Master Host > GPU-equipped Hosts | Labeled Auxiliary Hosts

If RESERVE_MASTER is set to true, then the master host is not available for scheduling, so the order of preference becomes:

Worker Hosts > GPU-equipped Hosts | Labeled Auxiliary Hosts

When selecting a host on which to schedule an engine, Cloudera Data Science Workbench gives first preference to unlabeled Worker hosts. If Workers are unavailable or at capacity, CDSW uses the Master host (unless RESERVE_MASTER is set to true). Finally, it uses any GPU-equipped hosts or labeled auxiliary hosts.

GPU-equipped hosts are labeled auxiliary by default to reserve them for GPU-intensive workloads; you do not need to label them explicitly. A GPU-equipped host and a labeled auxiliary host are given equal priority when scheduling workloads.
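The order of preference for non-GPU workloads can be sketched as a simple ranking. This is only an illustration of the ordering described above, not how CDSW implements scheduling; the host kind strings, `preference_tier`, and `order_hosts` are hypothetical names, and host capacity is deliberately not modeled.

```python
from typing import List, Optional, Tuple


def preference_tier(host_kind: str, reserve_master: bool = False) -> Optional[int]:
    """Return a rank for a host kind (lower is preferred), or None if excluded.

    Assumed kinds: "worker", "master", "gpu", "labeled_auxiliary".
    """
    if host_kind == "worker":
        return 0
    if host_kind == "master":
        # With RESERVE_MASTER set to true, the master is not schedulable.
        return None if reserve_master else 1
    if host_kind in ("gpu", "labeled_auxiliary"):
        # GPU-equipped and labeled auxiliary hosts share the same priority.
        return 2
    raise ValueError(f"unknown host kind: {host_kind}")


def order_hosts(hosts: List[Tuple[str, str]], reserve_master: bool = False) -> List[str]:
    """Order (name, kind) pairs by preference for a non-GPU workload."""
    ranked = []
    for name, kind in hosts:
        tier = preference_tier(kind, reserve_master)
        if tier is not None:
            ranked.append((tier, name))
    # sorted() is stable, so hosts in the same tier keep their input order.
    return [name for _, name in sorted(ranked, key=lambda pair: pair[0])]


hosts = [
    ("gpu-1", "gpu"),
    ("master", "master"),
    ("aux-1", "labeled_auxiliary"),
    ("worker-1", "worker"),
]
default_order = order_hosts(hosts)
reserved_order = order_hosts(hosts, reserve_master=True)
```

Giving GPU-equipped and labeled auxiliary hosts the same tier mirrors the `|` in the preference lists: neither is preferred over the other.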