Setting up GPU nodes as worker nodes

You can add GPU hardware to an existing or new Cloudera Embedded Container Service or OpenShift Container Platform (OCP) cluster as a worker node.

You must install the nvidia-container-toolkit package on the worker node. For more information about the migration from nvidia-container-runtime to nvidia-container-toolkit, see Migration Notice. For installation instructions, see NVIDIA Installation Guide. If you use Red Hat Enterprise Linux (RHEL), use dnf to install the package. For an example with RHEL 8.8, see Installing the NVIDIA Container Toolkit.
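After the installation, you might want to confirm that the toolkit is present on the worker node. The following is a minimal sketch, not an official verification procedure. It assumes that the nvidia-container-toolkit package places the `nvidia-ctk` command-line tool on the PATH; the function name `missing_toolkit_binaries` and the default binary list are hypothetical and only serve the illustration.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: installing nvidia-container-toolkit puts the `nvidia-ctk`
# CLI on the PATH. Adjust this list for your environment.
DEFAULT_BINARIES = ("nvidia-ctk",)


def missing_toolkit_binaries(
    binaries: Iterable[str] = DEFAULT_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names from `binaries` that cannot be found on the PATH."""
    return [name for name in binaries if which(name) is None]


if __name__ == "__main__":
    missing = missing_toolkit_binaries()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("NVIDIA Container Toolkit CLI found on PATH.")
```

Run the script on the worker node itself. A missing binary suggests that the package is not installed or that the PATH differs for the user who runs the check.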

You can use the following options to advertise the GPUs in the Kubernetes cluster:

  • NVIDIA device plugin: In a Cloudera Embedded Container Service installation, the NVIDIA device plugin automatically advertises the GPU resource to the scheduler if the NVIDIA drivers are correctly installed. The platform administrator does not need to deploy the NVIDIA device plugin.

  • Node Feature Discovery (NFD) Operator and GPU Operator: OpenShift Container Platform administrators must install the NFD Operator and the GPU Operator to advertise the GPU resource to the Kubernetes scheduler.

Both options advertise the GPU resource to the scheduler, provided that the NVIDIA drivers are correctly installed. For more information, see the NVIDIA Device Plugin documentation.
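To check whether a node advertises its GPUs, you can inspect the allocatable resources that the node reports. The following is a minimal sketch under stated assumptions: it expects the JSON produced by `kubectl get nodes -o json` (or `oc get nodes -o json` on OCP) and looks for the `nvidia.com/gpu` resource name that the NVIDIA device plugin registers. The function name `gpus_per_node` and the sample node data are hypothetical.

```python
import json
from typing import Dict

# Resource name under which the NVIDIA device plugin advertises GPUs.
GPU_RESOURCE = "nvidia.com/gpu"


def gpus_per_node(node_list: dict) -> Dict[str, int]:
    """Map node name to allocatable GPU count for nodes that advertise GPUs."""
    result: Dict[str, int] = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        count = int(allocatable.get(GPU_RESOURCE, "0"))
        if count > 0:
            result[name] = count
    return result


# Hypothetical, trimmed-down sample of `kubectl get nodes -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "gpu-worker-1"},
     "status": {"allocatable": {"cpu": "16", "nvidia.com/gpu": "2"}}},
    {"metadata": {"name": "worker-2"},
     "status": {"allocatable": {"cpu": "16"}}}
  ]
}
""")

print(gpus_per_node(sample))  # prints {'gpu-worker-1': 2}
```

In practice, you would save the command output to a file and pass the parsed content to the function instead of the inline sample. A GPU worker node that does not appear in the result usually points to a driver or plugin problem on that node.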