Cloudera Data Science Workbench does not ship with any of the NVIDIA drivers needed
to enable GPUs for general purpose processing. System administrators are expected to install
the driver version that is compatible with the CUDA libraries that will be consumed
on each host.
Perform this step on all hosts with GPU hardware installed on them.
-
Stop the CDSW service. Log in to Cloudera Manager, navigate to the CDSW service,
and stop it.
The CUDA program actively references the service, so if it is not stopped, the
following error might occur during installation:
ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.
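Before launching the installer, it can help to confirm that no NVIDIA kernel modules remain loaded. The following is a minimal sketch, assuming a Linux host that exposes loaded modules through /proc/modules. The function name loaded_nvidia_modules is a hypothetical helper introduced here, not part of CDSW or the NVIDIA tooling.

```shell
#!/bin/sh
# Hypothetical helper: list loaded kernel modules whose names start with "nvidia".
# The modules file is a parameter (default: /proc/modules) so the function
# can also be run against a saved copy of the file.
loaded_nvidia_modules() {
    modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules begins with the module name.
    awk '$1 ~ /^nvidia/ { print $1 }' "$modules_file"
}

# Report what is still loaded on this host, if the file is readable.
if [ -r /proc/modules ]; then
    loaded="$(loaded_nvidia_modules)"
    if [ -n "$loaded" ]; then
        echo "NVIDIA kernel modules still loaded:"
        echo "$loaded"
    else
        echo "No NVIDIA kernel modules are loaded."
    fi
fi
```

If the output lists modules such as nvidia_drm, something on the host is still using the GPU, and the installer is likely to report the error shown above.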
-
Use the NVIDIA UNIX Driver archive to find out which driver
is compatible with your GPU card and operating system.
To download and install the NVIDIA driver, follow the instructions on the
respective driver's download page. It is crucial that you download the correct version.
For example, if you use the .run file method (Linux 64 bit),
you would download and install the driver as follows:
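# Set this to the driver version chosen from the NVIDIA archive.
export NVIDIA_DRIVER_VERSION=<driver_version>
wget http://us.download.nvidia.com/.../NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run
chmod 755 ./NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run
# -a accepts the license, -s runs silently, -q suppresses questions.
./NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run -asq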
-
Once the installation is complete, run the nvidia-smi utility, which is installed
with the driver, to verify that the driver was installed correctly.
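A minimal sketch of such a check, assuming the nvidia-smi utility that ships with the NVIDIA driver is on the PATH. The check_driver helper and its status strings are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a driver query tool exists and succeeds.
# The tool name is a parameter (default: nvidia-smi).
check_driver() {
    tool="${1:-nvidia-smi}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "not-found"
        return 1
    fi
    # A zero exit status means the tool could talk to the driver.
    if "$tool" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "failed"
        return 2
    fi
}

status="$(check_driver)" || true
echo "Driver check: $status"
```

A result of "ok" suggests the driver is loaded and responding. A result of "failed" means the utility is present but could not communicate with the driver, which usually calls for reviewing the installer log.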
-
Cloudera recommends installing the NVIDIA Container Toolkit to better leverage
GPUs in your system.
Follow the instructions found on NVIDIA's website. Even without this toolkit
installed, most GPU-based workloads will run as expected. However, some GPU
functionality, such as running nvidia-smi within a GPU-enabled workload,
requires this toolkit to be installed.
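To see whether the toolkit appears to be present on a host, one option is to look for its command-line tools on the PATH. The sketch below assumes the toolkit installs either nvidia-ctk or nvidia-container-cli, which depends on the toolkit version and packaging. The first_available helper is a hypothetical name introduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the first command from the argument list
# that is found on the PATH; return non-zero if none is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumed command names; adjust to match the installed toolkit version.
if found="$(first_available nvidia-ctk nvidia-container-cli)"; then
    echo "Container toolkit command found: $found"
else
    echo "No container toolkit command found on the PATH."
fi
```

Finding one of these commands only shows that the package is installed. It does not confirm that the container runtime is configured to use it.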
-
Start CDSW. Log in to Cloudera Manager, navigate to the CDSW service, and start it.
Although CDSW starts running at this point, it can take additional time (for
example, 20 minutes) for all CDSW processes to start running.