Install the NVIDIA Driver on GPU Hosts
Cloudera Data Science Workbench does not ship with any of the NVIDIA drivers needed to enable GPUs for general purpose processing. System administrators are expected to install, on each GPU host, the driver version that is compatible with the CUDA libraries that will be used on that host.
Stop the CDSW service. Log in to Cloudera Manager, navigate to the CDSW service, and stop it. The CUDA program actively references the service, so if it is not stopped, the following error might occur during installation:
ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.
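Before running the installer, it can help to confirm that no NVIDIA kernel modules are still loaded, since that is the condition the error above reports. The following is a minimal sketch, assuming a Linux host that exposes /proc/modules; the function name loaded_nvidia_modules is illustrative and is not part of CDSW or the NVIDIA tooling.

```shell
# List NVIDIA kernel modules that are currently loaded.
# Reads /proc/modules by default; the optional path argument exists only
# so the check can be tried against a saved copy of that file.
loaded_nvidia_modules() {
    modules_file="${1:-/proc/modules}"
    # Nothing to report if the file cannot be read (for example, on a non-Linux host).
    [ -r "$modules_file" ] || return 0
    # The first field of each line in /proc/modules is the module name.
    awk '$1 ~ /^nvidia/ { print $1 }' "$modules_file"
}

if [ -n "$(loaded_nvidia_modules)" ]; then
    echo "NVIDIA kernel modules are still loaded:"
    loaded_nvidia_modules
else
    echo "No NVIDIA kernel modules are loaded."
fi
```

If modules are listed, something on the host is still using the driver, and the CDSW service is the first thing to check.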
Use the NVIDIA UNIX Driver archive to find out which driver is compatible with your GPU card and operating system.
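To look up a driver in the archive you need the GPU model, the operating system, and the architecture of the host. The following is a rough sketch of how those details might be collected; it assumes a Linux host, treats lspci (from pciutils) as optional, and uses an illustrative function name, describe_gpu_host.

```shell
# Print the details needed to pick a driver: OS, kernel, architecture, GPU model.
describe_gpu_host() {
    # /etc/os-release is present on most modern Linux distributions.
    # Source it in a subshell so its variables do not leak into this shell.
    if [ -r /etc/os-release ]; then
        ( . /etc/os-release; echo "os: ${PRETTY_NAME:-unknown}" )
    fi
    echo "kernel: $(uname -r)"
    echo "arch: $(uname -m)"
    # lspci may not be installed; skip the GPU lookup if it is missing.
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "gpu: no NVIDIA device reported by lspci"
    else
        echo "gpu: lspci not available"
    fi
}

describe_gpu_host
```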
To download and install the NVIDIA driver, follow the instructions on the respective driver's download page. It is crucial that you download the correct version. For example, if you use the .run file method (Linux 64 bit), you would download and install the driver as follows:
wget http://us.download.nvidia.com/.../NVIDIA-Linux-x86_64-<driver_version>.run
export NVIDIA_DRIVER_VERSION=<driver_version>
chmod 755 ./NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run
./NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run -asq
Once the installation is complete, run the following command to verify that the driver was installed correctly:
nvidia-smi
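The nvidia-smi utility that ships with the driver reports the installed driver version, so the check can also be scripted. The following is a minimal sketch, assuming NVIDIA_DRIVER_VERSION is still exported from the installation step; the helper name driver_version_matches is illustrative.

```shell
# Return 0 if the expected and reported driver versions are identical, 1 otherwise.
driver_version_matches() {
    expected="$1"
    reported="$2"
    [ "$expected" = "$reported" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask nvidia-smi for the driver version only; keep the first GPU's value.
    reported=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_version_matches "${NVIDIA_DRIVER_VERSION:-}" "$reported"; then
        echo "Driver $reported is installed."
    else
        echo "Expected driver ${NVIDIA_DRIVER_VERSION:-<unset>}, but nvidia-smi reports: $reported"
    fi
else
    echo "nvidia-smi not found; the driver does not appear to be installed."
fi
```

A plain string comparison is used here because the installed version is expected to match the downloaded one exactly.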
Cloudera recommends installing the NVIDIA Container Toolkit to better leverage GPUs in your system. Follow the instructions found on NVIDIA's website. Even without this toolkit installed, most GPU-based workloads will run as expected. However, some GPU functionality, for example running nvidia-smi within a GPU-enabled workload, needs this toolkit to be installed.
Start CDSW. Log in to Cloudera Manager, navigate to the CDSW service, and start it. Although CDSW starts running at this point, it can take additional time (for example, 20 minutes) for all CDSW processes to start running.