CUDA Engine - Technical Preview

To make it easier for users to get started with GPUs on CDSW, the Cloudera engineering team is working on a new custom engine that comes with CUDA enabled out of the box. Previously, users had to build their own CUDA engine.

Compatibility Information

The first version of this engine is built on top of CDSW base engine:10 and ships with CUDA 10.1. It has been tested with CDSW 1.7.x.

Custom Engine                  CDSW     NVIDIA Driver
-----------------------------  -------  ------------------
cuda-engine:10                 1.7.x    418.39 (or higher)
- Based on CDSW engine:10
- Ships CUDA 10.1

For driver details, see the NVIDIA/CUDA compatibility matrix.
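
The driver requirement above can be checked on a GPU host with a short script. The following is a minimal sketch, not part of the product: the helper names (parse_version, driver_is_compatible, installed_driver_version) are hypothetical, and it assumes nvidia-smi is on the PATH and that comparing the dot-separated version fields numerically is adequate.

```python
import shutil
import subprocess

# Minimum NVIDIA driver version listed for cuda-engine:10 in the table above.
MIN_DRIVER = (418, 39)


def parse_version(text):
    """Turn a version string such as '418.39' or '440.33.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_compatible(version_text, minimum=MIN_DRIVER):
    """Return True if the driver version is at least the minimum (hypothetical helper)."""
    return parse_version(version_text) >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except subprocess.CalledProcessError:
        return None
    lines = result.stdout.splitlines()
    # With several GPUs, nvidia-smi prints one line per GPU; the driver is the same for all.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("Could not read a driver version from nvidia-smi.")
    elif driver_is_compatible(version):
        print("Driver", version, "meets the 418.39 minimum.")
    else:
        print("Driver", version, "is older than the 418.39 minimum.")
```

Run the script directly on each GPU host. A driver that passes this check still needs to be one that the NVIDIA/CUDA compatibility matrix lists for CUDA 10.1.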

Enabling GPUs for CDSW with the CUDA engine

This section provides a modified set of steps for using the CUDA engine to enable GPUs for CDSW workloads. If you already have

  1. Set up the CDSW hosts
  2. Site Admins: Add the CUDA Engine to your Cloudera Data Science Workbench Deployment
  3. Project Admins: Enable the CUDA Engine for your Project
  4. Test the CUDA Engine

Set up the CDSW hosts

The first few steps of this process are the same as those required for the traditional GPU setup process. If you have already performed these steps for an existing project that uses GPUs, you can move on to the next section: Site Admins: Add the CUDA Engine to your Cloudera Data Science Workbench Deployment

  1. Set Up the Operating System and Kernel
  2. Install the NVIDIA Driver on GPU Hosts - Make sure you are using a driver that is compatible with the CUDA engine.
  3. Enable GPU Support in Cloudera Data Science Workbench

Site Admins: Add the CUDA Engine to your Cloudera Data Science Workbench Deployment

Required CDSW Role: Site Administrator

Once the CDSW hosts are set up, a site administrator must add the CUDA engine to Cloudera Data Science Workbench.

  1. Sign in to Cloudera Data Science Workbench.
  2. Click Admin.
  3. Go to the Engines tab.
  4. Under Engine Images, add docker.repository.cloudera.com/cdsw/cuda-engine:10 to the list of images.
  5. Click Update.

Airgapped CDSW Deployments: Once these steps have been performed, the CUDA engine will be pulled from Cloudera's public Docker registry. If you have an airgapped deployment, you will also need to manually distribute this image to every CDSW host. For sample steps, see Distribute the Image.

Project Admins: Enable the CUDA Engine for your Project

Project administrators can use the following steps to make the CUDA engine the default engine for workloads within a particular project.

  1. Navigate to your project's Overview page.
  2. Click Settings.
  3. Go to the Engines tab.
  4. Under Engine Image, select the CUDA-capable engine image from the dropdown.

Test the CUDA Engine

You can use the following simple examples to test whether the new CUDA engine is able to leverage GPUs as expected.

  1. Go to a project that is using the CUDA engine and click Open Workbench.
  2. Launch a new session with GPUs.
  3. Run the following command in the workbench command prompt to verify that the driver was installed correctly:
    ! /usr/bin/nvidia-smi
  4. Use any of the following code samples to confirm that the new engine works with common deep learning libraries.

    PyTorch

    !pip3 install torch
    from torch import cuda
    # Fail fast if PyTorch cannot see a CUDA device
    assert cuda.is_available()
    assert cuda.device_count() > 0
    # Print the name of the GPU that PyTorch selected
    print(cuda.get_device_name(cuda.current_device()))

    TensorFlow

    !pip3 install tensorflow-gpu==2.1.0
    from tensorflow.python.client import device_lib
    # Fail if no GPU appears among the devices TensorFlow detects
    assert 'GPU' in str(device_lib.list_local_devices())
    # List every detected device
    device_lib.list_local_devices()

    Keras

    !pip3 install keras
    from keras import backend
    # _get_available_gpus is a private Keras helper, so it may change between releases
    assert len(backend.tensorflow_backend._get_available_gpus()) > 0
    print(backend.tensorflow_backend._get_available_gpus())
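
The per-library checks above can also be combined into a single report that skips any library that is not installed instead of failing on the import. This is a rough sketch under stated assumptions: the function names (torch_sees_gpu, tensorflow_sees_gpu, gpu_report) are hypothetical, and the TensorFlow check assumes a release that provides tf.config.experimental.list_physical_devices, which includes the 2.1.0 version installed above.

```python
import importlib.util


def torch_sees_gpu():
    """Return True/False if PyTorch is installed, or None if it is not."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.cuda.is_available()


def tensorflow_sees_gpu():
    """Return True/False if TensorFlow is installed, or None if it is not."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    # Assumes a TensorFlow release that exposes this call (for example 2.1.0).
    return len(tf.config.experimental.list_physical_devices("GPU")) > 0


def gpu_report():
    """Map each library name to True (GPU visible), False (no GPU), or None (not installed)."""
    return {
        "torch": torch_sees_gpu(),
        "tensorflow": tensorflow_sees_gpu(),
    }


if __name__ == "__main__":
    labels = {True: "GPU visible", False: "no GPU visible", None: "not installed"}
    for name, status in gpu_report().items():
        print(name + ": " + labels[status])
```

In a session launched with GPUs on the CUDA engine, every installed library would be expected to report that a GPU is visible. A "no GPU visible" result for a library points at that library's installation rather than at the driver, provided nvidia-smi ran successfully in the earlier step.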