Using CDV in an air-gapped CDSW deployment

New Cloudera Data Visualization (CDV) Runtime releases are automatically added to your deployment when an internet connection is available. In air-gapped Cloudera Data Science Workbench (CDSW) deployments, however, you must manually load the CDV Runtime image onto the cluster nodes before you can use CDV.

Ensure you have the following:
  • CDSW 1.9.0 or higher to support runtimes
  • Root access to all cluster nodes
  • CDSW installed as a parcel
  • Admin access on the CDSW cluster
  • Proficiency in Docker, SQL, and Linux
  1. Download the repo-assembly.json file from the ‘artifacts’ directory of the latest CDV version.
  2. Download the Docker image to an internet-connected node or your local machine.
    image_identifier=$(jq -r '.runtimedataviz[0].image_identifier' repo-assembly.json)
    docker pull "${image_identifier}"
  3. Load the Docker image onto all cluster nodes (master and all workers) using the docker save and docker load commands: save the image to a tar archive, copy the archive to each node, and load it there.
    docker save -o runtimedataviz.tar "${image_identifier}"
    docker load -i runtimedataviz.tar
  4. Using the docker images command, verify that the Docker image is available on every node and has kept its original name and tag.
    The command prints a summary of the local Docker images, including the repository name, tag, image ID, creation date, and size.
  5. Add the Cloudera Data Visualization image as a custom runtime, using the original Docker image name.

    For example: docker.repository.cloudera.com/cloudera/cdv/runtimedataviz:7.1.2-b53

    For detailed instructions, see Adding New ML Runtimes.
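Steps 3 and 4 have to be repeated on every node, so they can be scripted. The following is a minimal sketch, not a Cloudera-provided tool. It assumes passwordless SSH as root to each node, uses /tmp on the nodes as a staging directory, and introduces two hypothetical helpers, run and distribute_image; none of these come from the CDV documentation. Setting DRY_RUN=1 prints the commands instead of executing them, which is a convenient way to review the plan first.

```shell
# Sketch only: distribute a saved Docker image to air-gapped cluster nodes.
# Assumptions: passwordless SSH as root, /tmp as the staging directory on each
# node, and host names supplied by the caller. run and distribute_image are
# hypothetical helper names chosen for this example.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: distribute_image IMAGE TARBALL NODE...
distribute_image() {
  di_image="$1"
  di_tarball="$2"
  shift 2

  # Save the image once on the internet-connected machine.
  run docker save -o "${di_tarball}" "${di_image}"

  for di_node in "$@"; do
    di_remote="/tmp/${di_tarball##*/}"
    # Copy the archive to the node, load it, then list the image so the
    # original repository name and tag can be checked.
    run scp "${di_tarball}" "root@${di_node}:${di_remote}"
    run ssh "root@${di_node}" docker load -i "${di_remote}"
    run ssh "root@${di_node}" docker images "${di_image}"
  done
}
```

For example, with placeholder host names, a dry run would be invoked as DRY_RUN=1 distribute_image "${image_identifier}" runtimedataviz.tar cdsw-master.example.com cdsw-worker1.example.com. In a truly air-gapped environment the archive may have to be moved by other means (for example removable media), in which case only the docker load and docker images commands apply on each node.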