Using Cloudera Data Visualization in air-gapped Cloudera Data Science Workbench deployments

New Cloudera Data Visualization Runtime releases are automatically added to your deployment when an internet connection is available. In air-gapped Cloudera Data Science Workbench deployments, however, you must manually load the required Cloudera Data Visualization Runtime image onto the cluster before you can use Cloudera Data Visualization.

Ensure you have the following:
  • Cloudera Data Science Workbench 1.9.0 or higher to support runtimes
  • Root access to all cluster nodes
  • Cloudera Data Science Workbench installed as a parcel
  • Admin access on the Cloudera Data Science Workbench cluster
  • Proficiency in Docker, SQL, and Linux
  1. Download the repo-assembly.json file from the ‘artifacts’ directory of the latest Cloudera Data Visualization version.
  2. Download the Docker image to an Internet-connected node or your local machine.
    image_identifier=$(jq -r '.runtimedataviz[0].image_identifier' repo-assembly.json)
    docker pull "${image_identifier}"
  3. Load the Docker image to all cluster nodes (master and all workers) using the docker save and docker load commands. Save the image to a tar archive on the internet-connected machine, copy the archive to each node, and load it there.
    docker save -o runtimedataviz.tar <image_name>
    docker load -i runtimedataviz.tar
  4. Run the docker images command on every node to verify that the Docker image is available and has kept its original name and tag.
    The command prints a summary of the local Docker images, including the repository name, tag, image ID, creation date, and size.
  5. Add the Cloudera Data Visualization image as a custom runtime, using the original Docker image name.

    For example: docker.repository.cloudera.com/cloudera/cdv/runtimedataviz:7.1.2-b53

    For detailed instructions, see Adding New ML Runtimes.
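
Steps 3 and 4 can be sketched as a pair of small shell helpers. This is a minimal sketch under stated assumptions, not a Cloudera-provided script: it assumes root SSH access from the machine holding the image to every cluster node, uses /tmp on each node as a staging directory, and the function names image_present and distribute_image are hypothetical.

```shell
#!/bin/sh
# Sketch only: root SSH access, the /tmp staging path, and the function
# names below are assumptions, not part of the product.
set -eu

# Succeeds if the exact name:tag appears in the local `docker images` listing.
# Usage: image_present IMAGE
image_present() {
  images=$(docker images --format '{{.Repository}}:{{.Tag}}')
  # -F: fixed string, -x: whole-line match, -q: no output, exit status only
  printf '%s\n' "$images" | grep -Fxq -- "$1"
}

# Saves the image to a tar archive, then copies the archive to each node
# and loads it into that node's local Docker image store.
# Usage: distribute_image IMAGE TARBALL NODE...
distribute_image() {
  image=$1
  tarball=$2
  shift 2
  docker save -o "$tarball" "$image"
  for node in "$@"; do
    scp "$tarball" "root@${node}:/tmp/runtimedataviz.tar"
    ssh "root@${node}" "docker load -i /tmp/runtimedataviz.tar"
  done
}
```

For example, after step 2 you might call distribute_image "${image_identifier}" runtimedataviz.tar followed by the host names of the master and worker nodes, and then run image_present "${image_identifier}" on each node. Because the check matches the full name and tag exactly, it fails if the image was loaded under a different name.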