Creating Extended Engine Images

Cloudera Data Science Workbench allows site administrators and project administrators to install libraries and add other dependencies to the base Docker image that ships with Cloudera Data Science Workbench.

To do this, build a new custom engine image with the libraries you require, using the Cloudera Data Science Workbench engine image as the base. Site administrators can then whitelist the new image for use in projects, and project administrators can select the whitelisted image for their projects. For a complete example, see Example: MeCab.


Limitations

  • Cloudera Data Science Workbench only supports custom extended engines that are based on the Cloudera Data Science Workbench base image.

  • Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.

  • Cloudera Data Science Workbench does not support creation of custom engines larger than 10 GB.

For the complete list, see Known Issues and Limitations: Engines.
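Because of the 10 GB limit, it can help to check the size of a locally built image before you push it. The following is a minimal sketch, not a Cloudera-provided tool: it reads the image size in bytes with `docker image inspect` and compares it to a threshold. The names `CDSW_MAX_IMAGE_BYTES`, `within_size_limit`, and `check_image_size` are hypothetical, and the sketch assumes "10 GB" means 10 * 1024 * 1024 * 1024 bytes; adjust the threshold if your interpretation differs.

```shell
# Sketch: check that a locally built engine image is below the 10 GB limit.
# Assumption: "10 GB" is interpreted as 10 * 1024^3 bytes.
CDSW_MAX_IMAGE_BYTES=$((10 * 1024 * 1024 * 1024))

# Returns 0 if the given byte count is within the limit, 1 otherwise.
within_size_limit() {
  local size_bytes="$1"
  [ "$size_bytes" -le "$CDSW_MAX_IMAGE_BYTES" ]
}

# Reads the image size from the local Docker daemon and reports the result.
# Returns 0 if within the limit, 1 if too large, 2 if the size can't be read.
check_image_size() {
  local image="$1"
  local size_bytes
  size_bytes=$(docker image inspect --format '{{.Size}}' "$image") || return 2
  # Guard against empty or non-numeric output before comparing.
  case "$size_bytes" in
    ''|*[!0-9]*) echo "could not read size of $image" >&2; return 2 ;;
  esac
  if within_size_limit "$size_bytes"; then
    echo "OK: $image is $size_bytes bytes"
  else
    echo "TOO LARGE: $image is $size_bytes bytes (limit $CDSW_MAX_IMAGE_BYTES)"
    return 1
  fi
}
```

For example, after building the image you might run `check_image_size <company-registry>/user/cdsw-mecab:latest`. The value Docker reports is the local image size, which may differ from the size shown by your registry.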

Example: MeCab

This section demonstrates how to extend the Cloudera Data Science Workbench base engine image to include MeCab, a Japanese text tokenizer.

The following sample Dockerfile adds MeCab, its IPA dictionary, the mecab-ipadic-NEologd dictionary, and the mecab-python bindings to the Cloudera Data Science Workbench base image.

# Dockerfile

FROM docker.repository.cloudera.com/cdsw/engine:4
RUN apt-get update && \
    apt-get install -y -q mecab \
                          libmecab-dev \
                          mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN cd /tmp && \
    git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git && \
    /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -y -n -p /var/lib/mecab/dic/neologd && \
    rm -rf /tmp/mecab-ipadic-neologd
RUN pip install --upgrade pip
RUN pip install mecab-python==0.996
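
Once the image is built (step 1 below), you may want to smoke-test it locally before pushing it. The following is a minimal sketch, not part of the official procedure: it runs MeCab with the default dictionary, runs it again with the NEologd dictionary, and imports the Python binding. It assumes the NEologd dictionary is at the `-p` path used in the Dockerfile and that `python` is the interpreter `pip` installed into in the base image (override `PYTHON` if not). The function name `smoke_test_mecab` is hypothetical. `--entrypoint` is used so the check does not depend on any entrypoint the base image may define.

```shell
# Sketch: smoke-test the MeCab engine image before pushing it.
SAMPLE_TEXT="すもももももももものうち"
# Assumption: NEologd was installed at the -p path used in the Dockerfile.
NEOLOGD_DIR="/var/lib/mecab/dic/neologd"
# Assumption: "python" is the interpreter that "pip" installed mecab-python into.
PYTHON="${PYTHON:-python}"

smoke_test_mecab() {
  local image="$1"
  # 1. Tokenize with the default system dictionary (mecab-ipadic-utf8).
  printf '%s\n' "$SAMPLE_TEXT" | docker run --rm -i --entrypoint mecab "$image" || return 1
  # 2. Tokenize with the NEologd dictionary.
  printf '%s\n' "$SAMPLE_TEXT" | docker run --rm -i --entrypoint mecab "$image" -d "$NEOLOGD_DIR" || return 1
  # 3. Import the Python module installed by mecab-python.
  docker run --rm --entrypoint "$PYTHON" "$image" -c 'import MeCab' || return 1
  echo "smoke test passed for $image"
}
```

For example: `smoke_test_mecab <company-registry>/user/cdsw-mecab:latest`. The function stops at the first failing check and returns a non-zero status.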

To use this image in your Cloudera Data Science Workbench project, perform the following steps.
  1. Build a new image with the Dockerfile.
    docker build --network=host -t <company-registry>/user/cdsw-mecab:latest . -f Dockerfile
  2. Push the image to your company's Docker registry.
    docker push <company-registry>/user/cdsw-mecab:latest
  3. Whitelist the image, <company-registry>/user/cdsw-mecab:latest. Only a site administrator can do this.
    1. Log in as a site administrator.
    2. Click Admin.
    3. Go to the Engines tab.
    4. Add <company-registry>/user/cdsw-mecab:latest to the list of whitelisted engine images.
  4. Make the whitelisted image available to your project. Only a project administrator can do this.
    1. Go to the project Settings page.
    2. Click Engines.
    3. Select <company-registry>/user/cdsw-mecab:latest from the dropdown list of available Docker images. Sessions and jobs you run in your project will now have access to this custom image.
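
Steps 1 and 2 above can be wrapped in a small helper so the same image name is used for building, pushing, and whitelisting. This is a minimal sketch under stated assumptions: `build_and_push` is a hypothetical name, the registry host is passed as the first argument in place of `<company-registry>`, and the `user/cdsw-mecab` repository path follows the example. It prints the full image name on success so you can paste it into the Engines tab.

```shell
# Sketch: build and push the MeCab engine image (steps 1 and 2).
# Usage: build_and_push <company-registry> [tag]   (tag defaults to "latest")
build_and_push() {
  local registry="$1"
  local tag="${2:-latest}"
  if [ -z "$registry" ]; then
    echo "usage: build_and_push <company-registry> [tag]" >&2
    return 64
  fi
  local image="${registry}/user/cdsw-mecab:${tag}"
  # Same build flags as step 1; run from the directory containing the Dockerfile.
  docker build --network=host -t "$image" -f Dockerfile . >&2 || return 1
  docker push "$image" >&2 || return 1
  # Print only the image name on stdout, for whitelisting in step 3.
  echo "$image"
}
```

Passing a distinct tag for each build (for example, a date or version string) instead of relying on `latest` can make it clearer which image a project is actually using.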