Creating Extended Engine Images
Cloudera Data Science Workbench allows site administrators and project administrators to install libraries and add other dependencies to the base Docker image that ships with Cloudera Data Science Workbench.
To do this, build a new custom engine image that contains the libraries you require, using the Cloudera Data Science Workbench engine as the base image. Site administrators can then whitelist the new image, and project administrators can select the whitelisted image for use in their projects. For a complete example, see Example: MeCab.
- The Cloudera Engineering Blog post Customizing Docker Images in Cloudera Data Science Workbench walks through an end-to-end example of how to build and publish a customized Docker image and use it as an engine in Cloudera Data Science Workbench.
- For an example of how to extend the base engine image to include Conda, see Creating an Extensible Engine With Conda.
Limitations
- Cloudera Data Science Workbench only supports custom extended engines that are based on the Cloudera Data Science Workbench base image.
- Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.
- Cloudera Data Science Workbench does not support creation of custom engines larger than 10 GB.
For the complete list, see Known Issues and Limitations: Engines.
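The 10 GB limit can be checked locally before an image is published. The following is a minimal sketch, not an official Cloudera tool. The function name within_engine_size_limit is hypothetical, and the sketch assumes that 1 GB means 1024^3 bytes and that the size Docker reports locally matches how Cloudera Data Science Workbench measures an engine. The source states neither.

```shell
#!/usr/bin/env bash
# Assumption: 10 GB is interpreted as 10 * 1024^3 bytes.
MAX_BYTES=$((10 * 1024 * 1024 * 1024))

# Hypothetical helper: returns success (0) when the given size in bytes
# does not exceed the limit, and failure (1) otherwise.
within_engine_size_limit() {
  local size_bytes="$1"
  [ "$size_bytes" -le "$MAX_BYTES" ]
}

# Usage (requires a local Docker daemon and a locally built image):
#   size=$(docker image inspect --format '{{.Size}}' "<company-registry>/user/cdsw-mecab:latest")
#   within_engine_size_limit "$size" && echo "within limit" || echo "too large"
```

docker image inspect reports the image size in bytes, so the value can be passed to the helper unchanged.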
Example: MeCab
This section demonstrates how to extend the Cloudera Data Science Workbench base engine image to include MeCab, a Japanese text tokenizer.
The following sample Dockerfile adds MeCab to the Cloudera Data Science Workbench base image.
# Dockerfile
FROM docker.repository.cloudera.com/cdsw/engine:4
RUN apt-get update && \
    apt-get install -y -q mecab \
                          libmecab-dev \
                          mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN cd /tmp && \
    git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git && \
    /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -y -n -p /var/lib/mecab/dic/neologd && \
    rm -rf /tmp/mecab-ipadic-neologd
RUN pip install --upgrade pip
RUN pip install mecab-python==0.996
- Build a new image with the Dockerfile.
    docker build --network=host -t <company-registry>/user/cdsw-mecab:latest . -f Dockerfile
- Push the image to your company's Docker registry.
    docker push <company-registry>/user/cdsw-mecab:latest
- Whitelist the image, <company-registry>/user/cdsw-mecab:latest. Only a site administrator can do this.
  - Log in as a site administrator.
  - Click Admin.
  - Go to the Engines tab.
  - Add <company-registry>/user/cdsw-mecab:latest to the list of whitelisted engine images.
- Make the whitelisted image available to your project. Only a project administrator can do this.
  - Go to the project Settings page.
  - Click Engines.
  - Select <company-registry>/user/cdsw-mecab:latest from the dropdown list of available Docker images.
Sessions and jobs you run in your project will now have access to this custom image.
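The same image reference appears in the build, push, whitelist, and project selection steps, so a typo in any one of them causes a mismatch. The sketch below composes the reference once and reuses it. The function names image_ref and build_and_push and the registry host registry.example.com are hypothetical. The smoke test between build and push is an addition that is not part of the documented procedure, and it assumes the image can run a one-off command with docker run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: compose the image reference from the registry host so
# that every step uses exactly the same string.
image_ref() {
  local registry="$1"
  printf '%s/user/cdsw-mecab:latest\n' "$registry"
}

# Hypothetical wrapper around the documented build and push commands.
# Each command runs only if the previous one succeeded.
build_and_push() {
  local image
  image="$(image_ref "$1")"
  docker build --network=host -t "$image" . -f Dockerfile &&
    # Added smoke test: stop before pushing if the mecab binary is missing.
    docker run --rm "$image" mecab --version &&
    docker push "$image"
}

# Usage (requires Docker and push access to the registry):
#   build_and_push registry.example.com
```

The value printed by image_ref is also the string to enter in the whitelist on the Engines tab and to select in the project settings.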