End-to-End Example: MeCab
This section demonstrates how to customize the Cloudera Data Science Workbench base engine image to include the MeCab (a Japanese text tokenizer) library.
# Dockerfile FROM docker.repository.cloudera.com/cdsw/engine:8 RUN rm /etc/apt/sources.list.d/* RUN apt-get update && \ apt-get install -y -q mecab \ libmecab-dev \ mecab-ipadic-utf8 && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* RUN cd /tmp && \ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git && \ /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -y -n -p /var/lib/mecab/dic/neologd && \ rm -rf /tmp/mecab-ipadic-neologd RUN pip install --upgrade pip RUN pip install mecab-python==0.996
To use this image on your Cloudera Data Science Workbench project, perform the following steps.