End-to-end example: MeCab
This topic walks you through a simple end-to-end example of how to build and use a custom engine.
This section demonstrates how to customize the Cloudera Machine Learning base engine image to include MeCab, a Japanese text tokenizer library.
The following sample Dockerfile adds MeCab to the Cloudera Machine Learning base image.
# Dockerfile
FROM docker.repository.cloudera.com/cloudera/cdsw/engine:13-cml-2021.02-1

# Remove the apt source lists under sources.list.d before updating
RUN rm /etc/apt/sources.list.d/*

# Install MeCab, its development files, and the UTF-8 IPA dictionary
RUN apt-get update && \
    apt-get install -y -q mecab \
                          libmecab-dev \
                          mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install the NEologd dictionary, then remove the source checkout
RUN cd /tmp && \
    git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git && \
    /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -y -n -p /var/lib/mecab/dic/neologd && \
    rm -rf /tmp/mecab-ipadic-neologd

# Upgrade pip and install the MeCab Python binding
RUN pip install --upgrade pip
RUN pip install mecab-python==0.996
To use this image in your Cloudera Machine Learning project, perform the following steps.
- Build a new image with the Dockerfile.
docker build --network=host -t <your-company-registry>/user/cdsw-mecab:latest . -f Dockerfile
- Push the image to your company's Docker registry.
docker push <your-company-registry>/user/cdsw-mecab:latest
- Whitelist the image, <your-company-registry>/user/cdsw-mecab:latest. Only a site administrator can do this: add <your-company-registry>/user/cdsw-mecab:latest to the list of whitelisted engine images.
- Ask a project administrator to set the new image as the default for your project. Go to the project Settings, click Engines, and select <your-company-registry>/user/cdsw-mecab:latest from the dropdown.
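The build and push steps above can be combined into a small script. The following is a minimal sketch, not part of the product: the registry host `registry.example.com`, the `REGISTRY` and `DRY_RUN` variables, and the `run` helper are all illustrative assumptions. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: build and push the custom MeCab engine image.
# REGISTRY, DRY_RUN, and run() are illustrative assumptions, not Cloudera settings.
set -eu

REGISTRY="${REGISTRY:-registry.example.com}"   # hypothetical registry host
IMAGE="${REGISTRY}/user/cdsw-mecab:latest"
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run docker build --network=host -t "$IMAGE" . -f Dockerfile
run docker push "$IMAGE"
```

Running the script with `DRY_RUN=0` and `REGISTRY` set to your company's registry executes the same two commands shown in the steps above. The whitelisting and project-default steps still have to be done by the respective administrators.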
You should now be able to run this project on the customized MeCab engine.
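One way to confirm that a session is actually running on the customized engine is to tokenize a sentence from the session terminal. This is a minimal sketch under stated assumptions: `DICT_DIR` is the path passed to `-p` in the Dockerfile above, the `STATUS` variable and the sample sentence are illustrative only, and the check assumes the `mecab` command is on the `PATH`.

```shell
# Sketch: check that MeCab and the NEologd dictionary are usable in this session.
DICT_DIR="/var/lib/mecab/dic/neologd"   # path given to -p in the Dockerfile

if ! command -v mecab >/dev/null 2>&1 || [ ! -d "$DICT_DIR" ]; then
    # Either the binary or the dictionary directory is absent.
    echo "mecab or $DICT_DIR not found; is the session using the custom engine?"
    STATUS="missing"
elif echo "すもももももももものうち" | mecab -d "$DICT_DIR"; then
    # mecab read the sentence from stdin and printed its tokens.
    STATUS="ok"
else
    echo "mecab is installed but could not tokenize with $DICT_DIR"
    STATUS="failed"
fi
echo "MeCab check: $STATUS"
```

If the check reports `missing`, the project is most likely still using the default engine image rather than the custom one.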