Cloudera Data Science Workbench recommends using pip for package management, along with
a requirements.txt file (as described in the previous section).
Cloudera Data Science Workbench also allows you to extend its base
engine image to include packages of your choice using Conda. To create an extended
engine:
- Add the following lines to a Dockerfile to extend the base engine, push the engine
image to your Docker registry, and include the new engine in the allowlist for your
project. For more details on this step, see Extensible Engines.
Python 2
RUN mkdir -p /opt/conda/envs/python2.7
RUN conda install -y nbconvert python=2.7.11 -n python2.7
Python 3
RUN mkdir -p /opt/conda/envs/python3.6
RUN conda install -y nbconvert python=3.6.1 -n python3.6
- Set the PYTHONPATH environment variable as shown below. You can set this either
globally in the site administrator dashboard, or for a specific project by going to
the project's page.
Python 2
PYTHONPATH=$PYTHONPATH:/opt/conda/envs/python2.7/lib/python2.7/site-packages
Python 3
PYTHONPATH=$PYTHONPATH:/opt/conda/envs/python3.6/lib/python3.6/site-packages
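Once both steps are done, it can be useful to confirm from inside a running session that the Conda environment's site-packages directory actually ended up on the interpreter's search path. The following is a minimal sketch of such a check, not part of the product itself. The helper names conda_site_packages and is_on_sys_path are hypothetical, and the /opt/conda root and environment names are taken from the examples above.

```python
import posixpath
import sys


def conda_site_packages(env_name, python_version, conda_root="/opt/conda"):
    """Build the site-packages path of a Conda environment (hypothetical helper).

    Engine paths are POSIX paths, so posixpath is used explicitly.
    """
    return posixpath.join(
        conda_root, "envs", env_name, "lib",
        "python" + python_version, "site-packages",
    )


def is_on_sys_path(directory, search_path=None):
    """Return True if directory is listed in search_path (defaults to sys.path)."""
    if search_path is None:
        search_path = sys.path
    target = posixpath.normpath(directory)
    # Skip empty entries, which stand for the current working directory.
    return any(posixpath.normpath(entry) == target for entry in search_path if entry)


if __name__ == "__main__":
    # Assumes the Python 3 environment from the steps above.
    expected = conda_site_packages("python3.6", "3.6")
    print(expected, "on sys.path:", is_on_sys_path(expected))
```

If the check reports False, the PYTHONPATH variable was most likely not picked up by the session, or the environment name or Python version differs from the one assumed here.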