Using Conda to Manage Dependencies

You can install additional libraries and packages from the workbench, using either the command prompt or the terminal. Alternatively, you might choose to use a package manager such as Conda to install and maintain packages and their dependencies. This topic describes some basic usage guidelines for Conda.

Cloudera Machine Learning recommends using pip for package management along with a requirements.txt file (as described in the previous section). However, for users that prefer Conda, the default engine in Cloudera Machine Learning includes two environments called python2.7, and python3.6. These environments are added to sys.path, depending on the version of Python selected when you launch a new session.

In Python 2 and Python 3 sessions and attached terminals, Cloudera Data Science Workbench automatically sets the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables to point to Conda environments under /home/cdsw/.conda.

However, Cloudera Machine Learning does not automatically configure Conda to pin the actual Python version. Therefore if you are using Conda to install a package, you must specify the version of Python. For example, to use Conda to install the feather-format package into the python3.6 environment, run the following command in the Workbench command prompt:
!conda install -y -c conda-forge python=3.6.1 feather-format
To install a package into the python2.7 environment, run:
!conda install -y -c conda-forge python=2.7.11 feather-format

Note that on sys.path, pip packages have precedence over conda packages.

Creating an Extensible Engine With Conda

Cloudera Data Science Workbench also allows you to Configuring the Engine Environment to include packages of your choice using Conda. To create an extended engine:
  1. Add the following lines to a Dockerfile to extend the base engine, push the engine image to your Docker registry, and whitelist the new engine for your project. For more details on this step, see Configuring the Engine Environment.
    Python 2
    RUN mkdir -p /opt/conda/envs/python2.7
    RUN conda install -y nbconvert python=2.7.11 -n python2.7
    Python 3
    RUN mkdir -p /opt/conda/envs/python3.6
    RUN conda install -y nbconvert python=3.6.1 -n python3.6
  2. Set the PYTHONPATH environmental variable as shown below. You can set this either globally in the site administrator dashboard, or for a specific project by going to the project's Settings > Engine page.
    Python 2
    PYTHONPATH=$PYTHONPATH:/opt/conda/envs/python2.7/lib/python2.7/site-packages
    Python 3
    PYTHONPATH=$PYTHONPATH:/opt/conda/envs/python3.6/lib/python3.6/site-packages