ML Runtimes versus the Legacy Engine
Runtimes and the Legacy Engine serve the same basic goal: they are container images that contain a complete Linux OS, interpreter(s), and libraries. They are the environment in which your code runs.
Runtimes and the Legacy Engine serve the same basic goal: they are container images that contain a complete Linux OS, interpreter(s), and libraries. They are the environment in which your code runs. However, ML Runtimes design keeps the images small, which improves performance, maintenance, and security.
There is one Legacy Engine. The Engine is monolithic. It contains the machinery necessary to run sessions using all four Engine interpreter options that Cloudera currently supports (Python 2, Python 3, R, and Scala) and a much larger set of UNIX tools including LaTeX.
Runtimes are the future of CML. There are many Runtimes. Currently each Runtime contains a single interpreter (for example, Python 3.8, R 4.0) and a set of UNIX tools including gcc. Each Runtime supports a single UI for running code (for example, the Workbench or JupyterLab).
To migrate from Legacy Engine to Runtimes, you'll need to modify your project settings. See Modifying Project Settings for more information.
Jupyter
Our Python Runtimes support JupyterLab, a general purpose IDE from the Jupyter project. The engine supports Jupyter Notebook, a simpler UI focused on Notebooks. If you prefer the simpler Notebook UI, choose Classic Notebook from the JupyterLab Help menu. To further customize the JupyterLab experience on CML see Using Editors for ML Runtimes.
Build dependencies
Runtimes generally include fewer UNIX tools than the Legacy Engine. This means you are more likely to find that you cannot install a Python or R package because the Runtime is missing a build dependency such as a library. This should not happen often with Python. Most Python packages are distributed as precompiled “wheels”, so there are no build dependencies. It is more likely to happen with R packages because precompiled packages are not available for our architecture. We have tried to cover most common use cases, but if you find you cannot build something, then please contact customer support.
Using pip to install libraries in Python
To install a Python library from within Workbench or JupyterLab we recommend you use
%pip
(for example, %pip install sklearn
.
%pip
is a “magic” command that is guaranteed to point to the right version
of pip. This is a good habit to get into, as it will work outside CML. Note you do not need to
add “3” to install a Python 3 library.
If you prefer to use the pip executable directly, both pip
and
pip3
work. This is because Runtimes do not include Python 2. Like any shell
command, precede it with “!” to run it from within Workbench or JupyterLab (for example,
!pip install sklearn
. In the Legacy Engine you must use
pip3
to install Python 3 packages and the %pip
magic
command is not supported.
Python paths
Python Runtimes include preinstalled Python packages at
/usr/local/lib/python/<version>/site-packages
. The pre-installed
packages and versions are documented in Pre-Installed Packages in ML
Runtimes.
When you use pip, you install packages into the current project (not a runtime image) at
/home/cdsw/.local/lib/python/<version>/site-packages
. This means you
need to reinstall packages if you change Python versions.
In most cases, you can install a newer version of a package preinstalled in
/usr/local
into your project. For example, we preinstall numpy and you can
install a newer version. But there are some exceptions to this: if you install
matplotlib
, ipykernel
, or its dependencies
(ipython
, traitlets
, jupyter_client
, and
tornado
) then you may break your ability to launch sessions.
If you accidentally install these packages (or you see unexpected behavior when you switch a
project from Legacy Engine to Runtimes), the simplest solution is to delete
/home/cdsw/.local/lib/python
and reinstall your project’s dependencies from
the project overview page.
R paths
R Runtimes include preinstalled R packages at /usr/local/lib/R/library/
. The
pre-installed packages and versions are documented in Pre-Installed Packages in ML
Runtimes.
When you use install.packages()
, you install packages into the current
project (not a runtime image) at /home/cdsw/.local/lib/R/<version>/library
(for example, $R_LIBS_USER
). This means you need to reinstall packages if you
change R versions.
Note the R project package path in Legacy Engines. If you use engines, you install packages
to /home/cdsw/R
. The change to
/home/cdsw/.local/lib/R/<version>/library
was made to support multiple
versions of R.
In most cases, you can install a newer version of a package preinstalled
/usr/local
into your project. For example, we preinstall
ggplot2
and you can install a newer version. But there are two exceptions
to this. If you install Cairo or RServe they may break your ability to launch sessions.
If you accidentally install these packages (or you see unexpected behavior when you switch a
project from Legacy Engine to Runtimes), the simplest solution is to delete
/home/cdsw/.local/lib/python
and reinstall your project’s dependencies from
the project overview page.