ML Runtimes versus Legacy Engine
While Runtimes and the Legacy Engine are both container images that contain the Linux OS, interpreter(s), and libraries, ML Runtimes keeps the images small and improves performance, maintenance, and security.
Runtimes and the Legacy Engine serve the same basic goal: they are container images that contain a complete Linux OS, interpreter(s), and libraries. They are the environment in which your code runs. However, ML Runtimes design keeps the images small, which improves performance, maintenance, and security.
There is one Legacy Engine. The Engine is monolithic. It contains the machinery necessary to run sessions using all four Engine interpreter options that Cloudera currently supports (Python 2, Python 3, R, and Scala) and a much larger set of UNIX tools including LaTeX. The Conda package manager was available in the Legacy Engine. Conda is not available in ML Runtimes.
Runtimes are the future of Cloudera Machine Learning. There are many Runtimes. Currently each Runtime contains a single interpreter (for example, Python 3.8, R 4.0) and a set of UNIX tools including gcc. Each Runtime supports a single UI for running code (for example, the Workbench or JupyterLab).
To migrate from Legacy Engine to Runtimes, you will need to modify your project settings. See Modifying Project Settings for more information.
Disable Engines
Starting with version 1.5.1, Legacy Engines are disabled by default on the workspace level. This setting is accessible in . Select this checkbox on upgraded workspaces to disable Legacy Engines and change all existing project types to ML Runtimes. Disabling Legacy Engines prohibits changing the project type to Legacy Engine.
This has the following effect for workloads using Legacy Engine kernels:
- Model builds cannot be redeployed
- Applications cannot be restarted
- Jobs cannot be started, scheduled jobs will fail
- Sessions cannot be started
To re-enable Legacy Engines go to Disable Engines checkbox under Engine Images.
and deselect theJupyter
Our Python Runtimes support JupyterLab, a general purpose IDE from the Jupyter project. The engine supports Jupyter Notebook, a simpler UI focused on Notebooks. If you prefer the simpler Notebook UI, choose Classic Notebook from the JupyterLab Help menu. To further customize the JupyterLab experience on Cloudera Machine Learning see Using Editors for ML Runtimes.
Build dependencies
Runtimes generally include fewer UNIX tools than the Legacy Engine. This means you are more likely to find that you cannot install a Python or R package because the Runtime is missing a build dependency such as a library. This should not happen often with Python. Most Python packages are distributed as precompiled “wheels”, so there are no build dependencies. It is more likely to happen with R packages because precompiled packages are not available for our architecture. In case of further difficulties with the build, contact customer support.
Using pip to install libraries in Python
%pip
(for example, %pip install sklearn
.
%pip
is a command that is guaranteed to point to the right version of
pip
. This is optimal to be used outside of Cloudera Machine Learning as well. If you prefer to use the pip
executable directly, both pip
and pip3
work. This is because Runtimes do not include Python 2. Like any
shell command, precede it with “!” to run it from within Workbench or JupyterLab (for example,
!pip install sklearn
. In the Legacy Engine you must use
pip3
to install Python 3 packages and the %pip
magic
command is not supported.
Python paths
Python Runtimes include preinstalled Python packages at
/usr/local/lib/python/<version>/site-packages
. The pre-installed
packages and versions are documented in Pre-Installed Packages in ML
Runtimes.
When you use pip, you install packages into the current project (not a runtime image) at
/home/cdsw/.local/lib/python/<version>/site-packages
. This means you
need to reinstall packages if you change Python versions.
In most cases, you can install a newer version of a package preinstalled in
/usr/local
into your project. For example, we preinstall numpy and you can
install a newer version. But there are some exceptions to this: if you install
matplotlib
, ipykernel
, or its dependencies
(ipython
, traitlets
, jupyter_client
, and
tornado
) then you may break your ability to launch sessions.
If you accidentally install these packages (or you see unexpected behavior when you switch a
project from Legacy Engine to Runtimes), the simplest solution is to delete
/home/cdsw/.local/lib/python
and reinstall your project’s dependencies from
the project overview page.
R paths
R Runtimes include preinstalled R packages at /usr/local/lib/R/library/
. The
pre-installed packages and versions are documented in Pre-Installed Packages in ML Runtimes.
When you use install.packages()
, you install packages into the current
project (not a runtime image) at /home/cdsw/.local/lib/R/<version>/library
(for example, $R_LIBS_USER
). This means you need to reinstall packages if you
change R versions.
Note the R project package path in Legacy Engines. If you use engines, you install packages
to /home/cdsw/R
. The change to
/home/cdsw/.local/lib/R/<version>/library
was made to support multiple
versions of R.
In most cases, you can install a newer version of a package preinstalled
/usr/local
into your project. For example, we preinstall
ggplot2
and you can install a newer version. But there are two exceptions
to this. If you install Cairo or RServe they may break your ability to launch sessions.
If you accidentally install these packages (or you see unexpected behavior when you switch a
project from Legacy Engine to Runtimes), the simplest solution is to delete
/home/cdsw/.local/lib/python
and reinstall your project’s dependencies from
the project overview page.