ML Runtimes versus Legacy Engines

Cloudera Data Science Workbench offers both legacy engines and Machine Learning (ML) Runtimes. Both are Docker images that contain an OS, interpreters, and libraries for running user code in sessions, jobs, experiments, models, and applications. However, the two options differ significantly.

Legacy engines are monolithic: a single image contains everything needed to run sessions with any of the four interpreter options that Cloudera Data Science Workbench currently supports (Python 2, Python 3, R, and Scala), along with other support utilities (C and Fortran compilers, LaTeX, and so on).

ML Runtimes are more lightweight than the monolithic legacy engines. Rather than supporting multiple programming languages in a single image, each Runtime variant supports a single interpreter version and a subset of utilities and libraries to run the user’s code in sessions, jobs, experiments, models, or applications.
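The contrast between one monolithic image and several single-interpreter images can be sketched as a small data model. This is an illustrative sketch only: the `Image` class, the `pick_images` helper, and the image names are hypothetical and do not correspond to any Cloudera API or to actual image names.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; not a Cloudera API.
@dataclass(frozen=True)
class Image:
    name: str
    interpreters: frozenset
    utilities: frozenset = frozenset()

    def can_run(self, interpreter: str) -> bool:
        """Return True if this image ships the requested interpreter."""
        return interpreter in self.interpreters


# A legacy engine bundles every interpreter plus extra tooling in one image.
legacy_engine = Image(
    name="legacy-engine",  # hypothetical name
    interpreters=frozenset({"Python 2", "Python 3", "R", "Scala"}),
    utilities=frozenset({"C compiler", "Fortran compiler", "LaTeX"}),
)

# Each Runtime variant ships exactly one interpreter (names are hypothetical).
runtime_variants = [
    Image(name="runtime-python3", interpreters=frozenset({"Python 3"})),
    Image(name="runtime-r", interpreters=frozenset({"R"})),
]


def pick_images(interpreter, images):
    """List the names of the images able to run the given interpreter."""
    return [img.name for img in images if img.can_run(interpreter)]
```

In this model, a request for R matches the single legacy engine and exactly one Runtime variant, which reflects the trade-off described above: one large image that covers every case versus several smaller images that each cover one.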

While the end form factor (a Docker image) remains the same for legacy engines and ML Runtimes, the build architecture and release process of ML Runtimes differ from those of legacy engines. Versioning and metadata help make the contents of ML Runtimes simpler to understand, both for the host workload application and for the user.