ML Runtimes Version 2022.11

Major features and updates for ML Runtimes.

This release is available with ML Runtimes version 2022.11

Version 2022.11

New features

  • PBJ (Powered by Jupyter) Runtime GA: PBJ Runtimes were previously available only as a Tech Preview. In ML Runtimes 2022.11, the PBJ Runtimes are generally available (GA) to all users. The following PBJ Runtimes are included:
    • Python 3.7
    • Python 3.8
    • Python 3.9
    • R 3.6
    • R 4.0
    • R 4.1

    These PBJ Runtimes are available on Cloudera Public Cloud (from 2022.10-1) and Cloudera Private Cloud (from 1.5.0) form factors, but are not available on CDSW.

    The new PBJ Runtimes appear in the Runtime Selector when users start new workloads. They are used the same way as non-PBJ Runtimes, except for Models. Models require a new library called cml, available for Python and R, which was introduced in Cloudera Public Cloud (2022.10-1) and in the upcoming Cloudera Private Cloud release (1.5.0).

    The use of cml with Models is described here: PBJ Runtimes and Models.

    Examples can be found here: Example models with PBJ Runtimes.
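As a rough illustration of the Models change described above, the following minimal Python sketch wraps a model function with a decorator from the cml library. It assumes the decorator is exposed as cml.models_v1.cml_model; that name, the fallback stand-in, and the predict function are assumptions made for illustration and should be checked against the PBJ Runtimes and Models page.

```python
# Assumption: inside a PBJ Runtime the decorator is available as
# cml.models_v1.cml_model. Outside CML the import fails, so a no-op
# stand-in keeps this sketch runnable for local testing.
try:
    import cml.models_v1 as models
    cml_model = models.cml_model
except ImportError:
    def cml_model(func):
        """Hypothetical local stand-in: returns the function unchanged."""
        return func


@cml_model
def predict(args):
    """Hypothetical model function: adds two numbers from the request."""
    return {"sum": args["a"] + args["b"]}
```

With a non-PBJ Runtime the same function would typically be deployed without the decorator, which is the difference the paragraph above refers to.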

Fixed Issues

  • DSE-22433 Upgrade numpy version in Python runtimes to resolve issues.
  • DSE-22374 Reduced log level to WARN. The log level can be changed through the HADOOP_ROOT_LOGGER environment variable. It is now set to WARN,console, which keeps console logging while reducing the log level from INFO to WARN.
  • DSE-22251 Add flextable dependencies to R Runtimes. Added two OS libraries (libfontconfig1-dev, libcairo2-dev) to R Runtimes so that flextable can be installed and used in CML.
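For the HADOOP_ROOT_LOGGER change (DSE-22374), the value has the form level,appender, for example WARN,console. The sketch below builds such a value and an environment mapping that could be passed to a child process. The helper names are hypothetical, and whether an override takes effect depends on when the Hadoop client reads the variable, so treat this as an illustration rather than a supported procedure.

```python
import os

# Standard Log4j level names; anything else is rejected early.
ALLOWED_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def hadoop_logger_value(level="WARN", appender="console"):
    """Build a HADOOP_ROOT_LOGGER value such as 'WARN,console'."""
    level = level.upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    return f"{level},{appender}"


def env_with_log_level(level, base_env=None):
    """Return a copy of the environment with HADOOP_ROOT_LOGGER overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_ROOT_LOGGER"] = hadoop_logger_value(level)
    return env
```

For example, env_with_log_level("INFO") could be passed as the env argument of subprocess.run to restore INFO-level console logging for a single command.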

Known issues

  • The CUDA version (11.4.1) being used to build Nvidia GPU Runtimes is not supported by newer Torch versions (1.13+).

    Workaround: Use Torch versions up to 1.12.1.

    Cloudera Bug: DSE-24038

  • When installing sparklyr for R, the installation succeeds, but loading the library in the same session fails. This happens because earlier versions of the dependent packages (vctrs, lifecycle, rlang, cli) are already loaded in the R session, so the newer versions cannot be loaded.

    Workaround: Start a new session, then load and use sparklyr there.

    Cloudera Bug: DSE-23762
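For the CUDA and Torch issue above (DSE-24038), a version check like the following can catch an unsupported Torch install early. It is a minimal sketch using only the standard library; the function names are hypothetical, and the 1.12.1 upper bound is taken from the workaround stated above.

```python
import re

# Highest Torch version named in the workaround for DSE-24038.
MAX_SUPPORTED = (1, 12, 1)


def parse_version(version):
    """Extract (major, minor, patch) from strings like '1.12.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def torch_version_ok(version):
    """Return True if the version is at or below the supported maximum."""
    return parse_version(version) <= MAX_SUPPORTED


# Typical use inside a GPU Runtime session (requires torch to be installed):
#   import torch
#   if not torch_version_ok(torch.__version__):
#       print("Torch", torch.__version__, "is newer than 1.12.1")
```

Pinning at install time, for example with pip install "torch<=1.12.1", avoids the problem before the check is needed.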