ML Runtimes Known Issues and Limitations

You might run into some known issues while using ML Runtimes.

VIM text editor no longer supported for Cloudera-provided ML Runtimes

The VIM text editor is no longer supported for Cloudera-provided ML Runtimes.

Adding new ML Runtimes when using a custom root certificate might generate error messages

When you add new ML Runtimes on a cluster that uses a custom root certificate, error messages might appear in several places. For example, you might see "Could not fetch the image metadata" or "certificate signed by unknown authority". This happens because the runtime-puller pods do not have access to the custom root certificate that is in use.

Workaround:

  1. Create a directory at any location on the master node:

    For example:

    mkdir -p /certs/

  2. Copy the full server certificate chain into this folder. It is usually easier to create a single file with all of your certificates (server, intermediate(s), root):
    # copy all certificates into a single file: 
    cat server-cert.pem intermediate.pem root.pem > /certs/cert-chain.crt
  3. (Optional) If you are using a custom Docker registry that has its own certificate, append that certificate chain to the same file:
    cat docker-registry-cert.pem >> /certs/cert-chain.crt
  4. Append the global CA certificates to the same file:
    cat /etc/ssl/certs/ca-bundle.crt >> /certs/cert-chain.crt
  5. Edit the runtime-manager deployment and add the new mount.

    Do not delete any existing objects.

    kubectl edit deployment runtime-manager

  6. Under volumeMounts, add the following lines. Note that the text is whitespace-sensitive: use spaces, not tabs.
    - mountPath: /etc/ssl/certs/ca-certificates.crt
      name: mycert
      subPath: cert-chain.crt # this must match the file name created in step 2

    Under volumes, add the following text in the same edit:

    - hostPath:
        path: /certs/ # this must match the folder created in step 1
        type: ""
      name: mycert
  7. Save your changes and exit the editor:

    :wq!

    After you save, the message "deployment.apps/runtime-manager edited" appears and the pod restarts with your changes.

  8. To persist these changes across cluster restarts, use the following Knowledge Base article to create a kubernetes patch file for the runtime-manager deployment: https://community.cloudera.com/t5/Customer/Patching-CDSW-Kubernetes-deployments/ta-p/90241
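
Steps 1 through 4 above can be collected into one small shell function. This is only a sketch under assumptions: build_cert_chain is a hypothetical helper name (not a Cloudera tool), and the certificate file names are the example names used in the steps.

```shell
#!/bin/sh
# Sketch of steps 1-4: concatenate certificates into one chain file.
# build_cert_chain is a hypothetical helper, not part of any Cloudera tooling.
# Usage: build_cert_chain OUT_DIR CERT_FILE...
build_cert_chain() {
    out_dir=$1
    shift
    chain="$out_dir/cert-chain.crt"

    mkdir -p "$out_dir" || return 1    # step 1: create the directory
    : > "$chain" || return 1           # start from an empty chain file

    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read certificate file: $f" >&2
            return 1
        fi
        cat "$f" >> "$chain"           # steps 2-4: append each file in order
        # If the file did not end with a newline, add one so that PEM
        # markers from consecutive files never share a line.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$chain"
        fi
    done

    # Print the number of certificates found in the chain file.
    grep -c 'BEGIN CERTIFICATE' "$chain"
}
```

With the example file names from the steps, a call might look like this:

    build_cert_chain /certs server-cert.pem intermediate.pem root.pem /etc/ssl/certs/ca-bundle.crt

The function stops at the first unreadable file instead of skipping it, so an incomplete chain is not mounted by mistake.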

Cloudera Bug: DSE-20530

DSE-24038 Current CUDA images (for example, dock-cuda_version=11.4.1) do not contain nvvm libraries

The CUDA version (11.4.1) used to build NVIDIA GPU runtimes is not supported by newer Torch versions (1.13+).

Workaround: Use Torch versions up to 1.12.1.
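
One way to apply this workaround is to pin the package at install time, for example with pip install "torch<=1.12.1". The sketch below checks whether a given version string falls within the supported range. It assumes GNU sort (for the -V option), and torch_version_supported is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a Torch version is at most 1.12.1.
# torch_version_supported is a hypothetical helper; requires GNU sort -V.
torch_version_supported() {
    ver=${1%%+*}     # drop a local build suffix such as "+cu113"
    max=1.12.1
    # The larger of the two versions sorts last; the check passes only
    # when that larger version is the maximum itself.
    highest=$(printf '%s\n%s\n' "$ver" "$max" | sort -V | tail -n 1)
    [ "$highest" = "$max" ]
}
```

For example, the installed version could be passed in from Python:

    torch_version_supported "$(python -c 'import torch; print(torch.__version__)')" || echo "Torch is newer than 1.12.1"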

DSE-17126 Starting a worker from a Runtime session creates a worker with an engine image

This issue is resolved in ML Runtimes 2021.09 when used with the latest ML Workspace versions.

For workers to function properly with ML Runtimes, use ML Runtimes 2021.09 or later with CML Workspace version 2.0.22 or later.

Spark Runtime Add-on required for Spark 2 integration with Scala Runtimes

Scala Runtimes on CML require the Spark Runtime Add-on to enable Spark 2 integration. Spark 3 is not supported with the Scala Runtime.

DSE-17981 Disable Scala Runtimes in Models, Experiments, and Applications runtime selection

Scala Runtimes should not appear as an option for Models, Experiments, and Applications in the user interface. Currently, Scala Runtimes support only Sessions and Jobs.

DSE-17228 Workbench completion broken in R Runtime session

Code completion does not work in the Workbench variants of the R Runtimes in the current ML Runtimes release.

Workaround: Downgrade to ML Runtimes 2021.02.

This issue is resolved in ML Runtimes 2021.09.

DSE-14447 Some bugs present for R in the legacy engine persist in ML Runtimes

Some bugs that were present for R in the legacy engine persist in ML runtimes.