Known Issues and Limitations

You might run into some known issues while using Cloudera Machine Learning.

DSE-13928 Cannot restrict application access

Authorization used by Applications might not be up to date. For example, if a user is removed from a project in CDSW or CML (losing read access to the project and its applications), that user might continue to have access to an application that they accessed before their access was revoked.

Workaround: When updating permissions of a project that has applications, restart applications to ensure that applications use up-to-date authorization.

DSE-13573 Scheduled Job is not running after switching over to Runtimes, Application can't be restarted

ML Runtimes is a new feature in the current release. Although you can now change your existing projects from Engine to ML Runtimes, we do not currently recommend migrating existing projects.

Applications and Jobs created with Engines might be impacted after their project is changed to use ML Runtimes, as follows:
  • You will be forced to switch to ML Runtimes if you try to update the Editor/Kernel settings of Jobs, Models, Experiments, or Applications.
  • Applications cannot be restarted from the UI in a migrated project unless the ML Runtime settings are updated for that application.

DSE-13629 Play button missing in CML sessions with ML Runtimes

For ML Runtimes sessions, the Play button might not display.

Workaround:

You can still run the session code by selecting Run > Run All or Run > Run Lines when the Play button is not shown in the UI.

DSE-12065: Disable file upload and download

You cannot disable file upload and download when using the Jupyter Notebook.

DSE-8834: Remove Workspace operation fails

Remove Workspace operation fails if workspace creation is still in progress.

DSE-8407: CML does not support modifying CPU/GPU scaling limits on provisioned ML workspaces

When provisioning a workspace, CML currently supports a maximum of 30 nodes of each type: CPUs and GPUs. Currently, CML does not provide a way to increase this limit for existing workspaces.

Workaround:
  1. Log in to the CDP web interface at https://console.us-west-1.cdp.cloudera.com using your corporate credentials or any other credentials that you received from your CDP administrator.
  2. Click ML Workspaces.
  3. Select the workspace whose limits you want to modify and go to its Details page.
  4. Copy the Liftie Cluster ID of the workspace. It is of the format liftie-abcdefgh.
  5. Log in to the AWS EC2 console and click Auto Scaling Groups.
  6. Paste the Liftie Cluster ID in the search filter box and press Enter.
  7. Click the auto-scaling group with a name like liftie-abcdefgh-ml-pqrstuv-xyz-cpu-workers-0-NodeGroup. Note the 'cpu-workers' segment in the middle of the name.
  8. On the Details page of this auto-scaling group, click Edit.
  9. Set Max capacity to the desired value and click Save.

Note that CML does not support lowering the maximum instances of an auto scaling group due to certain limitations in AWS.
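
The console steps above can also be sketched with the AWS CLI, assuming it is configured with credentials for the account hosting the workspace. The group name below uses the placeholder format from step 7, not a real value; substitute the name you found by filtering on your Liftie Cluster ID.

```shell
# Placeholder auto-scaling group name from step 7 -- replace with your own.
ASG_NAME="liftie-abcdefgh-ml-pqrstuv-xyz-cpu-workers-0-NodeGroup"

# Confirm the group's current capacity settings (read-only check).
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]'

# Raise the maximum capacity (equivalent to Edit > Max capacity in the console).
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name "$ASG_NAME" \
  --max-size 40
```

As the note above states, only raising the maximum is supported; do not pass a lower --max-size than the group's current setting.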

SSO does not work if the first user to access an ML workspace is not a Site Admin

Problem: If a user assigned the MLUser role is the first user to access an ML workspace, the web application displays an error.

Workaround: Any user assigned the MLAdmin role must always be the first user to access an ML workspace.

API does not enforce a maximum number of nodes for ML workspaces

Problem: When the API is used to provision new ML workspaces, it does not enforce an upper limit on the autoscale range.

MLX-637, MLX-638: Downscaling ML workspace nodes does not work as expected

Problem: Downscaling nodes does not work as seamlessly as expected, because the Spark default scheduler lacks bin packing and dynamic allocation is not currently enabled. As a result, infrastructure pods, Spark driver/executor pods, and session pods are currently tagged as non-evictable using the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation.