Suspend and resume Cloudera Machine Learning Workspaces

Cloud consumption costs are a pain point for many public cloud users. The Cloudera Machine Learning Suspend feature lets you scale down the Kubernetes pods running on the Cloudera Machine Learning infra and CPU/GPU nodes of a given Cloudera Machine Learning Workspace. When you resume a suspended workspace, those pods scale back up.

A suspended Cloudera Machine Learning Workspace has all of its autoscaling node groups, except the Platform Infra node group, shrunk to zero instances, which saves compute instance costs for as long as the workspace is suspended. Kubernetes pods running on Platform Infra nodes, however, continue to run while the workspace is suspended.
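The effect of a suspend on node groups can be pictured with a small model. This is only an illustrative sketch, not Cloudera code: the function name and every node group name other than Platform Infra are hypothetical, and the instance counts are made up.

```python
# Illustrative model only: group names other than "Platform Infra" are hypothetical.
PLATFORM_INFRA = "Platform Infra"


def suspended_node_counts(node_groups: dict[str, int]) -> dict[str, int]:
    """Return the instance count of each node group after a suspend.

    Every autoscaling node group shrinks to zero instances, except the
    Platform Infra group, which keeps running.
    """
    return {
        name: (count if name == PLATFORM_INFRA else 0)
        for name, count in node_groups.items()
    }


# Example workspace with made-up instance counts.
groups = {"Platform Infra": 2, "ML Infra": 3, "CPU Workers": 5, "GPU Workers": 1}
after = suspended_node_counts(groups)
print(after)
```

In this model, only the Platform Infra instances remain billable while the workspace is suspended.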

When a workspace is suspended, you cannot access the workspace URL, and all associated models, applications, sessions, and jobs become unavailable. The suspend operation terminates any running sessions and jobs, so start it only after they have finished. When the workspace is resumed, models and applications automatically resume operation at the same URLs as before.
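The rule about waiting for sessions and jobs can be expressed as a simple pre-suspend check. This is a minimal sketch under stated assumptions: the `Workload` type and `safe_to_suspend` function are hypothetical and do not correspond to any Cloudera API. You would have to gather the list of running workloads yourself, for example from the workspace UI.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical record describing one workload in a workspace."""
    kind: str      # "session", "job", "model", or "application"
    running: bool


def safe_to_suspend(workloads: list[Workload]) -> bool:
    """Return True when no session or job is still running.

    Models and applications are ignored here because they resume
    automatically at the same URLs once the workspace is resumed.
    """
    return not any(
        w.running and w.kind in {"session", "job"} for w in workloads
    )


busy = [Workload("job", True), Workload("model", True)]
idle = [Workload("job", False), Workload("application", True)]
print(safe_to_suspend(busy))
print(safe_to_suspend(idle))
```

The check treats a running model or application as harmless and a running session or job as a reason to wait, which mirrors the behavior described above.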

  1. To suspend a Cloudera Machine Learning Workspace, go to the workspaces UI and select Actions > Suspend Workspace for that workspace. Then click OK to start the suspend process.
  2. To resume a Cloudera Machine Learning Workspace, go to the workspaces UI and select Actions > Resume Workspace for that workspace. Then click OK to start the resume process.