Suspend and resume ML workspaces

Cloud consumption costs are a pain point for many public cloud users. The CML Suspend feature allows users to scale down the Kubernetes pods running on CML infra and CPU/GPU nodes for a given ML workspace. When the resume operation is performed on the suspended workspace, the suspended pods scale up.

A suspended CML workspace has all its autoscaling node groups, except the Platform Infra node group, shrunk to zero instances, thereby saving compute instance costs for the duration the workspace is suspended. However, Kubernetes pods running on Platform Infra nodes continue to run when a workspace is suspended.

When a workspace is suspended, you cannot access the workspace URL, and all associated models, applications, sessions, and jobs also become unavailable. The suspend operation terminates sessions and jobs, so the suspend should be started only after those operations have finished. When the workspace is resumed, models and applications automatically resume operation at the same URLs as before.

  1. To suspend a CML workspace, in the Workspaces UI, select Actions > Suspend Workspace for the workspace to suspend. Then click OK to start the suspend process.
  2. To resume a CML workspace, in the Workspaces UI, select Actions > Resume Workspace for the workspace to resume. Then click OK to start the resume process.