Backing up ML workspaces

Cloudera Machine Learning makes it easy to create machine learning projects, jobs, experiments, ML models, and applications in workspaces. The data and metadata of these artifacts are stored in different types of storage systems in the cloud.

You can back up an ML workspace and restore it to a new workspace later. The backup preserves all files, models, applications, and other assets in the workspace. All workspace backups are listed in the Workspace Backups UI.

The Backup and Restore feature lets administrators take on-demand backups of CML workspaces to protect machine learning artifacts against disasters. If a workspace has been backed up, you can restore the saved data into a new CML workspace and recover your ML artifacts as they were at the time of the chosen backup. Core services running in the workspace are shut down during the backup process to ensure consistency in the backup data, so it is recommended that backups are taken during off-peak hours to minimize user impact. How long a backup takes depends on how much data the backup job needs to copy.

There is currently no restriction on the number of backups you can take, and the backup snapshots are retained indefinitely in the backup service vault of the underlying cloud platform. Details of each CML workspace backup are shown in the Workspace Backups UI in the CML control plane, where these entries can be listed, viewed, deleted, or restored as needed.

Restoring a backup creates a new CML workspace into which the restored data is automatically imported. All projects, jobs, applications, and other artifacts that existed at the time of the backup are automatically available in the new workspace. Because a restore must provision a new cluster and then launch restore jobs to create storage volumes from the backup snapshots, it takes longer than a regular workspace provisioning operation.