Backing up Cloudera AI Workbenches
Cloudera AI makes it easy to create machine learning projects, jobs, experiments, models, and applications in workbenches. The data and metadata of these artifacts are stored in different types of storage systems, either within Private Cloud environments or, for external NFS-backed workbenches, outside of the private cloud.
You can back up a Cloudera AI Workbench and restore it later. The backup preserves all files, models, applications, and other assets in the workbench. However, for workbenches backed by external NFS, Cloudera AI does not back up files automatically. All workbench backups can be viewed in the Workbench Backup Catalog UI.
The Backup and Restore feature lets you back up all of your data (except files in external NFS-backed workbenches) to protect your machine learning artifacts against disasters. If your Cloudera AI Workbench has been backed up, you can restore the saved data to recover your Cloudera AI artifacts as they were when the chosen backup was taken. The feature gives administrators the ability to take "on-demand" backups of Cloudera AI Workbenches. Core services running in the workbench are shut down during the backup process to ensure that the backup data is consistent, so it is recommended that backups are taken during off-peak hours to minimize the impact on users.
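Because core services stop while a backup runs, one way to pick an off-peak window is to look at hourly activity and choose the quietest contiguous stretch. The following is a minimal sketch, assuming you already have 24 hourly active-user counts from your own monitoring; the function name and the sample data are hypothetical and are not part of any Cloudera AI API.

```python
from typing import Sequence


def quietest_window(hourly_active_users: Sequence[int], window_hours: int) -> int:
    """Hypothetical helper: return the start hour (0-23) of the contiguous
    window with the lowest total activity. Windows may wrap past midnight."""
    if len(hourly_active_users) != 24:
        raise ValueError("expected one activity sample per hour (24 values)")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")
    best_start, best_load = 0, None
    for start in range(24):
        # Total activity over the window, wrapping around midnight.
        load = sum(hourly_active_users[(start + i) % 24] for i in range(window_hours))
        # Strict "<" keeps the earliest start hour when windows tie.
        if best_load is None or load < best_load:
            best_start, best_load = start, load
    return best_start


# Hypothetical hourly active-user counts for one day (index = hour of day).
usage = [3, 2, 1, 1, 0, 2, 8, 20, 35, 40, 42, 38,
         30, 36, 41, 39, 33, 25, 15, 10, 8, 6, 5, 4]
print(quietest_window(usage, 3))  # 2, i.e. a 02:00-05:00 backup window
```

Set `window_hours` to the backup duration you expect for the workbench, since the whole backup should fit inside the quiet period.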
The time required to back up a workbench depends on the amount of data to copy. The backup process copies data from both block volumes and internal NFS, and in general the time taken to back up NFS is the dominant factor. You should back up Cloudera AI Workbenches regularly.
The time to back up NFS depends heavily on the amount of data and on the nature and number of files. Based on the amount of data, you can set a timeout value when taking a backup. You can view the status of ongoing and past backups in the Cloudera AI Workbench UI and in the Workbench Backup Catalog UI.
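Since NFS backup time scales with both data volume and file count, a rough timeout can be estimated from those two figures. This is a minimal sketch, assuming a constant copy throughput, a fixed per-file overhead, and a safety margin. All three default values are placeholders, not measured Cloudera AI figures, so calibrate them against the durations of your own past backups. The helper is hypothetical, and the units expected by the actual timeout setting may differ.

```python
import math


def estimate_backup_timeout_minutes(
    nfs_bytes: int,
    nfs_file_count: int,
    throughput_bytes_per_sec: float = 100 * 1024**2,  # assumed sustained copy rate: 100 MiB/s
    per_file_overhead_sec: float = 0.002,             # assumed cost per file: 2 ms
    safety_factor: float = 1.5,                       # assumed headroom for variance
) -> int:
    """Hypothetical planning helper: rough NFS backup timeout in whole minutes."""
    if nfs_bytes < 0 or nfs_file_count < 0:
        raise ValueError("sizes and counts must be non-negative")
    transfer_sec = nfs_bytes / throughput_bytes_per_sec    # bulk data copy
    per_file_sec = nfs_file_count * per_file_overhead_sec  # many small files add overhead
    total_sec = (transfer_sec + per_file_sec) * safety_factor
    return max(1, math.ceil(total_sec / 60))  # never suggest less than one minute


# 500 GiB spread over 2 million files:
# (5120 s + 4000 s) * 1.5 = 13680 s, i.e. 228 minutes.
print(estimate_backup_timeout_minutes(500 * 1024**3, 2_000_000))  # 228
```

The per-file term is what captures "the nature and number of files": the same 500 GiB stored as a few large files yields a much shorter estimate (128 minutes with these defaults) than when it is split into millions of small files.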
There is currently no restriction on the number of backups you can take, and backup snapshots are retained indefinitely in the underlying private cloud cluster as long as the original workbench (the one from which the backup was taken) is not deleted. Cloudera AI Workbench backup details are stored in the Workbench Backup Catalog UI in the Cloudera AI control plane, where these entries can be listed, viewed, deleted, or restored as needed.
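The retention rule ties each backup's snapshots to the workbench it came from. The sketch below models that rule over a list of catalog entries. The `BackupEntry` record, its field names, and the helper are hypothetical illustrations, not the Workbench Backup Catalog's actual data model. The sketch simply keeps entries whose source workbench still exists, because only those are described as still having retained snapshots.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class BackupEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    backup_name: str
    source_workbench: str


def entries_with_snapshots(catalog: Iterable[BackupEntry],
                           existing_workbenches: Set[str]) -> List[BackupEntry]:
    """Keep entries whose source workbench still exists, so its snapshots are retained."""
    return [e for e in catalog if e.source_workbench in existing_workbenches]


catalog = [
    BackupEntry("nightly-0101", "wb-analytics"),
    BackupEntry("nightly-0102", "wb-analytics"),
    BackupEntry("pre-upgrade", "wb-retired"),
]
kept = entries_with_snapshots(catalog, {"wb-analytics"})
print([e.backup_name for e in kept])  # ['nightly-0101', 'nightly-0102']
```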
Restoring a backup overwrites the Cloudera AI Workbench from which the backup was taken: the existing workbench is replaced with a new one, and restore jobs are then launched to create storage volumes from the backup snapshots, so the saved data is imported automatically. All the projects, jobs, applications, and other artifacts that existed at the time of the backup are available in the new workbench. The restore process takes longer than a regular workbench provisioning operation because of the extra work of copying data from the backup to the new storage volumes. Restores are always full-copy restores. The restore time is dominated by NFS restoration, which takes at least as long as backing up the file system did. The restored workbench is always created with the latest Cloudera AI software version, which may differ from the Cloudera AI version of the original workbench that was backed up.
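Because NFS restoration takes at least as long as the NFS backup and follows the provisioning of the new workbench, the NFS backup duration gives a useful floor when planning a restore. This is a minimal sketch, assuming provisioning and the restore jobs run one after the other and ignoring block-volume restore time. The helper name and the example durations are hypothetical.

```python
def restore_lower_bound_minutes(provisioning_minutes: float,
                                nfs_backup_minutes: float) -> float:
    """Hypothetical helper: minimum expected restore duration in minutes.

    Assumes the new workbench is provisioned first and the restore jobs run
    afterwards, and that NFS restoration takes at least as long as the NFS
    backup did. Block-volume restore time is not added, so this is only a floor.
    """
    if provisioning_minutes < 0 or nfs_backup_minutes < 0:
        raise ValueError("durations must be non-negative")
    return provisioning_minutes + nfs_backup_minutes


# If provisioning typically takes 20 minutes and the NFS backup took 95 minutes,
# expect the restore to take no less than 115 minutes.
print(restore_lower_bound_minutes(20, 95))  # 115
```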