Sequence of backup and restore events when using DRS
Learn about the high-level steps that are performed when you create and restore a backup using Data Recovery Service (DRS).
Backup event
When you create a backup, DRS:
- initiates the backup event of the Control Plane,
- assigns an ID called backupCrn to the backup event. You can specify the backupCrn in the describe-backup CDP CLI command to track the progress of the backup event and to identify whether the event completed successfully. You can also use the get-logs CDP CLI command to retrieve detailed information about the event.
- archives the information to a ZIP file,
- saves the ZIP file on the same cluster, and
- takes persistent volume claim snapshots in the OpenShift Container Platform (OCP) cluster and persistent volume claim clones in the Embedded Container Service (ECS) cluster.
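Because a backup is identified by its backupCrn and runs while the Control Plane stays up, a script usually polls its status until it reaches a final state. The following Python sketch shows such a polling loop. It assumes a caller-supplied `fetch_status` function (hypothetical) that returns the current status string for a CRN, for example by running the describe-backup CDP CLI command and reading its output. The status values below are placeholders, not documented DRS states, and the same loop applies to a restoreCrn with describe-restore.

```python
import time
from typing import Callable, Iterable


def wait_for_event(
    crn: str,
    fetch_status: Callable[[str], str],
    terminal_statuses: Iterable[str],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status(crn) until it returns one of terminal_statuses.

    Returns the terminal status, or raises TimeoutError if no terminal
    status is seen within timeout_s seconds. fetch_status is a
    hypothetical hook, for example a wrapper around describe-backup.
    """
    terminal = set(terminal_statuses)
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(crn)
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"{crn} still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Example with a stubbed status source; a real script would query DRS instead.
responses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final = wait_for_event(
    "crn:example:backup",  # placeholder, not a real backupCrn
    fetch_status=lambda crn: next(responses),
    terminal_statuses={"COMPLETED", "FAILED"},  # placeholder status names
    sleep=lambda seconds: None,  # skip real waiting in the example
)
print(final)  # COMPLETED
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in production the defaults wait in real time.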
The backup event has no downtime impact, and you can back up the Control Plane while it is running.
Restore event
When you restore a backup, DRS:
- initiates the restore event based on the specified backupCrn,
- assigns an ID called restoreCrn to the restore event. You can specify the restoreCrn in the describe-restore CDP CLI command to track the progress of the restore event and to identify whether the event completed successfully. You can also use the get-logs CDP CLI command to retrieve detailed information about the event.
- deletes the existing resources and data,
During this stage of the restore event, the ECS restore vault is sealed and the pod is down, which might appear as a failure in the Control Plane environment. After the restore event completes, the vault and pod are recovered and restored automatically. Depending on the number of resources and the amount of data, this step can take up to 10 minutes.
- restores the resources and data from the specified backupCrn.
The restore event has a downtime impact because the pods and data are recreated.
- When you initiate the restore-backup event, the CDP User Management System (UMS) is up and running, so the restore event starts without issues. During the restore event, the UMS goes down and eventually comes back up. If the UMS is corrupted, contact Cloudera Support for further assistance.
- When the restore event exceeds the time set in the POD_CREATION_TIMEOUT environment variable of the cdp-release-thunderhead-drsprovider deployment in the drs namespace, a timeout error appears. By default, the variable is set to 900 seconds (15 minutes). In this scenario, you must manually verify whether the pods are up.
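When the timeout error appears, the manual check can be scripted against the JSON that kubectl prints. The Python sketch below reads POD_CREATION_TIMEOUT from the output of `kubectl get deployment cdp-release-thunderhead-drsprovider -n drs -o json`, falling back to the 900-second default, and lists the pods whose Ready condition is not True in the output of `kubectl get pods -n <namespace> -o json`. Which namespaces to inspect depends on your deployment and is an assumption here, as are the function names; the sample data is invented for illustration.

```python
import json
from typing import List

DEFAULT_POD_CREATION_TIMEOUT_S = 900  # documented default, in seconds


def pod_creation_timeout(deployment_json: str) -> int:
    """Return POD_CREATION_TIMEOUT (seconds) from a Deployment's JSON.

    Falls back to the documented default when no container sets the
    variable as a literal value (valueFrom references are not resolved).
    """
    deployment = json.loads(deployment_json)
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    for container in containers:
        for env in container.get("env", []):
            if env.get("name") == "POD_CREATION_TIMEOUT" and "value" in env:
                return int(env["value"])
    return DEFAULT_POD_CREATION_TIMEOUT_S


def pods_not_ready(pods_json: str) -> List[str]:
    """Return names of pods whose Ready condition is not "True".

    Expects the PodList JSON printed by `kubectl get pods -o json`.
    Completed Job pods also report Ready=False; filter them if needed.
    """
    pod_list = json.loads(pods_json)
    not_ready = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


# Invented sample PodList: one ready pod, one not ready.
sample_pods = json.dumps({"items": [
    {"metadata": {"name": "svc-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "svc-b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]})
print(pods_not_ready(sample_pods))  # ['svc-b']
```

Parsing kubectl output keeps the sketch free of cluster client dependencies; the official Kubernetes Python client is an alternative if you already use it.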