Sequence of backup and restore events when using DRS

Learn about the high-level steps that are performed when you create and restore a backup using Data Recovery Service (DRS).

Backup event

When you create a backup, DRS:

initiates the backup event of the Control Plane,
assigns an ID called backupCrn to the backup event,
You can specify the backupCrn in the describe-backup CDP CLI command to track the progress of the backup event and to identify whether the event completed successfully. You can also use the get-logs CDP CLI command to retrieve detailed information about the event.
archives the information to a ZIP file,
saves the ZIP file on the same cluster, and
takes persistent volume claim snapshots in the OpenShift Container Platform (OCP) cluster and persistent volume claim clones in the Embedded Container Service (ECS) cluster.

The backup event causes no downtime, so you can back up the Control Plane while it is running.
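
The tracking step above (checking the progress of an event by its backupCrn) can be sketched as a small polling loop. This is a minimal illustration, not part of DRS: wait_for_event, fetch_status, and is_terminal are hypothetical names, and the timeout and interval defaults are arbitrary. In practice, fetch_status would wrap however you invoke describe-backup and extract the status from its output; the exact CLI syntax and the status values are not shown here because the text above does not specify them.

```python
import time
from typing import Callable


def wait_for_event(
    fetch_status: Callable[[], str],
    is_terminal: Callable[[str], bool],
    timeout_s: float = 600.0,   # arbitrary default, not a DRS setting
    interval_s: float = 15.0,   # arbitrary default, not a DRS setting
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until is_terminal(status) is true or time runs out.

    fetch_status is a hypothetical callable supplied by the caller, for
    example a wrapper around the describe-backup command for one backupCrn.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"event still in state {status!r} after {timeout_s}s")
        sleep(interval_s)
```

The same helper could track an event of any kind, because the caller supplies both the status lookup and the rule for deciding when the event has finished.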

Restore event

When you start the restore event, DRS:

initiates the restore event based on the specified backupCrn,
assigns an ID called restoreCrn to the restore event,
You can specify the restoreCrn in the describe-restore CDP CLI command to track the progress of the restore event and to identify whether the event completed successfully. You can also use the get-logs CDP CLI command to retrieve detailed information about the event.
deletes the existing resources and data,
During this stage of the restore event, the ECS restore vault is sealed and the pod is down, which might appear as a failure in the Control Plane environment. After the restore event is complete, the vault and pod recover automatically. Depending on the number of resources and the amount of data, this step can take up to 10 minutes to complete.
restores the resources and data from the specified backupCrn.

The restore event has a downtime impact because the pods and data are recreated.

Consider the following points before you initiate the restore event:

When you initiate the restore-backup event, the CDP User Management System (UMS) is up and running, so the restore event starts without any issues. During the restore event, the UMS goes down and eventually comes back up. However, if the UMS is corrupted, contact Cloudera Support for further assistance.
When the restore event exceeds the time set in the POD_CREATION_TIMEOUT environment property of the cdp-release-thunderhead-drsprovider deployment in the drs namespace, a timeout error appears. By default, the property is set to 900 seconds. In this scenario, you must manually verify whether the pods are up.
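
The manual pod check mentioned above can be partly automated. The following is a minimal sketch, assuming you have saved the JSON pod list for the relevant namespace (for example, the output of kubectl get pods -n <namespace> -o json) and parsed it into a dictionary. The function name pods_not_ready is hypothetical, and which namespaces to inspect depends on your environment. The sketch relies only on the standard Kubernetes pod status fields (phase and the Ready condition).

```python
def pods_not_ready(pod_list: dict) -> list:
    """Return the names of pods that are neither Ready nor completed.

    pod_list is the parsed JSON of a Kubernetes pod list
    (a dictionary with an "items" array of pod objects).
    """
    pending = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # Pods from finished jobs end in phase "Succeeded" and never
        # report Ready, so they are not treated as a problem here.
        if status.get("phase") == "Succeeded":
            continue
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if not ready:
            pending.append(name)
    return pending
```

An empty result suggests that every pod in the list is up. Any names returned are the pods to inspect further after the timeout error.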