The Data Recovery Service (DRS) is a microservice in CDP Private Cloud Data Services. It
allows you to back up and restore Kubernetes namespaces and resources on both Embedded Container
Service (ECS) and OpenShift Container Platform (OCP) for selected services, such as Control Plane
and Cloudera Data Warehouse (CDW).
The following sections discuss how to back up and restore Control Plane in detail. To
determine whether your CDP service supports DRS and, if so, which DRS components are
supported, contact your Cloudera account team.
Cloudera recommends that you create a backup of your Kubernetes namespace before a
maintenance activity, before an upgrade, and in general as a best practice.
Role required: PowerUser
By default, DRS is located in the [***CDP_INSTALLATION_NAMESPACE***]-drs namespace. For
example, if the CDP Private Cloud Data Services installation is located in the cdp
namespace, the DRS namespace is automatically named cdp-drs. If you have multiple CDP
Private Cloud Data Services installations (as in OCP), each DRS namespace is named
accordingly.
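As a small illustration of the naming rule above, the following Python sketch derives the default DRS namespace from an installation namespace. The function name drs_namespace is hypothetical and is not part of any Cloudera tooling; only the -drs suffix comes from this section.

```python
def drs_namespace(installation_namespace: str) -> str:
    """Return the default DRS namespace for a CDP installation namespace.

    Hypothetical helper: it only applies the "<namespace>-drs" naming
    rule described in this section.
    """
    name = installation_namespace.strip()
    if not name:
        raise ValueError("installation namespace must not be empty")
    return f"{name}-drs"


if __name__ == "__main__":
    # "cdp" is the example installation namespace used in this section.
    print(drs_namespace("cdp"))
```

With several installations, calling the helper once per installation namespace gives the matching DRS namespace for each.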
When you initiate the backup event in the Backup and Restore Manager for Control
Plane, DRS takes a backup of the following resources and data:
- Kubernetes resources associated with the cdp namespace and the embedded vault
  namespaces of the Control Plane in CDP Private Cloud Data Services. The resources include
  deployment-related information, stateful sets, secrets, and configmaps.
- Data used by the stateful pods, such as the data in the embedded database and in the
  Kubernetes persistent volume claims (PVCs).
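To get a rough sense of what falls within this scope before you start a backup, you could list the same resource types with kubectl. The sketch below only builds the command and does not run it. It assumes kubectl is installed and configured for the cluster, uses the example cdp namespace from this section, and the helper name kubectl_list_command is hypothetical. It does not show how DRS itself collects resources.

```python
import shlex

# Resource types named in this section, written as standard kubectl
# resource names. PVCs are included because they hold the pod data.
RESOURCE_TYPES = (
    "deployments",
    "statefulsets",
    "secrets",
    "configmaps",
    "persistentvolumeclaims",
)


def kubectl_list_command(namespace: str) -> list[str]:
    """Build (but do not run) a kubectl command that lists the resource types."""
    if not namespace:
        raise ValueError("namespace must not be empty")
    return ["kubectl", "get", ",".join(RESOURCE_TYPES), "-n", namespace]


if __name__ == "__main__":
    # "cdp" is the example installation namespace used in this section.
    print(shlex.join(kubectl_list_command("cdp")))
```

The printed command can be pasted into a shell. The embedded vault namespaces would need the same call with their own names, which are not given here.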
Available methods to back up and restore environment
The following methods are available to back up and restore your environment:
DRS automatic backups
Starting from CDP Private Cloud Data Services 1.5.4, DRS automatic backups for
Control Plane, CDW, and Cloudera Data Engineering (CDE) are enabled by default on ECS
clusters for new installations or after a cluster upgrade to version 1.5.4 or higher.
You can disable this option, if required. You can also configure the external storage
in Longhorn for ECS, and then initiate the DRS automatic backup to it. For more
information, see DRS automatic backups.
Service-specific CDP CLI options
You can use the CDP CLI options to back up and restore namespaces for Control Plane
and CDW.
For the list of available CDP CLI options that you can use for backup and restore
purposes, see drscp and dw.
Backup and Restore Manager
You can back up and restore namespaces for Control Plane and CDW on the Backup and
Restore Manager page.
To access this page, go to CDP Private Cloud Data Services Management Console > Dashboard > Backup Overview and click View Details. For more information, see Access Backup and Restore
Manager.
How backup and restore events work in DRS
Backup event
The backup event does not have any downtime impact, and you can back up the Control
Plane while it is running.
When you create a backup, DRS:
1. Initiates the backup event or job for the chosen backup entity, for example, the
   Control Plane in CDP Private Cloud Data Services.
2. Assigns an ID called backupCrn to the backup event. The backupCrn appears in the CRN
   column on the Backup and Restore Manager > Backups tab. Click the CRN to view more
   details about the backup event on the Backup [***NAME OF BACKUP***] modal window.
3. Creates a backup of the persistent volume claim (PVC) snapshots of the Control Plane
   namespaces and the backup event's PVC.
Restore event
When you start the restore event, DRS:
1. Initiates the restore event for the chosen backup.
2. Assigns an ID called restoreCrn to the restore event. The restoreCrn appears as the
   CRN on the Backup and Restore Manager > Restores tab. Click the CRN to view more
   details about the restore event.
3. Deletes the existing resources and data. During this stage of the restore event, the
   ECS restore vault is sealed and the pod is down, which might appear as a failure in
   the Control Plane environment. After the restore event is complete, the vault and pod
   are automatically recovered and restored. Depending on the number of resources and
   the amount of data, this step can take up to 10 minutes to complete.
4. Restores the resources and data from the backup. The restore event has a downtime
   impact because the pods and data are recreated.
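Because the Control Plane can look failed while a restore is in progress, automation around a restore may prefer to wait and re-check rather than alert on the first failed probe. The following Python sketch polls a caller-supplied health check until it passes or a timeout expires. The function wait_until_healthy and the is_healthy callback are hypothetical and do not correspond to any DRS or CDP API. The 600-second default reflects only the 10-minute upper bound mentioned for the deletion step, so a complete restore may need a longer timeout.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,   # assumption: 10 minutes, taken from the deletion step
    interval_s: float = 15.0,   # assumption: arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Returns True if the check passed in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice, is_healthy would wrap whatever check your environment already uses, for example a probe of the Management Console endpoint. The sleep and clock parameters exist only so that the helper can be exercised without real waiting.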