Data Recovery Service overview

The Data Recovery Service (DRS) is a microservice in CDP Private Cloud Data Services. It lets you back up and restore Kubernetes namespaces and resources on both Embedded Container Service (ECS) and OpenShift Container Platform (OCP) for supported services, such as the Control Plane and Cloudera Data Warehouse (CDW).

The following sections describe in detail how to back up and restore the Control Plane. To find out whether your CDP service supports DRS and, if so, which DRS components it supports, contact your Cloudera account team.

As a best practice, Cloudera recommends that you back up your Kubernetes namespace, in particular before any maintenance activity or upgrade.

Role required: PowerUser

By default, DRS is located in the [***CDP_INSTALLATION_NAMESPACE***]-drs namespace. For example, if CDP Private Cloud Data Services is installed in the cdp namespace, the DRS namespace is automatically named cdp-drs. If you have multiple CDP Private Cloud Data Services installations (as is possible on OCP), each DRS namespace is named after its installation namespace in the same way.
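The default naming rule can be expressed as a small helper. The following Python sketch is illustrative only: the function name drs_namespace is hypothetical, and the validation against Kubernetes namespace naming rules (RFC 1123 labels, at most 63 characters) is a general Kubernetes constraint, not a documented DRS behavior.

```python
import re

# Kubernetes namespace names must be RFC 1123 labels: lowercase
# alphanumerics or '-', starting and ending with an alphanumeric,
# at most 63 characters. This is a general Kubernetes rule.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAMESPACE_LEN = 63


def drs_namespace(installation_namespace: str) -> str:
    """Return the default DRS namespace for a CDP installation namespace.

    Applies the default convention [CDP_INSTALLATION_NAMESPACE]-drs.
    Hypothetical helper for illustration; not part of DRS or the CDP CLI.
    """
    name = f"{installation_namespace}-drs"
    if len(name) > MAX_NAMESPACE_LEN or not _RFC1123_LABEL.fullmatch(name):
        raise ValueError(f"not a valid Kubernetes namespace name: {name!r}")
    return name


# "cdp-prod" is a hypothetical second installation namespace (for example, on OCP).
for ns in ("cdp", "cdp-prod"):
    print(f"{ns} -> {drs_namespace(ns)}")
```

Running the loop prints "cdp -> cdp-drs" and "cdp-prod -> cdp-prod-drs", one DRS namespace per installation.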

When you initiate a backup event for the Control Plane in the Backup and Restore Manager, DRS backs up the following resources and data:
  • Kubernetes resources associated with the cdp namespace and the embedded vault namespaces of the Control Plane in CDP Private Cloud Data Services. These resources include deployment-related information, StatefulSets, Secrets, and ConfigMaps.
  • Data used by the stateful pods, such as the data in the embedded database and in Kubernetes persistent volume claims (PVCs).

Available methods to back up and restore your environment

The following methods are available to back up and restore your environment:

DRS automatic backups
Starting with CDP Private Cloud Data Services 1.5.4, DRS automatic backups for the Control Plane, CDW, and Cloudera Data Engineering (CDE) are enabled by default on ECS clusters, both for new installations and after a cluster upgrade to version 1.5.4 or higher.
You can disable automatic backups if required. You can also configure external storage in Longhorn for ECS and then direct DRS automatic backups to it. For more information, see DRS automatic backups.
Service-specific CDP CLI options
You can use CDP CLI options to back up and restore namespaces for the Control Plane and CDW.
For the list of CDP CLI options available for backup and restore, see the drscp and dw command references.
Backup and Restore Manager
You can back up and restore namespaces for the Control Plane and CDW on the Backup and Restore Manager page.
To access this page, in the CDP Private Cloud Data Services Management Console, go to Dashboard > Backup Overview and click View Details. For more information, see Access Backup and Restore Manager.

How backup and restore events work in DRS

Backup event
A backup event causes no downtime, so you can back up the Control Plane while it is running.
When you create a backup, DRS performs the following steps:
  1. Initiates the backup event (job) for the chosen backup entity, for example, the Control Plane in CDP Private Cloud Data Services.

  2. Assigns an ID called backupCrn to the backup event.

    The backupCrn appears in the CRN column on the Backup and Restore Manager > Backups tab. Click the CRN to view more details about the backup event in the Backup [***NAME OF BACKUP***] modal window.

  3. Backs up the persistent volume claim (PVC) snapshots of the Control Plane namespaces, along with the backup event's own PVC.
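The three backup steps happen in a fixed order, and none of them stops the running workloads. The following Python sketch models that sequence as a record of completed steps. It is a conceptual illustration only: the names BackupStep, BackupEvent, and run_backup, the CRN format, the entity label, and the PVC names are all hypothetical and are not part of DRS or the CDP CLI.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class BackupStep(Enum):
    INITIATED = auto()       # step 1: event (job) created for the chosen entity
    CRN_ASSIGNED = auto()    # step 2: backupCrn assigned to the event
    PVCS_BACKED_UP = auto()  # step 3: PVC snapshots backed up


@dataclass
class BackupEvent:
    entity: str
    backup_crn: Optional[str] = None
    snapshots: List[str] = field(default_factory=list)
    completed: List[BackupStep] = field(default_factory=list)


def run_backup(entity: str, pvcs: List[str]) -> BackupEvent:
    """Conceptual model of the documented backup steps, not the DRS API.

    Nothing here stops or restarts workloads, mirroring the fact that a
    backup event causes no downtime.
    """
    event = BackupEvent(entity=entity)
    event.completed.append(BackupStep.INITIATED)

    # Hypothetical placeholder; real backupCrn values are generated by DRS.
    event.backup_crn = f"hypothetical-backup-crn-{uuid.uuid4()}"
    event.completed.append(BackupStep.CRN_ASSIGNED)

    # Record one snapshot per PVC by name only; a real backup copies volume data.
    event.snapshots = [f"{pvc}-snapshot" for pvc in pvcs]
    event.completed.append(BackupStep.PVCS_BACKED_UP)
    return event


event = run_backup("control-plane", ["hypothetical-db-pvc", "hypothetical-backup-event-pvc"])
print(event.backup_crn)
print([step.name for step in event.completed])
```

Keeping the completed steps as an ordered list makes it easy to see where a backup stopped if it does not finish, which is the same information the CRN details view is meant to surface.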
Restore event
When you start a restore event, DRS performs the following steps:
  1. Initiates the restore event for the chosen backup.
  2. Assigns an ID called restoreCrn to the restore event.

    The restoreCrn appears in the CRN column on the Backup and Restore Manager > Restores tab. Click the CRN to view more details about the restore event.

  3. Deletes the existing resources and data.

    During this stage, the ECS vault is sealed and the vault pod is down, which might appear as a failure in the Control Plane environment. After the restore event completes, the vault and the pod recover automatically. Depending on the volume of resources and data, this step can take up to 10 minutes.

  4. Restores the resources and data from the backup.

    Unlike a backup event, a restore event causes downtime because the pods and data are recreated.
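Because the delete step can make the Control Plane look failed for up to 10 minutes, a monitoring script should wait for recovery instead of alerting immediately. The following Python sketch polls a health check until it passes or a deadline expires. It assumes you supply the check yourself: is_healthy is a hypothetical callable, and wait_for_recovery is a hypothetical helper, not part of DRS or the CDP CLI. The 10-minute default comes from the figure stated for the delete step.

```python
import time
from typing import Callable

# The delete step of a restore event can take up to 10 minutes.
DELETE_STEP_MAX_SECONDS = 10 * 60


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float = DELETE_STEP_MAX_SECONDS,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Returns True if the check passed before the deadline, False otherwise.
    The clock and sleep functions are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

For example, a script could call wait_for_recovery(check_control_plane) after starting a restore, where check_control_plane is a hypothetical function you write (for instance, one that checks whether the Control Plane pods report Ready), and raise an alert only if it returns False. The 10-minute figure covers only the delete step, so you might need a longer timeout to cover the whole restore.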