DRS automatic backups

By default, CDP Private Cloud Data Services 1.5.4 and higher enable Data Recovery Service (DRS) automatic backups for the Control Plane, CDE, and CDW in the ECS compute cluster. The automatic backups are stored in Longhorn in-cluster storage. You can also configure external storage in ECS and then initiate the automatic backups to it.

The following storage options are available to store the DRS automatic backups in ECS:

In-cluster storage
By default, DRS automatic backups use Longhorn in-cluster storage. If necessary, you can change the Longhorn storage settings on the Cloudera Manager > Clusters > [***CLUSTER NAME***] > Status > [***ECS CLUSTER NAME***] > Web UI > Storage UI page.
By default, Kubernetes initiates the first automatic backup within 30 minutes after the backup policy creation is complete, and then takes subsequent backups every hour.
You can change the backup retain counts for hourly, daily, or weekly backups, or disable DRS automatic backups entirely (set ENABLED to false), using the kubectl edit cj automatic-backup -n cdp-drs command. For more information about using this command in DRS, see Initiating DRS automatic backups.
External storage
ECS uses Longhorn as the underlying storage provisioner. In Longhorn, you can store snapshots externally in S3-compatible storage, such as Ozone, or in NFS v4 storage. After you configure the external storage, edit the automatic-backup cron job to initiate the automatic backups.
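The interactive kubectl edit step for disabling (or re-enabling) automatic backups can also be scripted. The following minimal sketch assumes that ENABLED is an environment variable on the container of the automatic-backup cron job, as in the sample in Initiating DRS automatic backups, and that your kubectl version supports kubectl set env on CronJob resources. The function name and the KUBECTL override variable are hypothetical conveniences, not part of DRS.

```shell
#!/bin/sh
# Sketch: toggle DRS automatic backups without opening an editor.
# Assumption: ENABLED is an environment variable on the container(s) of the
# automatic-backup cron job in the cdp-drs namespace.
# Hypothetical convenience: set KUBECTL=echo to print the kubectl command
# instead of running it.

DRS_NAMESPACE=cdp-drs
DRS_CRONJOB=automatic-backup

# Usage: set_auto_backup_enabled true|false
set_auto_backup_enabled() {
  case "$1" in
    true | false) ;;
    *)
      echo "usage: set_auto_backup_enabled true|false" >&2
      return 2
      ;;
  esac
  # Updates the env var on the cron job's pod template; jobs created after
  # this change pick up the new value.
  "${KUBECTL:-kubectl}" set env "cronjob/${DRS_CRONJOB}" "ENABLED=$1" -n "${DRS_NAMESPACE}"
}
```

For example, run set_auto_backup_enabled false to disable automatic backups, or KUBECTL=echo set_auto_backup_enabled false to print the command without running it.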

Configuring external storage in ECS for DRS automatic backups

Before you initiate Data Recovery Service (DRS) automatic backups to the external storage in Longhorn, you must complete the prerequisites.

  1. Complete the following prerequisites:
    1. Ensure that the following requirements are met for the storage you choose for DRS automatic backups:
      • S3-compatible storage, such as Ozone, must be available in the base cluster. You must have the access key and secret key for the storage, and the provisioned bucket must have at least 5 TB of storage space.
      • NFS v4 storage must have at least 5 TB of free space.
    2. You must have SSH access to the base cluster node.
    3. You must have SSH access to the ECS master node.
  2. Perform the following steps to change the type parameter of the default volume snapshot class from snap (which saves snapshots in Longhorn in-cluster storage) to bak (which saves snapshots in the external storage configured in Longhorn):
    1. Run the kubectl edit vsclass longhorn command.
    2. Change the type parameter to bak as shown in the following sample snippet:
      apiVersion: snapshot.storage.k8s.io/v1
      deletionPolicy: Delete
      driver: driver.longhorn.io
      kind: VolumeSnapshotClass
      metadata:
        name: longhorn
      parameters:
        type: bak
  3. Complete the following steps if you are using Ozone S3 storage in Longhorn:
    1. Run the following command to copy the TLS certificate for Ozone to the current directory:
      scp root@[***BASE_CLUSTER_HOST***]:/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem .
      DRS uses this certificate to communicate with the S3 gateway service over HTTPS.
    2. Create a secret that Longhorn can use for S3 access. To create it, you need the S3 access key, S3 secret key, S3 endpoint, and S3 certificate for Ozone storage. You must also enable virtual-hosted-style addressing for the S3-compatible endpoint (Ozone).
      The following sample snippet shows the kubectl command to create a secret:
      kubectl create secret generic ozone-secret \
        --from-literal=AWS_ACCESS_KEY_ID=s3g/drs1-1.drs1.root.hwx.site@ROOT.HWX.SITE \
        --from-literal=AWS_SECRET_ACCESS_KEY=9d9e46cc77bb510821f0dbc42c584a8b7482b51dec9d3eb63c \
        --from-literal=AWS_ENDPOINTS=https://drs1.root.hwx.site:9879/longhorn \
        --from-literal=VIRTUAL_HOSTED_STYLE=true \
        --from-file=AWS_CERT=cm-auto-global_cacerts.pem \
        -n longhorn-system

      For more information, see the Longhorn documentation.

  4. Run the kubectl edit deploy cdp-release-thunderhead-drsprovider -n cdp-drs command, and set the TAKE_PVC_CLONE environment variable to false.
    This step ensures that the backups do not create a persistent volume claim (PVC) clone for the external snapshot.

    By default, Longhorn is configured to use in-cluster storage, which requires a PVC copy to perform the DRS restore operation (DRS uses CSI snapshot technology). Therefore, to use external storage, you must set the volume snapshot class type to bak and set the TAKE_PVC_CLONE environment variable to false.

  5. On the Setting > General page of the Longhorn UI, configure the NFS volume or the Ozone S3 bucket in which to save the backups.
    1. Enter the nfs://… URL in the Backup Target field if you are using NFS storage.
    2. Enter the required values in the following fields if you are using Ozone S3 storage:
      • s3://[***BUCKET***]@[***DUMMY REGION***]/ in the Backup Target field. For example, s3://drs1-1@cdp/.
      • [***SECRET THAT YOU GENERATED IN STEP 3B***] in the Backup Target Credential Secret field. For example, ozone-secret.

      The s3://[***BUCKET***]@[***DUMMY REGION***]/ URL is a virtual S3 URL that you derive from the original Ozone S3 URL, where:

      • [***BUCKET***] is the bucket name, which forms the first part of the virtual-hosted hostname. Longhorn prefixes the bucket name to the AWS_ENDPOINTS hostname. For example, the sample snippet in Step 3 shows the hostname drs1-1.drs1.root.hwx.site. In this instance, drs1-1 is the bucket name and the rest of the hostname, drs1.root.hwx.site, is the AWS_ENDPOINTS hostname.
      • [***DUMMY REGION***] can be any value; it is not used.
    Troubleshooting: To verify that Longhorn successfully registered the Ozone S3 credential secret, open the Backup page in the Longhorn UI. No errors should appear on the page.

    If an error or message states that the secret or the certificate contains newlines or spaces, run the kubectl edit lhs backup-target-credential-secret -n longhorn-system command and set its value to the name of the secret you created in Step 3b.
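For repeatable setups, steps 2 and 4 and the Backup Target value in step 5 can be scripted. The following is a minimal sketch, not a Cloudera-provided tool: it assumes that kubectl patch and kubectl set env have the same effect as the interactive kubectl edit commands in steps 2 and 4, and that the virtual-hosted hostname is the bucket name followed by the AWS_ENDPOINTS hostname, as step 5 describes. The function names, the default dummy region cdp, and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# Sketch: non-interactive equivalents of steps 2, 4, and part of 5.
# Assumption: kubectl patch / kubectl set env produce the same result as the
# kubectl edit commands in the procedure. Names mirror the examples in this
# topic; replace them with your own values.
# Hypothetical convenience: set KUBECTL=echo to print commands instead of running them.

# Step 2: switch the longhorn VolumeSnapshotClass type from snap to bak.
use_external_snapshots() {
  "${KUBECTL:-kubectl}" patch vsclass longhorn --type merge \
    -p '{"parameters":{"type":"bak"}}'
}

# Step 4: stop the DRS provider from creating PVC clones.
disable_pvc_clone() {
  "${KUBECTL:-kubectl}" set env deployment/cdp-release-thunderhead-drsprovider \
    TAKE_PVC_CLONE=false -n cdp-drs
}

# Step 5 helper: derive the Backup Target URL by stripping the AWS_ENDPOINTS
# hostname from the virtual-hosted hostname to obtain the bucket name.
# Usage: backup_target_url VIRTUAL_HOST ENDPOINT_HOST [DUMMY_REGION]
backup_target_url() {
  virtual_host=$1
  endpoint_host=$2
  region=${3:-cdp}   # any value works; Longhorn does not use the region
  bucket=${virtual_host%".${endpoint_host}"}
  if [ "$bucket" = "$virtual_host" ]; then
    echo "error: ${virtual_host} does not end with .${endpoint_host}" >&2
    return 1
  fi
  printf 's3://%s@%s/\n' "$bucket" "$region"
}
```

For example, backup_target_url drs1-1.drs1.root.hwx.site drs1.root.hwx.site prints s3://drs1-1@cdp/, the Backup Target value used in step 5.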

Next, initiate the DRS automatic backups using the updateAutoBackupPolicy CDP CLI command. Alternatively, you can edit the automatic-backup Kubernetes cron job to initiate the DRS automatic backups.

Initiating DRS automatic backups

After you configure the external storage in ECS, you can initiate Data Recovery Service (DRS) automatic backups using the updateAutoBackupPolicy CDP CLI command. Alternatively, you can edit the automatic-backup Kubernetes cron job to initiate them.

The preferred method to initiate the DRS automatic backups is to use the updateAutoBackupPolicy CDP CLI command in the CDP client. For more information about DRS CDP CLI commands, see CLI reference for using DRS on Control Plane.
The following steps show an alternate method to initiate DRS automatic backups using kubectl commands.
  1. Run the kubectl edit cj automatic-backup -n cdp-drs command.
  2. Set the ENABLED environment variable to true to enable automatic backups, configure the namespaces if they are not already configured, and then set the backup retain counts for hourly, daily, or weekly backups. You can combine two or more periods. Save the cron job.
    The backup retain count determines the number of backup instances to generate.

    By default, DRS generates n+1 backups, where n is the backup retain count. Therefore, by default, at least two backups exist at any point in time. For example, if you set the HOURLY_COUNT parameter to 2, three instances are generated; therefore, two backups are taken every hour. If you set the WEEKLY_COUNT parameter to 0, no weekly instances are created and no weekly backups are generated.

    The following sample snippet shows the environment variables required for DRS automatic backups:

    
    env:
      - name: ENABLED
        value: "true"
      - name: HOURLY_COUNT
        value: "1"
      - name: DAILY_COUNT
        value: "1"
      - name: WEEKLY_COUNT
        value: "1"
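The following minimal sketch sets the schedule counts without opening an editor and applies the n+1 rule described in step 2 to estimate how many backup instances each schedule produces. It assumes that the variables are set on the automatic-backup cron job container, as in the snippet above, and it does not configure namespaces. The function names and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# Sketch: set DRS automatic-backup counts and estimate instance counts.
# Assumptions: ENABLED/HOURLY_COUNT/DAILY_COUNT/WEEKLY_COUNT are env vars on
# the automatic-backup cron job container, and the n+1 rule applies per
# schedule. Namespaces are NOT configured here.
# Hypothetical convenience: set KUBECTL=echo for a dry run.

# Usage: set_backup_counts HOURLY DAILY WEEKLY
set_backup_counts() {
  for count in "$1" "$2" "$3"; do
    case "$count" in
      '' | *[!0-9]*)
        echo "error: counts must be non-negative integers" >&2
        return 2
        ;;
    esac
  done
  "${KUBECTL:-kubectl}" set env cronjob/automatic-backup -n cdp-drs \
    ENABLED=true HOURLY_COUNT="$1" DAILY_COUNT="$2" WEEKLY_COUNT="$3"
}

# Backup instances for a retain count n: n+1, or 0 when the schedule is off.
instances_for_count() {
  if [ "$1" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 + 1 ))
  fi
}
```

For example, set_backup_counts 2 1 0 keeps hourly and daily backups and turns off weekly backups, and instances_for_count 2 prints 3.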
By default, Kubernetes initiates the first automatic backup within 30 minutes after the backup policy creation is complete.
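To check the cron job schedule and when Kubernetes last started a backup job, you can inspect the automatic-backup cron job. This minimal sketch reads the standard CronJob fields .spec.schedule and .status.lastScheduleTime; it reports only Kubernetes scheduling, not DRS backup results, which the Management Console shows. The function name and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# Sketch: show the schedule and last schedule time of the automatic-backup
# cron job. LAST_SCHEDULE is empty (<none>) until the first job has started.
# Hypothetical convenience: set KUBECTL=echo for a dry run.

show_auto_backup_status() {
  "${KUBECTL:-kubectl}" get cronjob automatic-backup -n cdp-drs \
    -o custom-columns=SCHEDULE:.spec.schedule,LAST_SCHEDULE:.status.lastScheduleTime
}
```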
Depending on the schedules you choose, DRS generates backup instances, which appear on the CDP Private Cloud Data Services Management Console > Dashboard > Backup Overview > View Details > Backup and Restore Manager > Backups tab. Instance names are auto-generated. Click a backup instance to view more details.