DRS automatic backups (technical preview)

The DRS service can take automatic backups of the Control Plane and Cloudera Data Engineering (CDE) namespaces in the compute cluster of ECS. You can configure the schedule for periodic backups.

You can choose one of the following storage options for DRS automatic backups in ECS:

External storage
ECS uses Longhorn as the underlying storage provisioner. In Longhorn, you can store snapshots externally using either S3-compatible storage, such as Ozone, or NFS v4. Cloudera recommends that you use external storage for automatic backups in ECS.
In-cluster storage
You can use Longhorn in-cluster storage. Use this option only if external storage is not available.

You can initiate the DRS automatic backups using the updateAutoBackupPolicy CDP CLI command. Alternatively, you can edit the automatic-backup (a Kubernetes cron job) to initiate the DRS automatic backups.

Configuring external storage in ECS for DRS automatic backups

Before you initiate DRS automatic backups, complete the prerequisites and the additional configuration that Longhorn requires for external storage.

  1. Complete the following prerequisites:
    1. Ensure that the following requirements are met depending on the storage you choose for DRS automatic backups:
      • An S3 compatible storage, such as Ozone, must be available in the base cluster. You must have the required access key and secret to the storage, and the provisioned bucket must have a minimum of 5 TB storage space.
      • An NFS v4 storage must have a minimum of 5 TB of free space.
    2. You must have SSH access to the base cluster node.
    3. You must have SSH access to the ECS master node.
  2. Perform the following steps to change the default volume snapshot class value from snap (this value saves snapshots in the in-cluster storage in Longhorn) to bak (this value saves snapshots in the external storage in Longhorn):
    1. Run the kubectl edit vsclass longhorn command.
    2. Change the type parameter to bak as shown in the following sample snippet:
      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshotClass
      metadata:
        name: longhorn
      driver: driver.longhorn.io
      deletionPolicy: Delete
      parameters:
        type: bak
  3. Complete the following steps if you are using Ozone S3 storage in Longhorn:
    1. Run the scp root@[***base_cluster_host***]:/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem . command to copy the TLS certificate for Ozone to the current directory.
      DRS uses this certificate to communicate with the S3 gateway service using HTTPS.
    2. Create a secret that Longhorn can use for S3 access. To accomplish this task, you must have the S3 access key, S3 secret, S3 endpoint, and S3 certificate for Ozone storage. You must also enable a virtual host to use the S3 compatible endpoint (Ozone).
      The following sample snippet shows the kubectl command to create a secret:
      kubectl create secret generic ozone-secret \
        --from-literal=VIRTUAL_HOSTED_STYLE=true \
        --from-file=AWS_CERT=cm-auto-global_cacerts.pem \
        -n longhorn-system

      For more information, see Longhorn documentation.

  4. Run the kubectl edit deploy cdp-release-thunderhead-drsprovider -n cdp-drs command, and set the TAKE_PVC_CLONE environment variable to false.
    This step ensures that the backups do not create a PVC clone for external snapshots.

    By default, Longhorn is configured for in-cluster storage, which requires a PVC copy to perform the DRS restore operation (DRS uses CSI snapshot technology). Therefore, to use external storage, you must set the volume snapshot class type to bak and the TAKE_PVC_CLONE environment variable to false.

  5. In the Longhorn UI, on the Setting > General page, configure the NFS volume or the Ozone S3 bucket in which to save the backups.
    1. Enter the nfs://… URL in the Backup Target field if you are using NFS storage.
    2. Enter the required values in the following fields if you are using Ozone S3 storage:
      • s3://[***bucket***]@[***dummyregion***]/ URL in the Backup Target field. For example, s3://drs1-1@cdp/.
      • [***secret that you generated in Step 3b***] in the Backup Target Credential Secret field. For example, ozone-secret.

      The s3://[***bucket***]@[***dummyregion***]/ URL is a virtual S3 URL that you can create using the original Ozone S3 URL, where:

      • bucket is the first label of the Ozone S3 hostname. Longhorn prefixes the bucket value to the AWS_ENDPOINTS hostname. For example, if the hostname is drs1-1.drs1.root.hwx.site, then drs1-1 is the bucket name and the rest of the hostname, drs1.root.hwx.site, is the AWS_ENDPOINTS hostname.
      • dummyregion can be any value and is not used.
    Troubleshooting: To verify whether Longhorn successfully registered the Ozone S3 credential secret, open the Backup page. The page must not show any errors.

    If an error or message reports that the secret or the certificate contains newlines or spaces, run the kubectl edit lhs backup-target-credential-secret -n longhorn-system command and set the value to the secret you created in Step 3b.
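
The configuration steps above can also be scripted. The following shell sketch is illustrative only and rests on assumptions that do not come from this procedure: that the snapshot class type sits under parameters in the VolumeSnapshotClass object, that kubectl patch and kubectl set env are acceptable substitutes for the interactive kubectl edit commands in Steps 2 and 4, and that the helper names run, backup_target, and endpoint_host (all hypothetical) suit your environment. By default the script only prints the kubectl commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: prints the kubectl commands by default (DRY_RUN=1) instead of running them.

# Print the command in dry-run mode; execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build the virtual backup target URL from an Ozone S3 hostname.
# $1 = hostname, $2 = dummy region (any value; it is not used).
backup_target() {
  printf 's3://%s@%s/\n' "${1%%.*}" "${2:-cdp}"
}

# Hypothetical helper: the part of the hostname that follows the bucket label.
endpoint_host() {
  printf '%s\n' "${1#*.}"
}

# Step 2: switch the volume snapshot class from snap to bak
# (assumes the type key lives under parameters).
run kubectl patch volumesnapshotclass longhorn --type merge -p '{"parameters":{"type":"bak"}}'

# Step 4: stop DRS from creating a PVC clone for external snapshots.
run kubectl set env deployment/cdp-release-thunderhead-drsprovider -n cdp-drs TAKE_PVC_CLONE=false

# Step 5: value to enter in the Backup Target field for the example hostname.
backup_target drs1-1.drs1.root.hwx.site cdp
```

For the example hostname drs1-1.drs1.root.hwx.site, backup_target yields the s3://drs1-1@cdp/ value shown in Step 5, and endpoint_host yields drs1.root.hwx.site.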

Next, initiate the DRS automatic backups using the updateAutoBackupPolicy CDP CLI command or by editing the automatic-backup Kubernetes cron job.

Initiating DRS automatic backups

After you configure the external storage in ECS, you can initiate the DRS automatic backups. The preferred method is to use the updateAutoBackupPolicy CDP CLI command in the CDP client. For more information about DRS CDP CLI commands, see CLI reference for using DRS on Control Plane.

Alternatively, you can edit the automatic-backup Kubernetes cron job using kubectl commands, as the following steps show:
  1. Run the kubectl edit cj automatic-backup -n cdp-drs command.
  2. Set the ENABLED environment variable to true to enable automatic backups, configure the namespaces (if they are not already configured), and then set the backup retain counts to take backups on an hourly, daily, or weekly basis. You can also combine two or more periods. Save the cron job.
    The backup retain count determines the number of backup instances to generate.

    DRS generates n+1 backups by default where n is the backup retain count. Therefore, the minimum number of backups at any point in time is 2 by default. For example, if you set the HOURLY_COUNT parameter to 2, three instances are generated; therefore, two backups are taken every hour. If you set the WEEKLY_COUNT parameter to 0, no instances are created and no backups are generated.

    The following sample snippet shows the environment variables required for DRS automatic backups:

     - name: ENABLED
       value: "true"
     - name: HOURLY_COUNT
       value: "1"
     - name: DAILY_COUNT
       value: "1"
     - name: WEEKLY_COUNT
       value: "1"
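
The cron job settings above can be sketched as a short shell script. This is a sketch under stated assumptions, not the documented procedure: it uses kubectl set env in place of the interactive kubectl edit command, it assumes that the namespaces are already configured in the cron job, and run and retained_instances are hypothetical helper names. The retained_instances helper encodes the n+1 rule described above, with a count of 0 producing no instances. The script prints the command by default; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Sketch only: prints the kubectl command by default (DRY_RUN=1) instead of running it.

# Print the command in dry-run mode; execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: number of backup instances for a retain count n.
# Follows the rule in the text: n+1 instances, or none when n is 0.
retained_instances() {
  if [ "$1" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 + 1 ))
  fi
}

HOURLY_COUNT=1
DAILY_COUNT=1
WEEKLY_COUNT=1

echo "hourly instances: $(retained_instances "$HOURLY_COUNT")"
echo "daily instances:  $(retained_instances "$DAILY_COUNT")"
echo "weekly instances: $(retained_instances "$WEEKLY_COUNT")"

# Enable automatic backups and set the retain counts on the cron job.
run kubectl set env cronjob/automatic-backup -n cdp-drs \
  ENABLED=true \
  HOURLY_COUNT="$HOURLY_COUNT" \
  DAILY_COUNT="$DAILY_COUNT" \
  WEEKLY_COUNT="$WEEKLY_COUNT"
```
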
By default, Kubernetes initiates the first automatic backup within 30 minutes after the backup policy creation is complete.
Backup instances, depending on the chosen schedules, are generated and appear on the CDP Private Cloud Data Services Management Console > Dashboard > Backup Overview > View Details > Backup and Restore Manager > Backups tab. The instance name is auto-generated. Click the backup instance to view more details.