DRS automatic backups (technical preview)
The DRS service can take automatic backups of the Control Plane and Cloudera Data Engineering (CDE) namespaces in the compute cluster of ECS. You can configure the schedule for periodic backup.
You can choose one of the following storage options for DRS automatic backups in ECS:
- External storage
  ECS uses Longhorn as the underlying storage provisioner. In Longhorn, you can store snapshots externally using NFS v4 or an S3-compatible storage such as Ozone. Cloudera recommends that you use external storage for automatic backups in ECS.
- In-cluster storage
  You can use Longhorn in-cluster storage. Use this option only if the external storage option is not available.
You can initiate the DRS automatic backups using the updateAutoBackupPolicy CDP CLI command. Alternatively, you can edit the automatic-backup (a Kubernetes cron job) to initiate the DRS automatic backups.
Configuring external storage in ECS for DRS automatic backups
Before you initiate DRS automatic backups, you must ensure that the prerequisite activities are completed along with the required additional configuration for external storage in Longhorn.
Complete the following prerequisites:
- Ensure that the following requirements are met depending on the storage you choose for DRS automatic backups:
  - An S3-compatible storage, such as Ozone, must be available in the base cluster. You must have the required access key and secret for the storage, and the provisioned bucket must have a minimum of 5 TB storage space.
  - An NFS v4 storage must have a minimum of 5 TB of free space.
- You must have SSH access to the base cluster node.
- You must have SSH access to the ECS master node.
Perform the following steps to change the default volume snapshot class value from snap (this value saves snapshots in the in-cluster storage in Longhorn) to bak (this value saves snapshots in the external storage in Longhorn):
- Run the kubectl edit vsclass longhorn command.
- Change the type parameter to bak as shown in the following sample:
    apiVersion: snapshot.storage.k8s.io/v1
    deletionPolicy: Delete
    driver: driver.longhorn.io
    kind: VolumeSnapshotClass
    metadata:
      name: longhorn
    parameters:
      type: bak
Complete the following steps if you are using Ozone S3 storage:
- Run the scp root@[***base_cluster_host***]:/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem . command to copy the TLS certificate for Ozone to the current directory. DRS uses this certificate to communicate with the S3 gateway service using HTTPS.
- Create a secret that Longhorn can use for S3 access. To accomplish this task, you must have the S3 access key, S3 secret, S3 endpoint, and S3 certificate for Ozone storage. You must also enable a virtual host to use the S3-compatible endpoint (Ozone).
  The following sample snippet shows the kubectl command to create a secret:
    kubectl create secret generic ozone-secret \
      --from-literal=AWS_ACCESS_KEY_ID=s3g/drs1-1.drs1.root.hwx.site@ROOT.HWX.SITE \
      --from-literal=AWS_SECRET_ACCESS_KEY=9d9e46cc77bb510821f0dbc42c584a8b7482b51dec9d3eb63c \
      --from-literal=AWS_ENDPOINTS=https://drs1.root.hwx.site:9879/longhorn \
      --from-literal=VIRTUAL_HOSTED_STYLE=true \
      --from-file=AWS_CERT=cm-auto-global_cacerts.pem \
      -n longhorn-system
  For more information, see Longhorn documentation.
Run the kubectl edit deploy cdp-release-thunderhead-drsprovider -n cdp-drs command, and set the TAKE_PVC_CLONE environment variable to false.
This step ensures that the backups do not create a PVC clone for external snapshots.
By default, Longhorn is configured for in-cluster storage, and this storage requires a PVC copy to perform the DRS restore operation (DRS uses CSI snapshot technology). Therefore, to use the external storage, you must set the volume snapshot class type to bak and then set the TAKE_PVC_CLONE environment variable to false.
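The same change can also be made non-interactively. The following is a minimal sketch that assumes kubectl set env is acceptable in place of kubectl edit; the function name disable_pvc_clone and the KUBECTL override are hypothetical and are not part of DRS. Note that kubectl set env applies the variable to every container in the deployment unless you restrict it with -c.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRS): sets TAKE_PVC_CLONE=false on the
# DRS provider deployment named in the step above.
# Set KUBECTL=echo to print the command instead of running it.
disable_pvc_clone() {
  "${KUBECTL:-kubectl}" set env deployment/cdp-release-thunderhead-drsprovider \
    TAKE_PVC_CLONE=false -n cdp-drs
}
```

To preview the command without touching the cluster, run KUBECTL=echo disable_pvc_clone; to apply it, run disable_pvc_clone on a host that has kubectl access to the ECS cluster.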
Configure the volume for NFS storage or the bucket for Ozone S3 in the Longhorn UI to save the backups.
- If you are using NFS storage, enter the nfs://… URL in the Backup Target field.
- If you are using Ozone S3 storage, enter the required values in the following fields:
  - s3://[***bucket***]@[***dummyregion***]/ URL in the Backup Target field. For example, s3://drs1-1@cdp/.
  - [***secret that you created for S3 access***] in the Backup Target Credential Secret field. For example, ozone-secret.
The s3://[***bucket***]@[***dummyregion***]/ URL is a virtual S3 URL that you can create using the original Ozone S3 URL, where:
- bucket is taken from the hostname. Longhorn prefixes the bucket value to the AWS_ENDPOINTS hostname. For example, the sample kubectl create secret command shows the hostname value drs1-1.drs1.root.hwx.site. In this instance, drs1-1 is the bucket name and the rest of the hostname, drs1.root.hwx.site, is the AWS_ENDPOINTS hostname.
- dummyregion can be any value and is not used.
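The split between the bucket name and the endpoint hostname can be sketched with shell parameter expansion. This is an illustration only: the function name backup_target_url and the default region value cdp are assumptions, and the sketch treats everything before the first dot of the hostname as the bucket.

```shell
#!/bin/sh
# Illustrative helper: splits a virtual-hosted-style hostname into the bucket
# and the endpoint hostname, then prints the Backup Target URL for Longhorn.
backup_target_url() {
  host=$1
  region=${2:-cdp}          # placeholder region; the value is not used
  bucket=${host%%.*}        # text before the first dot
  endpoint_host=${host#*.}  # text after the first dot
  printf 'bucket=%s endpoint_host=%s url=s3://%s@%s/\n' \
    "$bucket" "$endpoint_host" "$bucket" "$region"
}

backup_target_url drs1-1.drs1.root.hwx.site
# prints: bucket=drs1-1 endpoint_host=drs1.root.hwx.site url=s3://drs1-1@cdp/
```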
If an error or message appears stating that the secret or the certificate contains newlines or spaces, run the kubectl edit lhs backup-target-credential-secret -n longhorn-system command and set the value to the secret that you created for S3 access (for example, ozone-secret).
Initiating DRS automatic backups
After you configure the external storage in ECS, you can initiate the DRS automatic backups using the “updateAutoBackupPolicy” CDP CLI command. Alternatively, you can edit the “automatic-backup” (a Kubernetes cron job) to initiate the DRS automatic backups.
- Run the kubectl edit cj automatic-backup -n cdp-drs command.
- Set the ENABLED environment variable to true to enable automatic backups, configure the namespaces (if they are not configured), and then configure the backup retain count to take backups on an hourly, daily, or weekly basis. You can also choose a combination of two or more periods to take backups. Save the cron job.
The backup retain count determines the number of backup instances to generate.
DRS generates n+1 backups by default where n is the backup retain count. Therefore, the minimum number of backups at any point in time is 2 by default. For example, if you set the HOURLY_COUNT parameter to 2, three instances are generated; therefore, two backups are taken every hour. If you set the WEEKLY_COUNT parameter to 0, no instances are created and no backups are generated.
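The retain-count rule can be expressed as a small calculation. The following sketch only restates the n+1 behavior described above (with a count of 0 producing no instances); the function name backup_instances is hypothetical and is not a DRS command.

```shell
#!/bin/sh
# Hypothetical helper: number of backup instances for a given retain count,
# following the n+1 rule described above; a count of 0 yields no instances.
backup_instances() {
  count=$1
  if [ "$count" -le 0 ]; then
    echo 0
  else
    echo $((count + 1))
  fi
}

for n in 0 1 2; do
  echo "retain count $n -> $(backup_instances "$n") instances"
done
```

With the retain counts 0, 1, and 2, the loop reports 0, 2, and 3 instances respectively.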
The following sample snippet shows the environment variables required for DRS automatic backups:
    env:
    - name: ENABLED
      value: "true"
    - name: HOURLY_COUNT
      value: "1"
    - name: DAILY_COUNT
      value: "1"
    - name: WEEKLY_COUNT
      value: "1"
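If you prefer not to edit the cron job interactively, the same variables could be set with kubectl set env. This is a sketch under that assumption, not a documented DRS procedure: the function name enable_auto_backup and the KUBECTL override are hypothetical, and the sketch does not configure the namespaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRS): enables automatic backups and sets
# the hourly, daily, and weekly retain counts on the automatic-backup cron job.
# Arguments default to 1 each. Set KUBECTL=echo to print the command only.
enable_auto_backup() {
  hourly=${1:-1}
  daily=${2:-1}
  weekly=${3:-1}
  "${KUBECTL:-kubectl}" set env cronjob/automatic-backup -n cdp-drs \
    ENABLED=true HOURLY_COUNT="$hourly" DAILY_COUNT="$daily" WEEKLY_COUNT="$weekly"
}
```

For example, KUBECTL=echo enable_auto_backup 2 1 0 prints the command that would set an hourly count of 2, a daily count of 1, and disable weekly backups.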