Backing up CDE jobs on remote storage

You can back up Cloudera Data Engineering (CDE) jobs and associated resources. Backups are saved as ZIP files that can be used to restore jobs and their associated resources. Backups and restores are also supported for remote storage (also known as object store).

Before you begin

  • Download and configure the CDE CLI.

Steps for backing up on remote storage

  1. Run the cde backup create command to create a backup of the jobs in the virtual cluster your CDE CLI is configured to interact with. By default, all job configurations in the cluster are backed up, but the resources are not. You can use command flags to change this behavior as follows:
    --continue-on-error
    If set to true, the archive generation continues even if there are errors for some items. It works only for remote storage.
    --credential-filter
    Selects credentials to back up using the fieldname[operator]argument syntax. This command flag can be repeated. The name of the credential API field is 'fieldname' and the 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Repeating the flag combines the filters with a logical AND. For example:
    cde backup create --credential-filter "name[noteq]test" --credential-filter "created[gte]2020-01-01"
    --include-active-airflow-pyenv
    If set to true, it backs up the active Airflow Libraries and Operators with the associated credentials and secret. The default value is false.
    --include-credential-secrets
    Backs up credential secrets.
    --include-credentials
    Backs up all virtual cluster credentials. By default, the credential secrets are not included in the backup.
    --include-job-resources
    Backs up all resources associated with the selected jobs. These resources cannot be filtered out by the --resource-filter parameter.
    --include-jobs
    Backs up all jobs in the virtual cluster. This is the default behavior.
    --include-resource-credentials
    Backs up credentials for the resources. You cannot filter out selected credentials using the --credential-filter parameter. This is the default behavior.
    --include-resources
    Backs up all resources in the virtual cluster, including those not associated with jobs.
    --job-filter <filter_string>
    Selects jobs to back up using the fieldname[operator]argument syntax. This command flag can be repeated. The name of the job API field is 'fieldname' and the 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Repeating the flag combines the filters with a logical AND. For example:
    cde backup create --job-filter "name[noteq]test" --job-filter "created[gte]2020-01-01"
    --resource-filter <filter_string>
    Selects resources to back up using the fieldname[operator]argument syntax. This command flag can be repeated. The name of the resource API field is 'fieldname' and the 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Repeating the flag combines the filters with a logical AND. For example:
    cde backup create --resource-filter "name[noteq]my-resource" --resource-filter "created[gte]2020-01-01"
    --output
    Specifies the output format, either text or JSON. The default is text.
    --remote-storage
    Backs up to the remote storage. The default value is false.
    --remote-path
    Specifies the relative path of the remote backup file; use together with the --remote-storage parameter. Do not include the file name in the path. The path is relative to /dex/backup/.
    --remote-name
    Specifies the name of the remote backup file; use together with the --remote-storage parameter. If the file name is not specified, a generated default value is used.
    --validate
    Validates the archive after the remote backup archive is created.
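
A sketch combining several of the flags above: a remote backup that keeps only jobs created after a given date, includes all resources, tolerates per-item errors, and validates the resulting archive. The path and file name shown are illustrative only.

```shell
# Illustrative example only: the remote path "nightly" and file name
# "jobs-backup.zip" are placeholders, not defaults.
cde backup create \
  --remote-storage \
  --remote-path nightly \
  --remote-name jobs-backup.zip \
  --include-resources \
  --job-filter "created[gte]2024-01-01" \
  --continue-on-error \
  --validate
```

Because --remote-path is relative to /dex/backup/, this command would write the archive to /dex/backup/nightly/jobs-backup.zip on the object store.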

Example for backing up to remote storage all jobs and their related resources, plus all resources whose names contain "data":

cde backup create --remote-storage --include-resources --resource-filter "name[like]%data%"

Example for creating a backup on remote storage:

./cde backup create --remote-storage --remote-path test --remote-name archive.zip

# output 
{"archiveRelativePath":"test/archive.zip","archiveURL":"s3a://dex-dev-default-aws-storage/datalake/logs/dex/backup/test/archive.zip","code":201} 

Result

If you back up to remote storage, the backup archive is written to the object store under /dex/backup/ at the path shown in the command output. Otherwise, depending on your browser settings, you are either prompted for a location to save the file, or the file is downloaded to your default download location. The file is a ZIP file named archive-<timestamp>.zip.

To restore a backup file, see Restoring Cloudera Data Engineering jobs from backup.