Backing up Cloudera Data Engineering jobs on remote storage

You can back up Cloudera Data Engineering (CDE) jobs and associated resources. Backups are saved as zip files that can be used to restore jobs and their associated resources. Backups and restores is also supported for remote storage (also known as object store).

Before you begin

  • Download and configure the CDE CLI.

Steps for backing up on remote storage

  1. Run the cde backup create command to create a backup of the jobs in the virtual cluster your CDE CLI is configured to interact with. By default, all job configurations in the cluster are backed up, but the resources are not. You can use command flags to change this behavior as follows:
    --include-job-resources
    Backs up all resources associated with the selected jobs. These resources cannot be filtered out by the --resource-filter parameter.
    --include-jobs
    Backs up all jobs in the virtual cluster. This is the default behavior.
    --include-resources
    Backs up all resources in the virtual cluster, including those not associated with jobs.
    --job-filter <filter_string>
    Selects jobs to back up using the fieldname[operator]argument syntax. This command flag can be repeated. The name of the job and resource API field is 'fieldname' and 'operator is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Multiple filters are ANDed. For example:
    cde backup create --job-filter "name[noteq]test" --job-filter "created[gte]2020-01-01"
    --resource-filter <filter_string>
    Selects resources to back up using the fieldname[operator]argument syntax. This command flag can be repeated. Filter by adding detail to the filter syntax, for example, filter syntax 'fieldname[operator]argument'. The name of the job and resource API field is 'fieldname' and 'operator is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Multiple filters are ANDed. For example:
    'name[noteq]my-resource','created[gte]2020-01-01'
    --remote-storage
    Backsup to remote storage. The default value is false
    --remote-path
    The remote backup file relative path must be used together with param remote-storage. The path should not include a file name and should be relative to dir /dex/backup/.
    --remote-name
    The remote backup file name must be used together with param remote-storage. If the file name is not specified, then a default generated value will be used.

Example for backing up to remote storage all jobs and its related resources, plus all resources whose name contains "data".

cde backup create --remote-storage --include-resources --resource-filter "name[like]%data%"

Example for creating a backup on remote storage:

./cde backup create --remote-storage --remote-path test --remote-name archive.zip

# output 
{"archiveRelativePath":"test/archive.zip","archiveURL":"s3a://dex-dev-default-aws-storage/datalake/logs/dex/backup/test/archive.zip","code":201} 

Result

Depending on your browser settings, you are either prompted for a location to save the file, or the file is downloaded to your default download location. The file is a ZIP file named archive-<timestamp>.zip.

To restore a backup file, see Restoring Cloudera Data Engineering jobs from backup.