Backing up Cloudera Data Engineering jobs on remote storage
You can back up Cloudera Data Engineering (CDE) jobs and their associated resources. Backups are saved as ZIP files that you can use to restore the jobs and resources. Backup and restore are also supported on remote storage (also known as object store).
Before you begin
- Download and configure the CDE CLI.
Steps for backing up on remote storage
- Run the cde backup create command to create a backup of the jobs in the virtual
cluster your CDE CLI is configured to interact with. By default, all job configurations
in the cluster are backed up, but the resources are not. You can use command flags to
change this behavior as follows:
--include-job-resources
- Backs up all resources associated with the selected jobs. These resources cannot
be filtered out by the --resource-filter parameter.
--include-jobs
- Backs up all jobs in the virtual cluster. This is the default behavior.
--include-resources
- Backs up all resources in the virtual cluster, including those not associated with jobs.
--job-filter <filter_string>
- Selects jobs to back up using the fieldname[operator]argument syntax. This command flag can be repeated. 'fieldname' is the name of a field in the job or resource API, and 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Multiple filters are ANDed.
--resource-filter <filter_string>
- Selects resources to back up using the fieldname[operator]argument syntax. This command flag can be repeated. 'fieldname' is the name of a field in the job or resource API, and 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. Multiple filters are ANDed. For example: 'name[noteq]my-resource', 'created[gte]2020-01-01'
--remote-storage
- Backs up to remote storage. The default value is false.
--remote-path
- The path of the backup file on remote storage, relative to the /dex/backup/ directory. Do not include a file name in the path. Must be used together with --remote-storage.
--remote-name
- The name of the backup file on remote storage. Must be used together with --remote-storage. If you do not specify a file name, a default generated name is used.
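As a rough illustration of how --remote-path and --remote-name combine, the following sketch assembles the location of the backup file under /dex/backup/. The values test and archive.zip are sample values and the variable names are illustrative only. The cde command itself is left as a comment because it needs a configured CDE CLI.

```shell
# Sample values for illustration only
remote_path="test"            # relative to /dex/backup/, no file name
remote_name="archive.zip"     # file name of the backup archive

# Relative location of the archive, as reported in archiveRelativePath
archive_relative_path="${remote_path}/${remote_name}"

# Location of the archive under the backup directory on remote storage
echo "/dex/backup/${archive_relative_path}"

# Corresponding backup command (not run here):
# cde backup create --remote-storage --remote-path "$remote_path" --remote-name "$remote_name"
```

The storage prefix in front of /dex/backup/ depends on the environment, so it is not derived here.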
Example for backing up all jobs and their associated resources to remote storage, plus all resources whose name contains "data":
cde backup create --remote-storage --include-job-resources --include-resources --resource-filter "name[like]%data%"
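The fieldname[operator]argument syntax can also be assembled in a script. The following is a minimal sketch, not part of the CDE CLI: make_filter is a hypothetical helper, and it assumes that jobs expose the name and created fields and that %etl% is a pattern that matches your job names. The cde command is left as a comment because it needs a configured CDE CLI.

```shell
# Hypothetical helper: builds a filter string in the
# fieldname[operator]argument form.
make_filter() {
  printf '%s[%s]%s' "$1" "$2" "$3"
}

# Assumed field names and sample arguments
name_filter=$(make_filter name like '%etl%')
date_filter=$(make_filter created gte 2020-01-01)

echo "$name_filter"   # name[like]%etl%
echo "$date_filter"   # created[gte]2020-01-01

# Repeating the flag ANDs the filters (not run here):
# cde backup create --remote-storage --job-filter "$name_filter" --job-filter "$date_filter"
```

Quoting each filter string keeps the shell from interpreting the square brackets and percent signs.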
Example for creating a backup on remote storage:
./cde backup create --remote-storage --remote-path test --remote-name archive.zip
# output
{"archiveRelativePath":"test/archive.zip","archiveURL":"s3a://dex-dev-default-aws-storage/datalake/logs/dex/backup/test/archive.zip","code":201}
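If a script needs the archive location, it can be read from the JSON output. The following sketch uses sed on the sample output shown above. It assumes the output is a single line of JSON without escaped quotes; a JSON-aware tool such as jq is more robust where it is available.

```shell
# Sample output copied from the example above
output='{"archiveRelativePath":"test/archive.zip","archiveURL":"s3a://dex-dev-default-aws-storage/datalake/logs/dex/backup/test/archive.zip","code":201}'

# Extract the archiveURL and code fields from the single-line JSON
archive_url=$(printf '%s' "$output" | sed -n 's/.*"archiveURL":"\([^"]*\)".*/\1/p')
status_code=$(printf '%s' "$output" | sed -n 's/.*"code":\([0-9][0-9]*\).*/\1/p')

echo "$archive_url"
echo "$status_code"
```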
Result
The backup is saved as a ZIP file on remote storage, under /dex/backup/ at the relative
path set by --remote-path. The command output reports its location in the
archiveRelativePath and archiveURL fields. If you did not set --remote-name, the file
gets a default generated name.
To restore a backup file, see Restoring Cloudera Data Engineering jobs from backup.