Backing up CDE jobs on remote storage
You can back up Cloudera Data Engineering (CDE) jobs and associated resources. Backups are saved as ZIP files that can be used to restore jobs and their associated resources. Backups and restores are also supported for remote storage (also known as object store).
Before you begin
- Download and configure the CDE CLI.
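For reference, a minimal CLI configuration might look like the following (a sketch that assumes the default ~/.cde/config.yaml configuration file; the user name and endpoint are placeholders for your own values):
user: jdoe
vcluster-endpoint: https://<jobs-api-url>/dex/api/v1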
Steps for backing up on remote storage
- Run the cde backup create command to create a backup of the jobs in the virtual cluster that your CDE CLI is configured to interact with. By default, all job configurations in the cluster are backed up, but the resources are not. You can use the following command flags to change this behavior:
--continue-on-error
- If set to true, archive generation continues even if some items fail to back up. This flag works only with remote storage.
--credential-filter
- Selects credentials to back up using the fieldname[operator]argument syntax, where fieldname is the name of a credential API field and operator is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. This command flag can be repeated; multiple filters are combined with and. For example: cde backup create --credential-filter "name[noteq]test" --credential-filter "created[gte]2020-01-01"
--include-active-airflow-pyenv
- If set to true, backs up the active Airflow libraries and operators, along with the associated credentials and secrets. The default value is false.
--include-credential-secrets
- Backs up credential secrets.
--include-credentials
- Backs up all virtual cluster credentials. By default, the credential secrets are not included in the backup.
--include-job-resources
- Backs up all resources associated with the selected jobs. These resources cannot
be filtered out by the
--resource-filter
parameter. --include-jobs
- Backs up all jobs in the virtual cluster. This is the default behavior.
--include-resource-credentials
- Backs up the credentials associated with the resources. You cannot filter out selected credentials using the --credential-filter parameter. This is the default behavior.
--include-resources
- Backs up all resources in the virtual cluster, including those not associated with jobs.
--job-filter <filter_string>
- Selects jobs to back up using the fieldname[operator]argument syntax, where fieldname is the name of a job API field and operator is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. This command flag can be repeated; multiple filters are combined with and. For example: cde backup create --job-filter "name[noteq]test" --job-filter "created[gte]2020-01-01"
--resource-filter <filter_string>
- Selects resources to back up using the fieldname[operator]argument syntax, where fieldname is the name of a resource API field and operator is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. This command flag can be repeated; multiple filters are combined with and. For example: cde backup create --resource-filter "name[noteq]my-resource" --resource-filter "created[gte]2020-01-01"
--output
- Specifies the output format, either text or JSON. The default is text.
--remote-storage
- Backs up to remote storage. The default value is false.
--remote-path
- Specifies the relative path of the remote backup file; use together with the --remote-storage parameter. Do not include the file name in the path. The path is relative to /dex/backup/.
--remote-name
- Specifies the remote backup file name; use together with the --remote-storage parameter. If the file name is not specified, a default generated value is used.
--validate
- Validates the remote backup archive after it is created.
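As an illustration of combining these flags, the following command (composed from the flags described above; the filter value is a placeholder) backs up to remote storage all credentials whose names contain "prod", including their secrets, and continues past per-item errors:
cde backup create --remote-storage --include-credentials --include-credential-secrets --credential-filter "name[like]%prod%" --continue-on-error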
Example of backing up to remote storage all jobs and their related resources, plus all resources whose names contain "data":
cde backup create --remote-storage --include-resources --resource-filter "name[like]%data%"
Example of creating a backup on remote storage with a specified path and file name:
./cde backup create --remote-storage --remote-path test --remote-name archive.zip
# output
{"archiveRelativePath":"test/archive.zip","archiveURL":"s3a://dex-dev-default-aws-storage/datalake/logs/dex/backup/test/archive.zip","code":201}
Result
If you back up to remote storage, the archive is written to the object store at the path shown in the command output. Otherwise, depending on your browser settings, you are either prompted for a location to save the file, or the file is downloaded to your default download location. The file is a ZIP file named archive-<timestamp>.zip.
To restore a backup file, see Restoring Cloudera Data Engineering jobs from backup.