Backing up Cloudera Data Engineering jobs on local storage
You can back up Cloudera Data Engineering (CDE) jobs and their associated resources and repositories. Backups are saved as ZIP files that can be used to restore the jobs together with those resources and repositories. Backup and restore are also supported for remote storage (also known as object store).
Before you begin
- Download and configure the CDE CLI.
Steps for backing up on local storage
- Run the cde backup create command to create a backup of the jobs in the virtual cluster your CDE CLI is configured to interact with. By default, all job configurations in the cluster are backed up, but the resources are not. You can use command flags to change this behavior as follows:
--credential-filter
- Selects credentials to be backed up using the fieldname[operator]argument syntax. This command flag can be repeated. 'fieldname' is the name of the job and resource API field, and 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. You can add multiple filters using and. For example:
cde backup create --credential-filter "name[noteq]test" --credential-filter "created[gte]2020-01-01"
--include-credential-secrets
- Backs up credential secrets.
--include-credentials
- Backs up all virtual cluster credentials. By default, the credential secrets are not included in the backup.
--include-job-resources
- Backs up all resources associated with the selected jobs. These resources cannot be filtered out by the --resource-filter parameter.
--include-jobs
- Backs up all jobs in the virtual cluster. This is the default behavior.
--include-resource-credentials
- Backs up credentials for the resources. You cannot filter out selected credentials using the --credential-filter parameter. This is the default behavior.
--include-resources
- Backs up all resources and repositories in the virtual cluster, including those not associated with jobs. Internally, repositories are also handled as resources.
--job-filter <filter_string>
- Selects jobs to back up using the fieldname[operator]argument syntax. This command flag can be repeated. 'fieldname' is the name of the job and resource API field, and 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. You can add multiple filters using and. For example, the following command locally backs up all jobs whose name is not test and that were created on or after 2020-01-01, and all their related resources:
cde backup create --job-filter "name[noteq]test" --job-filter "created[gte]2020-01-01"
--local-path <filename>
- Specifies the local file path and name to store the backup. By default, the backup filename is archive-<timestamp>.zip.
--output
- Specifies the output format, either text or JSON. The default is text format.
--resource-filter <filter_string>
- Selects resources to back up using the fieldname[operator]argument syntax. This command flag can be repeated. 'fieldname' is the name of the job and resource API field, and 'operator' is one of the following: 'eq', 'noteq', 'lte', 'lt', 'gte', 'gt', 'in', 'notin', 'like', 'rlike'. You can add multiple filters using and. For example:
cde backup create --resource-filter "name[eq]test" --resource-filter "created[gte]2020-01-01"
For example, to back up all jobs whose name contains the string etl and include all resources associated with those jobs:
cde backup create --job-filter "name[like]%etl%" --include-job-resources
- Validate the backup using the cde backup list-archive command. For example:
cde backup list-archive --local-path archive-2021-11-10T01:24:06.zip
Confirm that all jobs and resources that you expected to be backed up are included.
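When backups are scripted, it can help to assemble the filter strings and the command line in one place. The following Python sketch builds a fieldname[operator]argument filter and an argument list for cde backup create using only flags named in this section. The helper names make_filter and build_backup_command are hypothetical and are not part of the CDE CLI; the sketch only prints the command and does not run it.

```python
import shlex

# Operators listed in this section for the fieldname[operator]argument syntax.
OPERATORS = {"eq", "noteq", "lte", "lt", "gte", "gt", "in", "notin", "like", "rlike"}


def make_filter(fieldname, operator, argument):
    """Format one filter string (hypothetical helper, not a CDE API)."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{fieldname}[{operator}]{argument}"


def build_backup_command(job_filters=(), include_job_resources=False, local_path=None):
    """Assemble the argument list for `cde backup create` (hypothetical helper)."""
    argv = ["cde", "backup", "create"]
    for job_filter in job_filters:
        # The flag can be repeated, once per filter.
        argv += ["--job-filter", job_filter]
    if include_job_resources:
        argv.append("--include-job-resources")
    if local_path:
        argv += ["--local-path", local_path]
    return argv


command = build_backup_command(
    job_filters=[make_filter("name", "like", "%etl%")],
    include_job_resources=True,
)
# Print a shell-quoted form for review; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids shell quoting problems with the square brackets and percent signs in filter values.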
Result
The output of cde backup list-archive is similar to the following:
{
  "backup_set": [
    {
      "id": "v1/backupset/v1/f332bb06-7612-4345-8f3b-da4f27f315b3/",
      "cdeVersion": "1.18.0-b59",
      "clusterID": "cluster-2bqdpfrs",
      "appID": "dex-app-82wlpx6d",
      "app_name": "yjtest",
      "user": "csso_yjlu",
      "backupStarted": "2022-10-07T06:39:46.082837691Z"
    }
  ],
  "entries": [
    {
      "backup_set": "v1/backupset/v1/f332bb06-7612-4345-8f3b-da4f27f315b3/",
      "entityType": "Job",
      "name": "example-a",
      "adjustedName": "example-a",
      "archiveDirectoryPath": "v1/jobs/v1/d7826797-4985-455f-a9c8-2ab1cc624d9b/",
      "user": "csso_yjlu"
    },
    {
      "backup_set": "v1/backupset/v1/f332bb06-7612-4345-8f3b-da4f27f315b3/",
      "entityType": "Resource",
      "name": "example-data",
      "adjustedName": "example-data",
      "archiveDirectoryPath": "v1/resources/v1/41979747-5ad1-40c3-b301-cd57111411f9/",
      "user": "csso_yjlu"
    }
  ]
}
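The confirmation step can also be automated. The following Python sketch assumes that the JSON output of cde backup list-archive has already been captured as a string, and that each item under entries carries the entityType and name fields seen in the sample output. The function missing_entries and the expected list are hypothetical, and the embedded sample is trimmed to the fields the check uses.

```python
import json


def missing_entries(listing, expected):
    """Return the expected (entityType, name) pairs that the listing lacks."""
    found = {(entry["entityType"], entry["name"]) for entry in listing.get("entries", [])}
    return sorted(set(expected) - found)


# Trimmed sample shaped like the list-archive output in this section.
sample = json.loads("""
{
  "entries": [
    {"entityType": "Job", "name": "example-a"},
    {"entityType": "Resource", "name": "example-data"}
  ]
}
""")

# Hypothetical expectation: example-b is deliberately absent from the sample.
expected = [("Job", "example-a"), ("Resource", "example-data"), ("Job", "example-b")]

print(missing_entries(sample, expected))  # prints [('Job', 'example-b')]
```

An empty list means every expected job and resource appears in the archive listing.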
To restore a backup file, see Restoring Cloudera Data Engineering jobs from backup.