Restoring Cloudera Data Engineering jobs from backup
You can restore Cloudera Data Engineering (CDE) jobs and associated resources from a backup ZIP file.
Using the CDE CLI
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
- Download and configure the CDE CLI.
Steps
- Run the cde backup restore command to restore a backup file to the virtual cluster your CDE CLI is configured to interact with. Use the --duplicate-handling flag to select the policy for handling duplicate jobs. The possible values are as follows:
error
- Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
rename
- If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
keep-original
- If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
For example:
cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename
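The restore command above can be wrapped in a small script that checks its inputs before contacting the cluster. The following is a minimal sketch, not part of the CDE CLI: the helper names is_valid_policy and restore_backup and the CDE_CLI override variable are hypothetical. Only the --local-path and --duplicate-handling flags described above are used.

```shell
#!/bin/sh
# Sketch: guard a CLI restore with basic pre-checks.
# is_valid_policy and restore_backup are hypothetical helper names.

# Succeed only for the three duplicate-handling policies listed above.
is_valid_policy() {
  case "$1" in
    error|rename|keep-original) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: restore_backup <backup-zip> [policy]
restore_backup() {
  backup_file="$1"
  policy="${2:-error}"   # "error" mirrors the documented default

  if [ ! -f "$backup_file" ]; then
    echo "Backup file not found: $backup_file" >&2
    return 1
  fi
  if ! is_valid_policy "$policy"; then
    echo "Unknown duplicate-handling policy: $policy" >&2
    return 1
  fi

  # CDE_CLI is a hypothetical override (for example, for dry runs);
  # it defaults to the real cde binary.
  "${CDE_CLI:-cde}" backup restore --local-path "$backup_file" --duplicate-handling "$policy"
}

# Example call (not executed here):
#   restore_backup archive-2021-11-10T01:24:06.zip rename
```

Setting CDE_CLI=echo prints the command that would run instead of executing it, which can help when reviewing a script before a real restore.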
Result
Validate that the jobs and resources were restored by running cde job list and cde resource list.
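To check for one specific job after a restore, the listing can be searched for its name. This sketch assumes that cde job list prints job names verbatim somewhere in its output; the helper name job_is_listed and the CDE_CLI override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a given name appears in the job listing.
# job_is_listed is a hypothetical helper name.

# Usage: job_is_listed <job-name>
job_is_listed() {
  # CDE_CLI is a hypothetical override; it defaults to the real cde binary.
  # grep -F treats the name as a fixed string, and -q suppresses output.
  "${CDE_CLI:-cde}" job list | grep -F -q -- "$1"
}

# Example call (not executed here):
#   job_is_listed "my-job" && echo "my-job is present"
```

The match is a substring match, so a short name can also match longer job names that contain it, including a job renamed by the rename policy.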
Using the API
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
- Request an access token and save it as an environment variable to use in API calls. For instructions, see Getting a Cloudera Data Engineering API access token.
Steps
- Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
- Navigate to the Cloudera Data Engineering Overview page.
- In the CDE Services column, select the service containing the virtual cluster where you want to restore the jobs.
- In the Virtual Clusters column on the right, click the Cluster Details icon for the virtual cluster you want to restore to.
- Copy the URL under JOBS API URL, and set it as an environment variable. For example:
export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
- Restore jobs from the backup file by uploading the backup file to the /admin/import endpoint. You can choose how to handle duplicate job names using the duplicatehandling=<policy> parameter. Options are:
error
- Return an error if there are duplicate job names, and abort the restore operation.
rename
- If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
keep-original
- If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy:
curl -k \
  -H "Authorization: Bearer ${CDE_TOKEN}" \
  -X POST "${CDE_JOBS_API}/admin/import" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
  -F duplicatehandling=rename \
  | jq
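The API call above can be wrapped in a script that checks the environment variables and the policy value first, and then inspects the HTTP status code. The following is a minimal sketch: the helper name import_backup is hypothetical, and treating any non-2xx status as a failed restore is an assumption about how the endpoint reports errors.

```shell
#!/bin/sh
# Sketch: upload a backup to /admin/import with basic pre-checks.
# import_backup is a hypothetical helper name.

# Usage: import_backup <backup-zip> [policy]
import_backup() {
  backup_file="$1"
  policy="${2:-error}"

  if [ -z "${CDE_JOBS_API:-}" ] || [ -z "${CDE_TOKEN:-}" ]; then
    echo "Set CDE_JOBS_API and CDE_TOKEN before calling the API" >&2
    return 1
  fi
  case "$policy" in
    error|rename|keep-original) ;;
    *) echo "Unknown duplicate handling policy: $policy" >&2; return 1 ;;
  esac
  if [ ! -f "$backup_file" ]; then
    echo "Backup file not found: $backup_file" >&2
    return 1
  fi

  # Save the response body to a temporary file and capture the status code.
  # curl sets the multipart Content-Type header itself when -F is used.
  response_file=$(mktemp)
  status=$(curl -k -sS -o "$response_file" -w '%{http_code}' \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -F "file=@${backup_file};type=application/zip" \
    -F "duplicatehandling=${policy}")
  cat "$response_file"
  rm -f "$response_file"

  # Assumption: a 2xx status means the import was accepted.
  case "$status" in
    2??) return 0 ;;
    *) echo "Import failed with HTTP status $status" >&2; return 1 ;;
  esac
}

# Example call (not executed here):
#   import_backup /path/to/cde-etl-jobs-backup.zip rename | jq
```

Failing early on a missing variable or an unrecognized policy avoids sending a request that the endpoint would reject anyway.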
Using the web UI
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
Steps
- Go to the Cloudera Data Engineering Overview page by clicking the Data Engineering tile in the Cloudera Data Platform (CDP) management console.
- In the CDE Services column, select the service containing the virtual cluster where you want to restore the jobs.
- In the Virtual Clusters column on the right, click the View Jobs icon for the virtual cluster you want to restore to.
- Click Jobs in the left menu.
- Click the vertical ellipses menu at the top right, and then click Restore Jobs.
- Click Choose a zip file.
- Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
- Click Select to restore the backup.
Result
The jobs and resources from the backup file are restored using the rename duplicate handling policy. If a job name in the backup conflicts with an existing job, the restore operation keeps the existing job and renames the restored job by appending a numerical identifier to the job name.