Restoring Cloudera Data Engineering jobs from backup
You can restore Cloudera Data Engineering (CDE) jobs and associated resources from a backup ZIP file.
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
- Download and configure the CDE CLI.
Steps
- Run the cde backup restore command to restore a backup file to the virtual cluster that your CDE CLI is configured to interact with. Use the --duplicate-handling flag to select the policy for handling duplicate jobs. The possible values are:
  - error: Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
  - rename: If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
  - keep-original: If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
  For example:
  cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename
- If the backup is in remote storage, use the --remote-path flag to specify the path of the backup to restore. The path is relative to /dex/backup/ and must include a filename. This restores the archive in the remote object store.
  Restore example for remote storage:
  ./cde backup restore --remote-path test/archive.zip # the output is similar to list-archive
Result
Validate that the jobs and resources were restored by running cde job list and cde resource list.
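The CLI steps above can be combined into a small wrapper. The following is a minimal sketch, not part of the CDE CLI: the function names is_valid_policy and restore_backup and the exit codes 2 and 3 are hypothetical, and the sketch assumes that the cde binary is on your PATH and already configured. It uses only the commands and flags described in this topic.

```shell
#!/usr/bin/env bash
# Sketch only: is_valid_policy and restore_backup are hypothetical helper
# names, not CDE CLI commands.

# Succeeds only for the three duplicate handling policies listed above.
is_valid_policy() {
  case "$1" in
    error|rename|keep-original) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: restore_backup <backup.zip> [policy]
restore_backup() {
  local archive="$1"
  local policy="${2:-error}"   # error is the documented default policy

  if ! is_valid_policy "$policy"; then
    echo "unknown duplicate handling policy: $policy" >&2
    return 2
  fi
  if [ ! -f "$archive" ]; then
    echo "backup file not found: $archive" >&2
    return 3
  fi

  # Restore the local archive, stopping here if the restore fails.
  cde backup restore --local-path "$archive" --duplicate-handling "$policy" || return $?

  # Validate the result by listing the restored jobs and resources.
  cde job list
  cde resource list
}
```

After sourcing the file, a call such as restore_backup archive-2021-11-10T01:24:06.zip rename would run the restore and then list the jobs and resources. Checking the policy and the file before calling the CLI keeps a typo from reaching the virtual cluster.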
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
- Request an access token and save it as an environment variable to use in API calls. For instructions, see Getting a Cloudera Data Engineering API access token.
Steps
- Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
  - In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  - Click Administration on the left navigation menu. The Administration page displays.
  - In the Services column, select the service containing the virtual cluster where you want to restore the job. Then, in the Virtual Clusters column, click the Cluster Details icon.
  - Click JOBS API URL. The Jobs API URL is copied to the clipboard.
  - Paste the URL into a text editor to set the Jobs API URL as an environment variable. For example:
    export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
- Restore jobs from the backup file by uploading the backup file to the /admin/import endpoint. You can choose how to handle duplicate job names using the duplicatehandling=<policy> parameter. The options are:
  - error: Return an error if there are duplicate job names, and abort the restore operation.
  - rename: If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
  - keep-original: If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
  For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy:
  curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
    -F duplicatehandling=rename \
    | jq
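The API steps above can be wrapped in a reusable function. The following is a minimal sketch under stated assumptions: the function names import_url and restore_via_api and the exit codes 2 and 3 are hypothetical, and CDE_TOKEN and CDE_JOBS_API are assumed to be exported as described in the earlier steps. The request mirrors the curl example above (without the explicit Content-Type header, because curl sets the multipart header itself when -F is used), with the standard curl options --fail, --silent, and --show-error added so that an HTTP error produces a non-zero exit status.

```shell
#!/usr/bin/env bash
# Sketch only: import_url and restore_via_api are hypothetical helper names.

# Build the import endpoint from the Jobs API URL, tolerating one trailing slash.
import_url() {
  printf '%s/admin/import\n' "${1%/}"
}

# Usage: restore_via_api <backup.zip> <error|rename|keep-original>
restore_via_api() {
  local archive="$1"
  local policy="$2"

  if [ -z "${CDE_TOKEN:-}" ] || [ -z "${CDE_JOBS_API:-}" ]; then
    echo "set CDE_TOKEN and CDE_JOBS_API before calling restore_via_api" >&2
    return 2
  fi
  if [ ! -f "$archive" ]; then
    echo "backup file not found: $archive" >&2
    return 3
  fi

  # -k skips TLS certificate verification, as in the example above;
  # drop it if your cluster presents a trusted certificate.
  curl -k --fail --silent --show-error \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -H "accept: application/json" \
    -X POST "$(import_url "$CDE_JOBS_API")" \
    -F "file=@${archive};type=application/zip" \
    -F "duplicatehandling=${policy}"
}
```

For example, restore_via_api /path/to/cde-etl-jobs-backup.zip rename sends the same request as the curl command above. Pipe the output to jq if you want the JSON response formatted.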
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up CDE jobs, see Backing up Cloudera Data Engineering jobs.
Steps
- In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
- Click Jobs in the left navigation menu. The Jobs page displays.
- From the drop-down list in the upper left-hand corner, select the Virtual Cluster that you want to restore jobs to.
- Click at the top right, and then click Restore Jobs.
- Click Choose a zip file.
- Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
- Click Select to restore the backup.
Result
The jobs and resources from the backup file are restored using the rename duplicate handling policy. If a job name in the backup conflicts with an existing job, the restore operation keeps the existing job and renames the restored job by appending a numerical identifier to the job name. If the backup contains Airflow Libraries and Operators, they are restored as well.