Restoring Cloudera Data Engineering jobs from backup
You can restore Cloudera Data Engineering jobs and associated resources from a backup ZIP file.
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up Cloudera Data Engineering jobs, see Backing up Cloudera Data Engineering jobs.
- Download and configure the CDE CLI.
Steps
- Run the cde backup restore command to restore a backup file to the virtual cluster your CDE CLI is configured to interact with.
  - Use the --duplicate-handling flag to select the policy for handling duplicate jobs. The possible values are:
    - error - Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
    - rename - If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    - keep-original - If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
    For example:
    cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename
  - --remote-path - If the backup is in remote storage, use this flag to specify the path of the backup to restore from the remote object store. The path is relative to /dex/backup/ and must include the file name.
    Restore example for remote storage:
    ./cde backup restore --remote-path test/archive.zip # the output is similar to list-archive
  - The optional --use-stored-user and --do-as flags control which user owns the Cloudera Data Engineering jobs restored from an archive.
    - --use-stored-user - This flag determines whether the system assigns job ownership to the user who created the job or to the user performing the restoration. The default value is true.
      - If set to true, the system tries to restore the job owner to the user who created the job. If that user account is no longer valid or active, the restoration fails by default unless a fallback is provided using the --do-as flag.
      - If set to false, the current user performing the restoration becomes the job owner.
    - --do-as - This flag explicitly specifies a user for job ownership based on the following usage contexts:
      - As a fallback: If the --use-stored-user flag is set to true, the user specified by the --do-as flag receives ownership of the restored jobs only if the user who created the job has an invalid account.
      - Direct owner assignment: If the --use-stored-user flag is set to false, the user specified by the --do-as flag is directly assigned ownership of the restored jobs.
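The ownership rules for --use-stored-user and --do-as can be summarized as a small decision function. The following shell sketch is only a model of the behavior described above, not part of the CDE CLI: the function name resolve_owner and its arguments are hypothetical, and whether the stored user's account is still valid is passed in as an input rather than looked up.

```shell
# Model of the documented ownership rules for restored jobs.
# Hypothetical helper; it does not call the CDE CLI.
#   $1 use_stored_user  "true" or "false" (documented default: true)
#   $2 stored_user      user who created the job
#   $3 stored_valid     "true" if that account is still valid (assumed input)
#   $4 current_user     user performing the restore
#   $5 do_as            optional value of --do-as ("" if not given)
resolve_owner() {
  use_stored_user="$1"
  stored_user="$2"
  stored_valid="$3"
  current_user="$4"
  do_as="${5:-}"

  if [ "$use_stored_user" = "true" ]; then
    if [ "$stored_valid" = "true" ]; then
      echo "$stored_user"            # original creator keeps ownership
    elif [ -n "$do_as" ]; then
      echo "$do_as"                  # --do-as acts as the fallback
    else
      echo "restore fails: stored user is invalid and no --do-as fallback" >&2
      return 1
    fi
  elif [ -n "$do_as" ]; then
    echo "$do_as"                    # direct owner assignment
  else
    echo "$current_user"             # user performing the restore
  fi
}

resolve_owner true alice true bob          # prints: alice
resolve_owner true alice false bob carol   # prints: carol
resolve_owner false alice true bob         # prints: bob
resolve_owner false alice true bob carol   # prints: carol
```

The user names are placeholders. For the exact syntax of passing a boolean value to --use-stored-user, check the help output of your CDE CLI version.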
Result
Validate that the jobs and resources were restored by running cde job list and cde resource list.
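As a rough sketch, the two validation commands can be wrapped in a function that stops at the first failure. The function name verify_restore and the CDE_BIN variable are hypothetical; CDE_BIN is only a convenience for pointing the script at the CLI binary (for example ./cde), and the sketch assumes the CLI is already configured for the target virtual cluster.

```shell
# Hypothetical wrapper around the two documented list commands.
# CDE_BIN is an assumed convenience variable, not a CDE CLI setting.
verify_restore() {
  cde_bin="${CDE_BIN:-cde}"

  echo "== Jobs =="
  "$cde_bin" job list || { echo "cde job list failed" >&2; return 1; }

  echo "== Resources =="
  "$cde_bin" resource list || { echo "cde resource list failed" >&2; return 1; }
}

# Usage: verify_restore
```

The function only reports whether both commands succeeded; compare the listed job and resource names against the contents of your backup to confirm that nothing is missing.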
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up Cloudera Data Engineering jobs, see Backing up Cloudera Data Engineering jobs.
- Request an access token and save it as an environment variable to use in API calls. For instructions, see Getting a Cloudera Data Engineering API access token.
Steps
- Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
  - In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
  - Click Administration on the left navigation menu. The Administration page displays.
  - In the Services column, select the service containing the virtual cluster where you want to restore the job. Then, in the Virtual Clusters column, click the Cluster Details icon.
  - Click JOBS API URL. The Jobs API URL is copied to the clipboard.
  - Paste the URL into a text editor to set the Jobs API URL as an environment variable. For example:
    export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
- Restore jobs from the backup file by uploading the backup file to the /admin/import endpoint. You can choose how to handle duplicate job names using the duplicatehandling=<policy> parameter. The options are:
  - error - Return an error if there are duplicate job names, and abort the restore operation.
  - rename - If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
  - keep-original - If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
  For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy:
  curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
    -F duplicatehandling=rename \
    | jq
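The upload can also be wrapped in a small script that checks its inputs before calling the endpoint. This is a minimal sketch, assuming the CDE_TOKEN and CDE_JOBS_API environment variables from the earlier steps are set; the function names valid_policy and restore_via_api are hypothetical. Unlike the documented example, it omits -k so that TLS certificates are verified, and it adds curl's --fail option so that an HTTP error status produces a non-zero exit code.

```shell
# Return success only for the three documented duplicate handling policies.
valid_policy() {
  case "$1" in
    error|rename|keep-original) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: upload a backup ZIP to the /admin/import endpoint.
#   $1 path to the backup file
#   $2 duplicate handling policy
restore_via_api() {
  backup_file="$1"
  policy="$2"

  # Check all inputs before making any network call.
  [ -f "$backup_file" ] || { echo "backup file not found: $backup_file" >&2; return 1; }
  valid_policy "$policy" || { echo "unsupported policy: $policy" >&2; return 1; }
  [ -n "${CDE_TOKEN:-}" ] || { echo "CDE_TOKEN is not set" >&2; return 1; }
  [ -n "${CDE_JOBS_API:-}" ] || { echo "CDE_JOBS_API is not set" >&2; return 1; }

  # curl sets the multipart Content-Type header itself when -F is used.
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -H "accept: application/json" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -F "file=@${backup_file};type=application/zip" \
    -F "duplicatehandling=${policy}"
}

# Usage: restore_via_api /path/to/cde-etl-jobs-backup.zip rename
```

Validating the policy locally catches a mistyped value before the backup is uploaded. If your environment uses certificates that curl cannot verify, the -k option from the documented example can be added back.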
Before you begin
- You must have a valid backup file to restore from. For instructions on backing up Cloudera Data Engineering jobs, see Backing up Cloudera Data Engineering jobs.
Steps
- In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
- Click Jobs in the left navigation menu. The Jobs page displays.
- From the drop-down list in the upper left-hand corner, select the Virtual Cluster that you want to restore jobs to.
- Click the menu icon at the top right, and then click Restore Jobs.
- Click Choose a zip file.
- Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
- Click Select to restore the backup.
Result
The jobs and resources from the backup file are restored using the
rename duplicate handling policy. If a job name in the backup conflicts
with an existing job, the restore operation keeps the existing job and renames the
restored job by appending a numerical identifier to the job name.
