Restoring Cloudera Data Engineering jobs from backup

You can restore Cloudera Data Engineering (CDE) jobs and associated resources from a backup ZIP file.

Steps

  1. Run the cde backup restore command to restore a backup file to the virtual cluster that your CDE CLI is configured to use. Use the --duplicate-handling flag to choose how duplicate job names are handled. The possible values are:
    error
    Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
    rename
    If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    keep-original
    If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed-up job.

    For example:

    cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename

Result

Validate that the jobs and resources were restored by running cde job list and cde resource list.
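
The restore and validation steps above can be combined into one small wrapper. The following is a minimal sketch, not part of the CDE CLI: the function names is_valid_policy and restore_and_verify are hypothetical, and the sketch uses only the cde subcommands and flags shown in this section. It rejects an unknown policy value or a missing backup file before calling cde, and defaults to the error policy to match the CLI default.

```shell
# Hypothetical helper (not a CDE CLI feature): accept only the three
# duplicate-handling policies documented above.
is_valid_policy() {
  case "$1" in
    error|rename|keep-original) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: restore a backup, then list jobs and resources
# so the result can be checked by eye.
# Usage: restore_and_verify <backup.zip> [policy]
restore_and_verify() {
  local archive="$1" policy="${2:-error}"   # "error" mirrors the CLI default

  if ! is_valid_policy "$policy"; then
    echo "unknown duplicate-handling policy: $policy" >&2
    return 2
  fi
  if [ ! -f "$archive" ]; then
    echo "backup file not found: $archive" >&2
    return 1
  fi

  # Stop here if the restore fails, passing its exit status to the caller.
  cde backup restore --local-path "$archive" --duplicate-handling "$policy" || return

  # Validation commands from the Result section.
  cde job list
  cde resource list
}
```

For example, after sourcing the functions, restore_and_verify archive-2021-11-10T01:24:06.zip rename runs the restore with the rename policy and then prints the job and resource lists.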

Steps

  1. Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
    1. Navigate to the Cloudera Data Engineering Overview page.
    2. In the CDE Services column, select the service containing the virtual cluster where you want to restore the jobs.
    3. In the Virtual Clusters column on the right, click the Cluster Details icon for the virtual cluster you want to restore to.
    4. Copy the URL under JOBS API URL, and set it as an environment variable. For example:
      export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
  2. Restore jobs from the backup file by uploading it to the /admin/import endpoint. Use the duplicatehandling=<policy> parameter to choose how duplicate job names are handled. The possible values are:
    error
    Return an error if there are duplicate job names, and abort the restore operation.
    rename
    If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    keep-original
    If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed-up job.
    For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy (this assumes that the CDE_TOKEN environment variable holds a valid CDE access token):
    curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
    -F duplicatehandling=rename \
    | jq
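
The curl call above prints the response but does not report whether the import succeeded. The following is a minimal sketch of a wrapper that checks the HTTP status. The function names import_url and restore_via_api are hypothetical, and the sketch assumes that the endpoint returns a 2xx status on success, which this section does not state. It reuses the CDE_JOBS_API and CDE_TOKEN variables and the form fields from the example, and keeps -k, which skips TLS certificate verification.

```shell
# Hypothetical helper: build the import endpoint from the JOBS API URL,
# tolerating one trailing slash on the copied URL.
import_url() {
  printf '%s/admin/import\n' "${1%/}"
}

# Hypothetical wrapper: upload a backup and fail on a non-2xx response.
# Usage: restore_via_api <backup.zip> [policy]
restore_via_api() {
  local archive="$1" policy="${2:-error}" body status

  [ -n "${CDE_JOBS_API:-}" ] || { echo "CDE_JOBS_API is not set" >&2; return 1; }
  [ -n "${CDE_TOKEN:-}" ] || { echo "CDE_TOKEN is not set" >&2; return 1; }
  [ -f "$archive" ] || { echo "backup file not found: $archive" >&2; return 1; }

  body="$(mktemp)" || return 1

  # -o writes the response body to a temporary file; -w prints only the
  # HTTP status code, which the command substitution captures.
  status="$(curl -sS -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -H "accept: application/json" \
    -X POST "$(import_url "$CDE_JOBS_API")" \
    -F "file=@${archive};type=application/zip" \
    -F "duplicatehandling=${policy}" \
    -o "$body" -w '%{http_code}')"

  cat "$body"
  rm -f "$body"

  # Assumption: any 2xx status means that the import was accepted.
  case "$status" in
    2??) return 0 ;;
    *) echo "import failed with HTTP status ${status}" >&2; return 1 ;;
  esac
}
```

For example, restore_via_api /path/to/cde-etl-jobs-backup.zip rename | jq sends the same request as the curl command above and pretty-prints the response.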

Steps

  1. In the Cloudera Data Platform (CDP) management console, click the Data Engineering tile and click Overview.
  2. In the CDE Services column, select the service containing the virtual cluster where you want to restore the jobs.
  3. In the Virtual Clusters column on the right, click the View Jobs icon for the virtual cluster you want to restore to.
  4. Click Jobs in the left menu.
  5. Click the vertical ellipsis menu at the top right, and then click Restore Jobs.
  6. Click Choose a zip file.
  7. Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
  8. Click Select to restore the backup.

Result

The jobs and resources from the backup file are restored using the rename duplicate handling policy. If a job name in the backup conflicts with an existing job, the restore operation keeps the existing job and renames the restored job by appending a numerical identifier to the job name.