Restoring Cloudera Data Engineering jobs from backup

You can restore Cloudera Data Engineering (CDE) jobs and associated resources from a backup ZIP file by using the CDE CLI, the CDE API, or the CDE web UI.

Using the CDE CLI

Before you begin

Steps

  1. Run the cde backup restore command to restore a backup file to the virtual cluster that your CDE CLI is configured to interact with. Use the --duplicate-handling flag to select the policy for handling duplicate job names. The possible values are:
    error
    Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
    rename
    If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    keep-original
    If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.

    For example, to restore the local archive file archive-2021-11-10T01:24:06.zip using the rename policy:

    cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename
    If the backup archive is stored in remote storage, use the --remote-path flag to specify the archive to restore. The path is relative to /dex/backup/ in the remote object store and must include the file name.

    For example, to restore an archive from remote storage:

    cde backup restore --remote-path test/archive.zip

    The output is similar to the output of the cde backup list-archive command.
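
The step above can be wrapped in a small shell function so that an invalid policy or a missing local file is caught before the CLI runs. This is a minimal sketch, not part of the CDE CLI: the function name restore_cde_backup is hypothetical, and it assumes that --duplicate-handling can also be combined with --remote-path, which the remote example does not show.

```shell
#!/usr/bin/env bash
# Sketch only: restore_cde_backup is a hypothetical helper, not part of the CDE CLI.
# Usage: restore_cde_backup local|remote <path> [error|rename|keep-original]
# Assumption: --duplicate-handling is accepted together with --remote-path.

restore_cde_backup() {
  local location="$1" path="$2" policy="${3:-error}"

  # Accept only the three documented duplicate-handling policies.
  case "$policy" in
    error|rename|keep-original) ;;
    *)
      echo "unsupported duplicate-handling policy: $policy" >&2
      return 2
      ;;
  esac

  case "$location" in
    local)
      # Fail early if the local archive does not exist.
      if [[ ! -f "$path" ]]; then
        echo "backup archive not found: $path" >&2
        return 1
      fi
      cde backup restore --local-path "$path" --duplicate-handling "$policy"
      ;;
    remote)
      # Remote paths are relative to /dex/backup/ and must include the file name.
      cde backup restore --remote-path "$path" --duplicate-handling "$policy"
      ;;
    *)
      echo "location must be 'local' or 'remote', got: $location" >&2
      return 2
      ;;
  esac
}
```

For example, restore_cde_backup local archive-2021-11-10T01:24:06.zip rename runs the same command as the local example. Defaulting the policy to error mirrors the CLI's documented default.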

Result

Validate that the jobs and resources were restored by running cde job list and cde resource list.
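
To check for specific jobs instead of reading the full listings, a helper such as the hypothetical verify_restored_jobs can run both commands and report any expected job name that does not appear in the cde job list output. This is a sketch: it uses a loose substring match rather than parsing the output, so it makes no assumption about the output format. Under the rename policy, a restored job may appear under a name with a numerical identifier appended.

```shell
#!/usr/bin/env bash
# Sketch only: verify_restored_jobs is a hypothetical helper, not part of the CDE CLI.
# Usage: verify_restored_jobs <job-name>...
# Each name is matched as a plain substring of the `cde job list` output.

verify_restored_jobs() {
  local jobs resources name missing=0

  # Assign separately from `local` so that a failing command is not masked.
  jobs=$(cde job list) || { echo "cde job list failed" >&2; return 1; }
  resources=$(cde resource list) || { echo "cde resource list failed" >&2; return 1; }

  for name in "$@"; do
    if ! grep -qF -- "$name" <<<"$jobs"; then
      echo "job not found after restore: $name" >&2
      missing=1
    fi
  done

  # Print the resource list so it can be reviewed manually.
  printf '%s\n' "$resources"
  return "$missing"
}
```

For example, verify_restored_jobs etl-daily etl-hourly returns a non-zero status if either name is absent from the job list.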

Using the CDE API

Before you begin

Steps

  1. Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
    1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
    2. Click Administration on the left navigation menu. The Administration page displays.
    3. In the Services column, select the service containing the virtual cluster where you want to restore the jobs. Then, in the Virtual Clusters column, click the Cluster Details icon.
    4. Click JOBS API URL. The Jobs API URL is copied to the clipboard.
    5. Paste the URL into a text editor and use it to set the Jobs API URL as an environment variable. For example:
      export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
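  2. Restore jobs from the backup file by uploading it to the /admin/import endpoint. The request authenticates with a CDE access token, which the example reads from the CDE_TOKEN environment variable. You can choose how to handle duplicate job names by using the duplicatehandling=<policy> parameter. The options are: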
    error
    Return an error if there are duplicate job names, and abort the restore operation.
    rename
    If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    keep-original
    If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
    For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy:
    curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
    -F duplicatehandling=rename \
    | jq
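
The curl request above can be wrapped in a shell function so that missing environment variables, an invalid policy, or a missing archive are caught before the upload. This is a sketch under stated assumptions: the function name import_cde_backup is hypothetical, and CDE_JOBS_API and CDE_TOKEN are assumed to be exported already. It drops the explicit Content-Type header, because curl sets multipart/form-data with the correct boundary whenever -F is used, and it omits -k so that the server's TLS certificate is verified; add -k back only if your environment requires it.

```shell
#!/usr/bin/env bash
# Sketch only: import_cde_backup is a hypothetical wrapper around the documented
# POST ${CDE_JOBS_API}/admin/import request. Assumes CDE_JOBS_API and CDE_TOKEN
# are already exported.
# Usage: import_cde_backup <archive.zip> [error|rename|keep-original]

import_cde_backup() {
  local archive="$1" policy="${2:-error}"

  # Both variables are required by the request.
  if [[ -z "${CDE_JOBS_API:-}" || -z "${CDE_TOKEN:-}" ]]; then
    echo "CDE_JOBS_API and CDE_TOKEN must both be set" >&2
    return 1
  fi

  # Accept only the three documented duplicatehandling values.
  case "$policy" in
    error|rename|keep-original) ;;
    *)
      echo "unsupported duplicatehandling value: $policy" >&2
      return 2
      ;;
  esac

  if [[ ! -f "$archive" ]]; then
    echo "backup archive not found: $archive" >&2
    return 1
  fi

  # curl sets "Content-Type: multipart/form-data" (with a boundary) because of -F.
  # --fail returns a non-zero exit status on HTTP errors; -k is intentionally omitted.
  curl --fail -sS \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -H "accept: application/json" \
    -F "file=@${archive};type=application/zip" \
    -F "duplicatehandling=${policy}"
}
```

For example, import_cde_backup /path/to/cde-etl-jobs-backup.zip rename | jq sends the same upload as the documented example and formats the JSON response.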

Using the CDE web UI

Before you begin

Steps

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  2. Click Jobs in the left navigation menu. The Jobs page displays.
  3. From the drop-down list in the upper left-hand corner, select the Virtual Cluster that you want to restore jobs to.
  4. Click the menu icon at the top right, and then click Restore Jobs.
  5. Click Choose a zip file.
  6. Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
  7. Click Select to restore the backup.

Result

The jobs and resources from the backup file are restored using the rename duplicate handling policy. If a job name in the backup conflicts with an existing job, the restore operation keeps the existing job and renames the restored job by appending a numerical identifier to the job name. If the backup contains Airflow Libraries and Operators, they are restored as well.