Restoring Cloudera Data Engineering jobs from backup

You can restore Cloudera Data Engineering jobs and associated resources from a backup ZIP file by using the CDE CLI, the API, or the web UI.

Using the CDE CLI

Before you begin

Steps

  1. Run the cde backup restore command to restore a backup file to the virtual cluster your CDE CLI is configured to interact with.
    • Use the --duplicate-handling flag to select the policy for handling duplicate jobs. The possible values are:
      error
      Return an error if there are duplicate job names, and abort the restore operation. This is the default behavior.
      rename
      If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
      keep-original
      If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.

      For example:

      cde backup restore --local-path archive-2021-11-10T01:24:06.zip --duplicate-handling rename
    • --remote-path
      If you store backups in remote storage, use this flag to restore an archive from the remote object store. Specify the path relative to /dex/backup/, including the file name.

      For example, to restore an archive from remote storage:

      cde backup restore --remote-path test/archive.zip

      # The output is similar to the list-archive output.
    • The optional --use-stored-user and --do-as flags control which user owns the Cloudera Data Engineering jobs restored from an archive.

      --use-stored-user
      This flag determines whether the system assigns job ownership to the user who created the job or to the user performing the restoration. The default value is true.

      If set to true, the system tries to assign ownership of each restored job to the user who originally created it. If that user account is no longer valid or active, the restore operation fails unless you provide a fallback user with the --do-as flag.

      If set to false, the current user performing the restoration becomes the job owner.

      --do-as
      This flag explicitly specifies a user to own the restored jobs. Its effect depends on the value of --use-stored-user:
      • As a fallback:
        If the --use-stored-user flag is set to true, the user specified by the --do-as flag receives ownership of a restored job only if the account of the user who created that job is no longer valid or active.
      • Direct owner assignment:
        If the --use-stored-user flag is set to false, the user specified by the --do-as flag is directly assigned ownership of the restored jobs.
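The ownership flags combine with --duplicate-handling as in the minimal POSIX shell sketch below. It is not part of the CDE CLI: the cde_restore function, its argument order, and the DRY_RUN switch (which prints the command instead of running it) are hypothetical helpers. It also assumes the CLI accepts the --use-stored-user=true|false form for the boolean flag.

```shell
#!/bin/sh
# Hypothetical helper: build and run a `cde backup restore` command.
# Usage: cde_restore <archive.zip> <true|false> [do-as-user]
#   true  + user : keep original owners; user is the fallback for invalid accounts
#   false + user : assign every restored job to user
#   false alone  : the user running the restore owns the jobs
set -eu

cde_restore() {
  archive=${1:-}
  use_stored=${2:-true}
  do_as=${3:-}

  case $use_stored in
    true|false) ;;
    *) echo "use-stored-user must be true or false, got: '$use_stored'" >&2; return 1 ;;
  esac
  if [ -z "$archive" ]; then
    echo "usage: cde_restore <archive.zip> <true|false> [do-as-user]" >&2
    return 1
  fi

  # rename keeps existing jobs and renames conflicting restored jobs.
  # Assumption: the boolean flag accepts the --flag=value form.
  set -- cde backup restore --local-path "$archive" \
    --duplicate-handling rename "--use-stored-user=$use_stored"
  # Append --do-as only when a user was given.
  if [ -n "$do_as" ]; then
    set -- "$@" --do-as "$do_as"
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of contacting the virtual cluster.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, with DRY_RUN=1 exported, cde_restore archive-2021-11-10T01:24:06.zip true svc-etl prints a command that keeps the original owners and falls back to svc-etl, a hypothetical service user.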

Result

Validate that the jobs and resources were restored by running cde job list and cde resource list.
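This check can be scripted loosely as sketched below. The verify_restore function and the names in the usage line are hypothetical, and the check is only a text match: it assumes each name appears as a quoted string in the output of cde job list or cde resource list. Jobs renamed under the rename policy get a numerical suffix, so look for those under their new names.

```shell
#!/bin/sh
# Hypothetical helper: report which expected names are present after a restore.
# Usage: verify_restore <job|resource> <name>...
set -eu

verify_restore() {
  kind=${1:-}
  case $kind in
    job|resource) ;;
    *) echo "usage: verify_restore <job|resource> <name>..." >&2; return 1 ;;
  esac
  shift

  # Capture the list once; stop if the CLI call itself fails.
  listing=$(cde "$kind" list) || return 1

  missing=0
  for name in "$@"; do
    # Loose check: the name must appear as a quoted string somewhere.
    if printf '%s\n' "$listing" | grep -F -q "\"$name\""; then
      echo "found $kind: $name"
    else
      echo "missing $kind: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: verify_restore job etl-daily etl-weekly && verify_restore resource etl-scripts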

Using the API

Before you begin

Steps

  1. Determine the API URL for the virtual cluster that you want to restore the jobs and resources to:
    1. In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
    2. Click Administration on the left navigation menu. The Administration page displays.
    3. In the Services column, select the service containing the virtual cluster where you want to restore the jobs. Then, in the Virtual Clusters column, click the Cluster Details icon.
    4. Click JOBS API URL. The Jobs API URL is copied to the clipboard.
    5. Set the Jobs API URL as an environment variable by pasting it into an export command in your terminal. For example:
      export CDE_JOBS_API="https://pmjkrgn5.cde-czlmkz4y.na-01.xvp2-7p8o.cloudera.site/dex/api/v1"
  2. Restore the jobs by uploading the backup file to the /admin/import endpoint. Use the duplicatehandling=<policy> parameter to choose how duplicate job names are handled. The options are:
    error
    Return an error if there are duplicate job names, and abort the restore operation.
    rename
    If a job name in the backup conflicts with an existing job, keep the existing job and rename the restored job by appending a numerical identifier to the job name.
    keep-original
    If a job name in the backup conflicts with an existing job, keep the existing job and do not restore the backed up job.
    For example, to restore a backup named cde-etl-jobs-backup.zip using the rename duplicate handling policy:
    curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@/path/to/cde-etl-jobs-backup.zip;type=application/zip" \
    -F duplicatehandling=rename \
    | jq
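The same request can be wrapped in a small script that checks its inputs before uploading. This is a sketch, not an official client: the cde_import function and the DRY_RUN switch are hypothetical, and it assumes CDE_JOBS_API is exported as in step 1 and that CDE_TOKEN already holds a valid access token for the virtual cluster. It accepts only the three documented policy values.

```shell
#!/bin/sh
# Hypothetical helper: upload a CDE backup archive to <Jobs API>/admin/import.
# Usage: cde_import <archive.zip> <error|rename|keep-original>
# Assumes CDE_JOBS_API and CDE_TOKEN are already exported.
set -eu

cde_import() {
  archive=${1:-}
  policy=${2:-}

  # Accept only the three documented duplicate handling policies.
  case $policy in
    error|rename|keep-original) ;;
    *) echo "invalid duplicatehandling policy: '$policy'" >&2; return 1 ;;
  esac
  if [ ! -f "$archive" ]; then
    echo "backup archive not found: '$archive'" >&2
    return 1
  fi
  if [ -z "${CDE_JOBS_API:-}" ] || [ -z "${CDE_TOKEN:-}" ]; then
    echo "CDE_JOBS_API and CDE_TOKEN must both be set" >&2
    return 1
  fi

  # Same request as the curl example above; -k skips TLS certificate checks.
  set -- curl -k \
    -H "Authorization: Bearer ${CDE_TOKEN}" \
    -X POST "${CDE_JOBS_API}/admin/import" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@${archive};type=application/zip" \
    -F "duplicatehandling=${policy}"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of sending it (note: includes the token).
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example: cde_import /path/to/cde-etl-jobs-backup.zip rename | jq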

Using the web UI

Before you begin

Steps

  1. In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
  2. Click Jobs in the left navigation menu. The Jobs page displays.
  3. From the drop-down list in the upper left-hand corner, select the Virtual Cluster that you want to restore jobs to.
  4. Click the menu icon at the top right, and then click Restore Jobs.
  5. Click Choose a zip file.
  6. Browse to the ZIP file containing the backup of jobs and resources you want to restore, and then click Open.
  7. Click Select to restore the backup.

Result

The jobs and resources from the backup file are restored using the rename duplicate handling policy. If a job name in the backup conflicts with an existing job, the restore operation keeps the existing job and renames the restored job by appending a numerical identifier to the job name.