Handling upgrade failures for Cloudera Data Engineering

If your upgrade of Cloudera Data Engineering (CDE) fails, you have the option to clone the service with the latest version of CDE. Learn how to handle an upgrade failure.

During a CDE upgrade, a backup is created as part of the upgrade preparation process. This procedure restores that backup to a new cluster.

The list of service backups is available in the Backup Library. To locate the Backup Library, in the left navigation menu of CDE select Administration, select Service Details, select the Maintenance tab, and select Backup Library.

To obtain the list of all available backups, in the CDP CLI, run:
cdp de list-backups

To obtain the list of service backups associated with a specific CDP environment, run:
cdp de list-backups --filter "environment(eq)[***CDP ENVIRONMENT NAME***]"
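
The filtered listing can be wrapped in a small helper so that the filter expression is built in one place. This is a minimal sketch, not part of the CDP CLI: the function names env_filter and list_env_backups are hypothetical, and it assumes that the bracketed placeholder in the documented command is replaced wholesale by the environment name.

```shell
# Build the --filter expression for a given CDP environment name.
# Assumption: the whole [***CDP ENVIRONMENT NAME***] placeholder is
# replaced by the plain environment name.
env_filter() {
    printf 'environment(eq)%s' "$1"
}

# Hypothetical helper: list only the backups of one environment.
# Requires the CDP CLI to be installed and configured.
list_env_backups() {
    cdp de list-backups --filter "$(env_filter "$1")"
}
```

For example, list_env_backups dex-priv-default-azure-env would list the backups for the environment shown in the sample output later in this topic.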

The CDE backup includes the following:

  • CDE Service configurations
  • Virtual cluster names
  • Virtual cluster configurations
  • Virtual cluster file-based resources
  • Spark job definitions
  • Airflow job definitions
  • Spark Python-env resources

The following are not yet included in the backup:

  • Non file-based resources, for example, Python-venv resources and custom runtimes
  • Airflow custom operators and libraries
  • Logs
  • Job run history
  • Endpoints

  1. Ensure that the catchup option is not enabled for any user's Airflow jobs.

    If the catchup option is enabled for any Airflow DAG, disable it manually before the backup starts.

  2. By default, the restored service receives the name and ID of the original backed-up service. To ensure that the restore does not fail due to name and ID conflicts, do one of the following:
    1. Delete the original service that failed to upgrade.
    2. Assign a new name and ID to the restored service using the --service-name and --service-id options.
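
If you keep the original service, the restore command described in this procedure can be extended with the two options named above. The following is a minimal sketch under stated assumptions: the function name restore_with_new_identity and the DRY_RUN variable are hypothetical, and the ID and name values you pass are your own choices. By default the function only prints the command so that you can review it before running it.

```shell
# Compose a restore command that assigns a new service ID and name.
# Hypothetical helper; DRY_RUN defaults to 1 (print only).
# Set DRY_RUN=0 to actually invoke the CDP CLI.
restore_with_new_identity() {
    backup_id=$1
    env_crn=$2
    new_id=$3
    new_name=$4
    # Rebuild the positional parameters as the full command line.
    set -- cdp de restore-service \
        --backup-id "$backup_id" \
        --environment-crn "$env_crn" \
        --service-id "$new_id" \
        --service-name "$new_name"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling the function with a backup ID, an environment CRN, a new service ID, and a new service name prints the assembled command; rerun it with DRY_RUN=0 once the values look correct.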
  1. Restore the service from the backup.
    cdp de restore-service --backup-id <backup-id> --environment-crn <environment-crn>

    Where:

    backup-id
    The ID of the backup that you are restoring from.
    environment-crn
    The Cloudera Resource Name (CRN) of the Cloudera Data Platform (CDP) environment with which the restored CDE service is associated. Currently, you can restore the CDE service only to the same CDP environment with which the backed-up service is associated.
    For example:
    cdp de restore-service --backup-id 2 --environment-crn crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:c67b9089-2d3b-4579-861d-c0df12a105b1
  2. Optional: To obtain a list of backups, run:
    cdp de list-backups
  3. Optional: To describe a particular backup, run:
    cdp de describe-backup --backup-id <backup-id>

    For example:

    $ cdp de describe-backup --backup-id 2 --profile priv
    {
        "backup": {
            "id": 2,
            "serviceID": "cluster-cf6h74lq",
            "serviceName": "dex-priv-default-azure-env-1689008683873",
            "environmentName": "dex-priv-default-azure-env",
            "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:c67b9089-2d3b-4579-861d-c0df12a105b1",
            "creator": "crn:altus:iam:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:user:0f9a97a7-23a7-43bd-bc71-ecdb2aa34ed5",
            "cloudPlatform": "AZURE",
            "status": "completed",
            "created": "2023-07-17T18:02:58.385455Z"
        }
    }
    
  4. Optional: In the case of an Airflow DAG failure, identify the impacted DAG on the Airflow UI and fix it.
    For more information, see the DAG-related steps in In-place upgrade with Airflow Operators and Libraries.
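
Before restoring, it can be useful to confirm that the backup you picked actually finished. The sketch below reads the describe-backup output and checks its status field. It assumes the pretty-printed JSON layout shown in the example above and that "completed" marks a usable backup; the function names backup_status and backup_is_completed are hypothetical, and a real JSON parser is preferable to this line-based approach where one is available.

```shell
# Read describe-backup JSON on stdin and print the value of "status".
# Naive line-based parsing: assumes one field per line, as in the
# pretty-printed example output.
backup_status() {
    sed -n 's/^[[:space:]]*"status":[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Succeed only when the backup on stdin reports the "completed" status.
# Assumption: "completed" is the status of a restorable backup.
backup_is_completed() {
    [ "$(backup_status)" = "completed" ]
}
```

For example, piping cdp de describe-backup --backup-id 2 into backup_is_completed returns a zero exit status for the sample backup shown above, so the restore command can be chained after it with &&.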