Handling upgrade failures for Cloudera Data Engineering

If your upgrade of Cloudera Data Engineering fails, you have the option to clone the service with the latest version of Cloudera Data Engineering. Learn how to handle an upgrade failure.

During a Cloudera Data Engineering upgrade, a backup is created as part of the upgrade preparation process. This procedure restores that backup to a new cluster.

The list of service backups is available in the Backup Library. To open the Backup Library, in the left navigation menu of Cloudera Data Engineering, select Administration, then Service Details, then the Maintenance tab, and then Backup Library.

To obtain the list of all available backups, in the CDP CLI, run:

cdp de list-backups

To obtain the list of service backups associated with a specific Cloudera environment, run:

cdp de list-backups --filter "environment(eq)[***CDP ENVIRONMENT NAME***]"

The Cloudera Data Engineering backup includes the following:

Cloudera Data Engineering Service configurations
Virtual cluster names
Virtual cluster configurations
Virtual cluster file-based resources
Spark job definitions
Airflow job definitions
Spark Python-env resources

The following are not yet included in the backup:

Non file-based resources, for example, Python-venv resources and custom runtimes
Airflow custom operators & libraries
Logs
Job run history
Endpoints

Ensure that the catchup option is disabled for all Airflow jobs. If any Airflow DAG has the catchup option enabled, disable it manually before the backup starts.
By default, the restored Cloudera Data Engineering service is assigned the name of the original backed-up service, and a new service ID is generated. To prevent the restore from failing because of a naming conflict, choose one of the following options:
1. Delete the original service that failed to upgrade during the Cloudera Data Engineering upgrade.
2. Rename the service using the --service-name option.
Note:
To retain endpoint stability, you can explicitly assign the restored service the ID of the original backed-up service, using the --service-id option of the cdp de restore-service command.
Assigning a new ID results in new Fully Qualified Domain Names (FQDNs) for the service and all its virtual clusters, which affects the endpoint stability.
A valid service ID:
- Is an 8-character alphanumeric string
- Does not contain vowels
- Is unique
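
The format rules above can be checked locally before a value is passed to the --service-id option. The following is a minimal sketch, not part of the CDP CLI: the function name `is_valid_service_id` is hypothetical, and the check assumes that the rules apply to the 8-character identifier itself (this page does not state whether a prefix such as `cluster-` is included). Uniqueness cannot be verified locally and is not checked.

```shell
# Hypothetical helper, not part of the CDP CLI.
# Returns 0 when the candidate meets the format rules for a service ID:
# exactly 8 alphanumeric characters, none of them a vowel.
# Uniqueness is not checked here.
is_valid_service_id() {
  id=$1
  # Must be exactly 8 characters long.
  [ "${#id}" -eq 8 ] || return 1
  case $id in
    *[!A-Za-z0-9]*) return 1 ;;  # contains a non-alphanumeric character
    *[AEIOUaeiou]*) return 1 ;;  # contains a vowel
  esac
  return 0
}

# Usage: prints "ok" for an ID that satisfies the format rules.
if is_valid_service_id "cf6h74lq"; then
  echo "ok"
else
  echo "invalid"
fi
```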

Restore the service from the backup.
```
cdp de restore-service --backup-id <backup-id> \
    --environment-crn <environment-crn>
```
Where:

backup-id

The ID of the backup that you are restoring from.

environment-crn

The Cloudera Resource Name (CRN) of the Cloudera environment with which the restored Cloudera Data Engineering service is associated. Currently, you can restore the Cloudera Data Engineering service only to the same Cloudera environment with which the backed-up service is associated.
For example:
```
cdp de restore-service --backup-id 2 --environment-crn crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:c67b9089-2d3b-4579-861d-c0df12a105b1
```
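
The --service-name and --service-id options mentioned earlier are appended to the same command. The sketch below assembles the command line and prints it for review instead of running it. `build_restore_cmd` is a hypothetical helper, and whether the two options can be combined in a single call is an assumption that this page does not confirm.

```shell
# Hypothetical helper: prints a restore command for review; it does not run it.
# Arguments: backup ID, environment CRN, optional new service name,
# optional service ID. Values containing spaces are not handled.
build_restore_cmd() {
  cmd="cdp de restore-service --backup-id $1 --environment-crn $2"
  # Append --service-name only when a new name is supplied.
  if [ -n "${3:-}" ]; then
    cmd="$cmd --service-name $3"
  fi
  # Append --service-id only when an ID is supplied.
  if [ -n "${4:-}" ]; then
    cmd="$cmd --service-id $4"
  fi
  printf '%s\n' "$cmd"
}

# Usage: restore under a new name to avoid a naming conflict.
build_restore_cmd 2 "<environment-crn>" "<new-service-name>"
```

Pass an empty third argument and a fourth argument to print the variant that sets only the service ID.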
Optional: To obtain a list of backups, run:
```
cdp de list-backups
```

Optional: To describe a particular backup, run:

cdp de describe-backup --backup-id <backup-id>

For example:

$ cdp de describe-backup --backup-id 2 --profile priv
{
    "backup": {
        "id": 2,
        "serviceID": "cluster-cf6h74lq",
        "serviceName": "dex-priv-default-azure-env-1689008683873",
        "environmentName": "dex-priv-default-azure-env",
        "environmentCrn": "crn:cdp:environments:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:environment:c67b9089-2d3b-4579-861d-c0df12a105b1",
        "creator": "crn:altus:iam:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:user:0f9a97a7-23a7-43bd-bc71-ecdb2aa34ed5",
        "cloudPlatform": "AZURE",
        "status": "completed",
        "created": "2023-07-17T18:02:58.385455Z"
    }
}
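
Before restoring, it can be useful to confirm that the backup reports a status of `completed`. The following is a minimal sketch, assuming the output has the shape shown above: the hypothetical `backup_status` function extracts the `status` field with `sed`. A JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: reads describe-backup JSON on standard input and
# prints the value of the first "status" field it finds.
backup_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage with a trimmed copy of the sample output above.
# Prints: completed
printf '%s\n' '{' '  "backup": {' '    "id": 2,' '    "status": "completed"' '  }' '}' | backup_status
```

With the CLI installed, the same function can read the command output directly: `cdp de describe-backup --backup-id <backup-id> | backup_status`.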

Optional: If an Airflow DAG fails, identify the impacted DAG in the Airflow UI and fix it.
For more information, see the DAG-related steps in "In-place upgrade with Airflow Operators and Libraries".
