Using the backup-restore-based upgrade script
Learn how to use the backup-restore-based upgrade script in Cloudera Data Engineering on cloud.
This procedure applies to Cloudera Data Engineering versions 1.20.3-h2, 1.21.0-h2, and 1.22.0 and later.
The default backup-restore-based upgrade does not cover Apache Airflow connections and variables. This document describes a script-based fallback for performing the backup-restore-based upgrade that you can use when the default upgrade method described in Handling upgrade failures for Cloudera Data Engineering does not work.
| Artifact | Included in the automation script |
| --- | --- |
| Service config | Y |
| Service logs | N |
| Virtual Cluster config | Y |
| Virtual Cluster endpoints | N |
| Virtual Cluster event logs | N |
| Job: Spark | Y |
| Job: Airflow | Y |
| Resource: files | Y |
| Resource: Docker runtimes | Y |
| Resource: python-venv (for Spark) | Y |
| Resource: python-venv (for Airflow) | Y (since 1.21.0) |
| Git repository | Y |
| Credentials | Y |
| Spark Session | N |
| Spark Session logs (including statement history) | N |
| Job runs | Y (since 1.22.0) |
| Job run logs (driver, executor, API) | N |
| Airflow DAG logs | N |
| Airflow connections | Y |
| Airflow variables | Y |
Once the backup is complete, Service and Virtual Cluster metadata are stored in the Object Store, while data related to jobs and resources is stored locally at the location where you invoked the backup command.
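As a rough illustration, the standalone CDE CLI exposes a backup command that writes such a local archive. This is a minimal sketch only; the flag shown is an assumption and may differ in your CLI version, so confirm it with the CLI's built-in help before relying on it:

```bash
# Sketch: create a local backup archive with the CDE CLI.
# The --local-path flag is an assumption; verify with `cde backup create --help`.
cde backup create --local-path ./cde-backup.zip

# Job and resource data lands in the directory where the command was invoked.
ls -lh ./cde-backup.zip
```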
Before running the script, install the following tools:
- CDP CLI
- cdpcurl
- CDE CLI
- jq
- kubectl
- Install CDP CLI. For more information, see Installing Cloudera client.
- Install cdpcurl. For more information, see cdpcurl.
- Download the CDE CLI from your Virtual Cluster page and add its path to the PATH environment variable with the export PATH="$PATH:[***CLI-PATH***]" command.
- Install the other tools according to your system requirements.
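Before proceeding, it can help to confirm that every prerequisite resolves on your PATH. The following sketch assumes the usual version flags for these tools; adjust if your builds differ:

```bash
# Make the downloaded CDE CLI visible on PATH (replace the placeholder).
export PATH="$PATH:[***CLI-PATH***]"

# Confirm each prerequisite is installed and reachable.
cdp --version                                  # CDP CLI
cdpcurl --help >/dev/null && echo "cdpcurl OK" # cdpcurl
cde --version                                  # CDE CLI; if --version is unsupported, try `cde --help`
jq --version
kubectl version --client
```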