Upgrading Cloudera Data Engineering

Using the backup-restore-based upgrade script

Learn how to use the backup-restore-based upgrade script in Cloudera Data Engineering on cloud.

This procedure applies to Cloudera Data Engineering versions 1.20.3-h2, 1.21.0-h2, 1.22.0, and higher.

If your Virtual Clusters use Apache Airflow connections and variables, the original backup-restore-based upgrade does not apply. This document describes a script-based fallback for the backup-restore-based upgrade, which you can use when the default upgrade method described in Handling upgrade failures for Cloudera Data Engineering does not work.

Table 1. Contents included in the backup
Artifacts                                         Included in the automation script
Service config                                    Y
Service logs                                      N
Virtual Cluster config                            Y
Virtual Cluster endpoints                         N
Virtual Cluster event logs                        N
Job: Spark                                        Y
Job: Airflow                                      Y
Resource: files                                   Y
Resource: Docker runtimes                         Y
Resource: python-venv (for Spark)                 Y
Resource: python-venv (for Airflow)               Y (since 1.21.0)
Git repository                                    Y
Credentials                                       Y
Spark Session                                     N
Spark Session logs (including statement history)  N
Job Runs                                          Y (since 1.22.0)
Job Run logs (Driver, Executor, API)              N
Airflow DAG logs                                  N
Airflow connections                               Y
Airflow variables                                 Y

Before running the script, install the following tools:

  • CDP CLI
  • cdpcurl
  • CDE CLI
  • jq
  • kubectl
  1. Install the CDP CLI. For more information, see Installing Cloudera client.
  2. Install cdpcurl. For more information, see cdpcurl.
  3. Download the CDE CLI from your Virtual Cluster page and add the directory that contains it to the PATH environment variable by running the export PATH="$PATH:[***CLI-PATH***]" command.
  4. Install jq and kubectl as appropriate for your operating system.
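After the installation steps, you can confirm that every prerequisite is reachable on your PATH with a short shell function. This is a minimal sketch that assumes each tool is installed under the command name cdp, cdpcurl, cde, jq, or kubectl; check_prereqs is a hypothetical name and is not part of the upgrade script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloudera tooling): prints one line for
# each prerequisite command that cannot be found on PATH.
# Assumption: the tools are installed under the command names listed below.

check_prereqs() {
  local missing=0 tool
  for tool in cdp cdpcurl cde jq kubectl; do
    # command -v succeeds only if the name resolves to an executable or builtin.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Returns 0 when everything is installed, 1 otherwise.
  return "$missing"
}
```

Run check_prereqs in the shell where you ran the export PATH command, and install anything it reports before you continue.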

After installing the tools, complete the following procedure.

  1. Configure the CDP CLI.
    1. On the Cloudera Management Console, navigate to your Profile page.
    2. On the Access Keys tab, click Generate Access Key.
    3. Download the credentials file and copy its contents to ~/.cdp/credentials.
      These credentials are stored as the default profile. You can rename the profile; if you do, specify the profile name using one of the following methods:
      • The CDP_PROFILE environment variable
      • The cde-service-backup-restore-utils.properties file
      • The --cdp-profile command line option
  2. Configure the CDE CLI.
    Configure the CDE CLI by following the instructions in Configuring the CLI client. Cloudera recommends using your Cloudera credentials, because the CDP CLI needs them as well.
    1. Specify the CDE CLI configuration using one of the following methods:
      • The environment variable
      • The cde-service-backup-restore-utils.properties file
      • The ~/.cde/config.yaml file
    2. Run the following command once to make sure that the configuration works:
      cde job list --vcluster-endpoint [***JOBS-API-URL***]
  3. Back up the Cluster.
    1. Download the backup-restore-based upgrade script (cde-service-backup-restore-utils.sh) to your local machine. Navigate to https://github.com/cloudera/cde-public/blob/master/scripts/cde-service-backup-restore-utils.sh, click the download icon on the right to download the raw file, and make the script executable with chmod +x cde-service-backup-restore-utils.sh.
    2. Authenticate to kubectl to access your Cluster.
    3. To back up the Cluster, run the following command.
      ./cde-service-backup-restore-utils.sh backup-service -s <cluster-id> -b <backup-base-directory> --pre-check
    The script stores some of the backup contents remotely and the rest locally, in the directory that you specify with the -b option.
  4. Restore the Cluster.
    1. Run the following command to provision the new Cluster and its Virtual Clusters. This step can take around an hour.
      ./cde-service-backup-restore-utils.sh restore-service -s <original-cluster-id> -t <new-cluster-id> -n <new-cluster-name> -b <backup-base-directory> --service-only
    2. Authenticate to kubectl again to access your Cluster.
      If the version of the new service differs from the original one, download the CDE CLI of the new service and add the directory that contains it to the PATH environment variable.
    3. Run the following command to restore your contents:
      ./cde-service-backup-restore-utils.sh restore-service -s <original-cluster-id> -t <new-cluster-id> -n <new-cluster-name> -b <backup-base-directory> --contents-only
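You can script the copy into ~/.cdp/credentials from the Configure the CDP CLI step, including the optional profile rename. This is a minimal sketch that assumes the downloaded file uses the INI layout read by the CDP CLI, with a single [default] section. install_cdp_credentials is a hypothetical helper and is not part of cde-service-backup-restore-utils.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloudera tooling): appends a downloaded
# CDP credentials file to the CDP CLI credentials file, optionally renaming its
# [default] section. Assumption: the downloaded file has one [default] section.

install_cdp_credentials() {
  local src="$1" profile="${2:-default}" dest="${3:-$HOME/.cdp/credentials}"

  if [ ! -f "$src" ]; then
    echo "credentials file not found: $src" >&2
    return 1
  fi

  # Refuse to add a profile that already exists in the destination file.
  if [ -f "$dest" ] && grep -qxF "[$profile]" "$dest"; then
    echo "profile [$profile] already exists in $dest" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dest")"
  # Separate the new section from any existing content with a blank line.
  if [ -s "$dest" ]; then
    echo >> "$dest"
  fi
  # Rename the [default] header to the requested profile and append the section.
  sed "s/^\[default\]\$/[$profile]/" "$src" >> "$dest"
  # The file holds a private key, so keep it readable by the owner only.
  chmod 600 "$dest"
}
```

For example, install_cdp_credentials ~/Downloads/credentials upgrade-profile appends the key under [upgrade-profile]. You can then run export CDP_PROFILE=upgrade-profile so that the upgrade script uses that profile. The download path and profile name are placeholders. The helper appends instead of overwriting, so existing profiles are preserved.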
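A small helper can set up the ~/.cde/config.yaml method from the Configure the CDE CLI step. This sketch assumes that the CDE CLI reads the Jobs API URL of a Virtual Cluster from a vcluster-endpoint key in that file. Confirm the key name in Configuring the CLI client for your CLI version. write_cde_config is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloudera tooling): writes a minimal CDE
# CLI configuration that points the CLI at one Virtual Cluster.
# Assumption: the CLI reads the vcluster-endpoint key from ~/.cde/config.yaml.

write_cde_config() {
  local endpoint="$1" config="${2:-$HOME/.cde/config.yaml}"

  # The Jobs API URL is expected to be an HTTPS URL.
  case "$endpoint" in
    https://*) ;;
    *) echo "expected an https:// Jobs API URL, got: $endpoint" >&2; return 1 ;;
  esac

  # Never overwrite an existing configuration; edit it by hand instead.
  if [ -e "$config" ]; then
    echo "$config already exists; edit it instead" >&2
    return 1
  fi

  mkdir -p "$(dirname "$config")"
  printf 'vcluster-endpoint: %s\n' "$endpoint" > "$config"
}
```

After writing the file, run the cde job list verification command from that step to confirm that the CLI can reach the Virtual Cluster.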
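If you prefer not to use a browser for the script download in the Back up the Cluster step, note that GitHub serves the raw content of a blob page under raw.githubusercontent.com, with the /blob/ segment removed. The sketch below derives that URL and fetches the script with curl. github_raw_url and fetch_upgrade_script are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Cloudera tooling).

# Converts https://github.com/<owner>/<repo>/blob/<ref>/<path> into
# https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
github_raw_url() {
  local url="$1"
  case "$url" in
    https://github.com/*/blob/*) ;;
    *) echo "not a GitHub blob URL: $url" >&2; return 1 ;;
  esac
  local rest="${url#https://github.com/}"   # <owner>/<repo>/blob/<ref>/<path>
  local repo="${rest%%/blob/*}"              # <owner>/<repo>
  local ref_and_path="${rest#*/blob/}"       # <ref>/<path>
  printf 'https://raw.githubusercontent.com/%s/%s\n' "$repo" "$ref_and_path"
}

# Downloads the upgrade script into the current directory and makes it executable.
fetch_upgrade_script() {
  local page="https://github.com/cloudera/cde-public/blob/master/scripts/cde-service-backup-restore-utils.sh"
  local raw
  raw="$(github_raw_url "$page")" || return 1
  curl -fsSL -o cde-service-backup-restore-utils.sh "$raw" &&
    chmod +x cde-service-backup-restore-utils.sh
}
```

Run fetch_upgrade_script from the directory where you want the script. It saves cde-service-backup-restore-utils.sh there and marks it executable, so the ./cde-service-backup-restore-utils.sh commands in this procedure work as written.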
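Before you run the backup command in the Back up the Cluster step, you can confirm that kubectl is authenticated and that the backup base directory is writable. This surfaces those problems before the backup starts. The following is a minimal sketch. backup_preflight is hypothetical, and it only checks that kubectl has a current context, not that the context points at the right Cluster.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the Cloudera tooling) for the
# backup step. It verifies that kubectl has a current context and that the
# backup base directory exists and is writable. It does not validate the
# context itself.

backup_preflight() {
  local backup_dir="$1" context
  if [ -z "$backup_dir" ]; then
    echo "usage: backup_preflight <backup-base-directory>" >&2
    return 2
  fi

  # kubectl config current-context exits non-zero when no context is selected.
  if ! context="$(kubectl config current-context 2>/dev/null)"; then
    echo "kubectl has no current context; authenticate to your Cluster first" >&2
    return 1
  fi
  echo "kubectl context: $context"

  # Create the directory if needed and make sure it can be written to.
  mkdir -p "$backup_dir" && [ -w "$backup_dir" ] || {
    echo "cannot write to $backup_dir" >&2
    return 1
  }
}
```

For example, backup_preflight ./cde-backup && ./cde-service-backup-restore-utils.sh backup-service -s <cluster-id> -b ./cde-backup --pre-check runs the backup only if the checks pass. The ./cde-backup directory is a placeholder.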
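The two restore commands in the Restore the Cluster step share the same -s, -t, -n, and -b arguments and differ only in the final flag. The following sketch wraps both phases in one hypothetical function, cde_restore, so that both phases receive the arguments in the same order. Setting DRY_RUN=1 prints the command instead of running it. The sketch assumes the upgrade script is in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the Cloudera tooling) around the two
# documented restore phases.
# Usage: cde_restore <service|contents> <original-cluster-id> <new-cluster-id> <new-cluster-name> <backup-base-directory>
# Set DRY_RUN=1 to print the command instead of running it.

cde_restore() {
  local phase="$1" flag
  case "$phase" in
    service)  flag="--service-only" ;;
    contents) flag="--contents-only" ;;
    *) echo "phase must be 'service' or 'contents'" >&2; return 2 ;;
  esac
  if [ "$#" -ne 5 ]; then
    echo "usage: cde_restore <service|contents> <original-cluster-id> <new-cluster-id> <new-cluster-name> <backup-base-directory>" >&2
    return 2
  fi

  # Build the command as an array so that arguments with spaces stay intact.
  local -a cmd
  cmd=(./cde-service-backup-restore-utils.sh restore-service
       -s "$2" -t "$3" -n "$4" -b "$5" "$flag")

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The wrapper deliberately does not chain the two phases, because you must authenticate to kubectl again between them and possibly switch to the CDE CLI of the new service. Run cde_restore service with your cluster IDs, name, and backup directory first. Then re-authenticate, update PATH if needed, and run cde_restore contents with the same arguments.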