Upgrading Data Lake/Cloudera Data Hub database

This document describes how to upgrade the database to the latest version supported by Cloudera Public Cloud services. You can perform this upgrade using either the Cloudera UI or the CDP CLI.

Several Cloudera Public Cloud services, including Data Lake clusters, Cloudera Data Hub cluster templates, and Data Services, require a relational database. Most of these databases are external.

The databases used by the Data Lake and by some of the Cloudera Data Hub templates are hosted on external instances that are provisioned during the initial deployment of the respective service. For these external databases, Cloudera Public Cloud uses the cloud-native offerings of the three supported cloud service providers: AWS RDS for PostgreSQL, Azure Database for PostgreSQL, and Google Cloud SQL for PostgreSQL.

Databases used by other Cloudera Data Hub templates are hosted on an embedded database instance, typically co-located on the Cloudera Manager host, in order to reduce the resource footprint.

Cloudera provides a database upgrade capability in Public Cloud that allows moving both external and embedded databases to a higher major version.

The database upgrade is a fully automated operation. The upgrade process itself completes all of the required steps, including creating a backup, stopping and upgrading the database, restarting the database, and running post-upgrade maintenance tasks. You are not required to manually stop the Postgres instances before the upgrade.

The database upgrade is a separate operation, complementary to the existing maintenance, minor/major version and OS upgrades, as described in the Cloudera Public Cloud Upgrade Advisor.

This is a one-time operation. Once the database of a Data Lake or Cloudera Data Hub has been successfully upgraded to the newer major version, no further action is needed for the respective cluster.

If a cluster uses a database that requires an upgrade, a notification appears in the Cloudera Management Console UI.

Running the database upgrade on a Cloudera Data Hub cluster automatically stops all cluster services (Cloudera Manager and Cloudera Runtime services); you do not need to stop them manually. For the Data Lake database upgrade, Cloudera recommends that the attached Cloudera Data Hub clusters and Data Services be in a stopped state.

For AWS and GCP environments, the database upgrade operation triggers a backup and a major version upgrade of the attached external database. For Azure environments, the mechanism is different: in the background, a new database instance with a higher major version is created and the data is transferred from the older database instance.

Instructions

Here are the UI and CLI instructions to perform Database Upgrade on Data Lake and Cloudera Data Hub:

Steps
  1. In Cloudera Management Console, go to Environments. From the list of available clusters, select the cluster that you want to upgrade. Clusters that are eligible for this upgrade are indicated in the rightmost column.

  2. Once you select the cluster, a message prompts you to upgrade the PostgreSQL version. Click Upgrade database.

  3. Click Upgrade in the confirmation box.

  4. Once the Data Lake database is upgraded, check whether any Cloudera Data Hub cluster attached to that Data Lake shows a database upgrade notification. If so, upgrade its database as described above.

Data Lake Database upgrade:

You can upgrade the Data Lake database using the cdp datalake start-database-upgrade CLI command.

The --target-version parameter is optional. If you do not provide it, the database will be upgraded to PostgreSQL 14.

cdp datalake start-database-upgrade --help --form-factor public
NAME
       start-database-upgrade  -  Upgrades the database of the Data Lake clus-
       ter.

DESCRIPTION
       This command initiates the upgrade of the database  of  the  Data  Lake
       cluster.

SYNOPSIS
            start-database-upgrade
          --datalake <value>
          [--target-version <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --datalake (string)
          The name or CRN of the Data Lake.

       --target-version (string)
          The database engine major version to upgrade to.

          Possible values:

          o VERSION_14
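
As an illustration of the options listed above, the following is a minimal sketch of how the command could be assembled, with and without the optional --target-version parameter. The Data Lake name my-datalake and the helper function datalake_upgrade_cmd are hypothetical and exist only for this example. The function only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Sketch: assemble the Data Lake database upgrade command.
# "my-datalake" is a hypothetical name; use your own Data Lake name or CRN.

datalake_upgrade_cmd() {
  datalake="$1"
  target_version="${2:-}"   # optional, for example VERSION_14

  cmd="cdp datalake start-database-upgrade --datalake $datalake"
  # Append --target-version only when a value was supplied.
  if [ -n "$target_version" ]; then
    cmd="$cmd --target-version $target_version"
  fi
  printf '%s\n' "$cmd"
}

# Dry run: print the command instead of executing it.
datalake_upgrade_cmd "my-datalake" "VERSION_14"
```

Running the printed command starts the upgrade. Omitting the second argument produces the form without --target-version.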

Cloudera Data Hub Database upgrade:

You can upgrade the Cloudera Data Hub database using the cdp datahub start-database-upgrade CLI command.

The --target-version parameter is optional. If you do not provide it, the database will be upgraded to PostgreSQL 14.

cdp datahub start-database-upgrade --help --form-factor public
NAME
       start-database-upgrade  -  Upgrades the database of the Data Hub
       cluster.

DESCRIPTION
       This command initiates the upgrade of the database  of  the  Data  Hub
       cluster.

SYNOPSIS
            start-database-upgrade
          --datahub <value>
          [--target-version <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]

OPTIONS
       --datahub (string)
          The name or CRN of the Data Hub.

       --target-version (string)
          The database engine major version to upgrade to.

          Possible values:

          o VERSION_14
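
When several Cloudera Data Hub clusters are attached to the same Data Lake, each one needs its own database upgrade. The sketch below prints one upgrade command per cluster so that the commands can be reviewed first. The cluster names and the helper function print_datahub_upgrade_plan are hypothetical; only the command and its --datahub and --target-version options come from the help output above.

```shell
#!/bin/sh
# Sketch: print one database upgrade command per Data Hub cluster.
# The cluster names below are hypothetical placeholders.
DATAHUBS="datahub-etl datahub-streaming"
TARGET_VERSION="VERSION_14"

print_datahub_upgrade_plan() {
  # $1: space-separated list of Data Hub names, $2: target version
  for name in $1; do
    printf 'cdp datahub start-database-upgrade --datahub %s --target-version %s\n' \
      "$name" "$2"
  done
}

# Dry run: list the commands without executing them.
print_datahub_upgrade_plan "$DATAHUBS" "$TARGET_VERSION"
```

Run each printed command once the Data Lake database upgrade has finished. Progress for each cluster can then be followed on its Event History page.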

The progress of the upgrade can be tracked on the respective service’s Event History page. You can verify a successful database upgrade in the Event History or in the Database tab of the cluster. Once the upgrade is complete, Cloudera recommends verifying your workloads before attempting an additional Cloudera Runtime or OS upgrade.