Upgrading Data Lake/Data Hub database
This document describes how to upgrade the database to the latest version supported by CDP Public Cloud services. You can perform this upgrade using either the CDP UI or the CDP CLI.
The databases used by the Data Lake and some of the Data Hub templates are hosted on external instances that are provisioned during the initial deployment of the respective service. For these external databases CDP Public Cloud leverages cloud-native service offerings of the three supported Cloud Service Providers (AWS RDS for PostgreSQL, Azure Database for PostgreSQL and Cloud SQL for PostgreSQL).
Databases used by other Data Hub templates are hosted on an embedded database instance, typically co-located on the Cloudera Manager host, in order to reduce the resource footprint.
Cloudera provides a database upgrade capability in CDP Public Cloud that allows moving both external and embedded databases to a higher major version.
The database upgrade is a fully automated operation. The upgrade process itself completes all of the required steps, including creating a backup, stopping and upgrading the database, restarting the database, and running post-upgrade maintenance tasks. You are not required to manually stop the Postgres instances before the upgrade.
The database upgrade is a separate operation, complementary to the existing maintenance, minor/major version and OS upgrades, as described in the CDP Public Cloud Upgrade Advisor. This is a one-time operation. Once the database of a Data Lake or Data Hub has been successfully upgraded to the newer major version, no further action is needed for the respective cluster.
Running the database upgrade operation on a Data Hub cluster automatically stops all cluster services (Cloudera Manager and Runtime services), so you do not need to stop them manually. For the Data Lake database upgrade, it is recommended that attached Data Hubs and Data services be in a stopped state.
Instructions
Here are the UI and CLI instructions to perform Database Upgrade on Data Lake and Data Hub:
In the CDP Management Console UI, go to Environments. From the list of available clusters, select the cluster that you want to upgrade. Clusters that are eligible for this upgrade are indicated in the rightmost column.
Once you select the cluster, you will see a message asking you to update the Postgres version. Click Upgrade database.
Click Upgrade in the confirmation box.
- Once the Data Lake database is upgraded, check whether any of the Data Hubs attached to that Data Lake show a database upgrade notification. If so, perform the database upgrade on them as described above.
Data Lake Database upgrade:
You can perform the Data Lake database upgrade using the cdp datalake start-database-upgrade CLI command.
The --target-version parameter is optional. If you do not provide it, the database will be upgraded to either PostgreSQL 14 (AWS and GCP) or PostgreSQL 11 (Azure). You can also use the VERSION_11 value if you specifically want to upgrade to PostgreSQL 11.
cdp datalake start-database-upgrade --help --form-factor public
NAME
start-database-upgrade - Upgrades the database of the Data Lake cluster.
DESCRIPTION
This command initiates the upgrade of the database of the Data Lake
cluster.
SYNOPSIS
start-database-upgrade
--datalake <value>
--target-version <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--datalake (string)
The name or CRN of the Data Lake.
--target-version (string)
The database engine major version to upgrade to.
Possible values:
o VERSION_11
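As an illustration of the options above, the following minimal sketch assembles the Data Lake upgrade command with and without the optional --target-version parameter. The helper name datalake_upgrade_cmd and the Data Lake name my-datalake are hypothetical placeholders, not part of the CDP CLI. The helper only prints the command so that you can review it before running it yourself.

```shell
# Sketch only: prints the CLI command instead of executing it.
# datalake_upgrade_cmd is a hypothetical helper, not a CDP CLI feature.
datalake_upgrade_cmd() {
  # $1: Data Lake name or CRN
  # $2: optional target version (for example VERSION_11)
  cmd="cdp datalake start-database-upgrade --datalake $1"
  # --target-version is optional, so append it only when a value is given.
  if [ -n "${2:-}" ]; then
    cmd="$cmd --target-version $2"
  fi
  printf '%s\n' "$cmd"
}

# "my-datalake" is a placeholder; replace it with your Data Lake name or CRN.
# Let CDP choose the default target version for the cloud provider:
datalake_upgrade_cmd my-datalake
# Explicitly request PostgreSQL 11:
datalake_upgrade_cmd my-datalake VERSION_11
```

To start the upgrade, run the printed command in a shell where the CDP CLI is installed and configured.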
Data Hub Database upgrade:
You can perform the Data Hub database upgrade using the cdp datahub start-database-upgrade CLI command.
The --target-version parameter is optional. If you do not provide it, the database will be upgraded to either PostgreSQL 14 (AWS and GCP) or PostgreSQL 11 (Azure). You can also use the VERSION_11 value if you specifically want to upgrade to PostgreSQL 11.
cdp datahub start-database-upgrade --help --form-factor public
NAME
start-database-upgrade - Upgrades the database of the Data Hub cluster.
DESCRIPTION
This command initiates the upgrade of the database of the Data Hub
cluster.
SYNOPSIS
start-database-upgrade
--datahub <value>
--target-version <value>
[--cli-input-json <value>]
[--generate-cli-skeleton]
OPTIONS
--datahub (string)
The name or CRN of the Data Hub.
--target-version (string)
The database engine major version to upgrade to.
Possible values:
o VERSION_11
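The order described in this document (Data Lake database first, then the Data Hubs attached to it) can be sketched as a short command plan. The helper name print_upgrade_plan and the cluster names are hypothetical placeholders. The sketch only prints the commands in order. It does not run them and does not wait for an upgrade to finish, so each command should be run only after the previous upgrade has completed.

```shell
# Sketch only: prints an ordered list of CLI commands without executing them.
# print_upgrade_plan is a hypothetical helper, not a CDP CLI feature.
print_upgrade_plan() {
  # $1: Data Lake name or CRN; remaining arguments: Data Hub names or CRNs
  datalake="$1"
  shift
  # The Data Lake database is upgraded first.
  printf '%s\n' "cdp datalake start-database-upgrade --datalake $datalake"
  # Then each attached Data Hub, one command per cluster.
  for hub in "$@"; do
    printf '%s\n' "cdp datahub start-database-upgrade --datahub $hub"
  done
}

# Placeholder names; replace them with your own clusters.
print_upgrade_plan my-datalake my-datahub-1 my-datahub-2
```

Because --target-version is omitted, each command relies on the default target version for the cloud provider.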
The progress of the upgrade can be tracked on the respective service’s Event History page. You can verify a successful database upgrade in the Event History or in the Database tab of the cluster. Once the upgrade is complete, Cloudera recommends verifying your workloads before attempting an additional Runtime or OS upgrade.