Upgrading Data Lake/Data Hub database
This document describes how to upgrade the database to the latest version supported by CDP Public Cloud services. You can use either the CDP UI or the CDP CLI to perform this upgrade.
The databases used by the Data Lake and some of the Data Hub templates are hosted on external instances that are provisioned during the initial deployment of the respective service. For these external databases, CDP Public Cloud leverages the cloud-native service offerings of the three supported cloud service providers (Amazon RDS for PostgreSQL, Azure Database for PostgreSQL, and Cloud SQL for PostgreSQL).
Databases used by other Data Hub templates are hosted on an embedded database instance, typically co-located on the Cloudera Manager host, in order to reduce the resource footprint.
Cloudera provides a Database Upgrade capability in CDP Public Cloud that allows moving both external and embedded databases to a higher major version. The Database Upgrade is a new operation, complementary to the existing maintenance, minor/major version, and OS upgrades, as described in the CDP Public Cloud Upgrade Advisor.
This is a one-time operation. Once the database of a Data Lake or Data Hub has been successfully upgraded to the newer major version, no further action is needed for the respective cluster.
Running the Database Upgrade operation on a Data Hub cluster automatically stops all cluster services (Cloudera Manager and CDH services); you do not need to stop them manually. For a Data Lake database upgrade, it is recommended that the attached Data Hubs and Data Services are in a stopped state.
Here are the UI and CLI instructions to perform Database Upgrade on Data Lake and Data Hub:
In the CDP Management Console UI, go to Environments. From the list of available clusters, select the cluster whose database you want to upgrade. Clusters that are eligible for this upgrade are indicated in the rightmost column.
Once you select the cluster, a message asks you to update the Postgres version. Click Upgrade database.
Click Upgrade in the confirmation box.
- Once the Data Lake database is upgraded, check whether any Data Hubs attached to that Data Lake show a database upgrade notification, and perform the database upgrade on them as described above.
Data Lake Database upgrade:
You can perform the Data Lake database upgrade using the datalake start-database-upgrade CLI command.
cdp datalake start-database-upgrade --help --form-factor public

NAME
       start-database-upgrade - Upgrades the database of the Data Lake cluster.

DESCRIPTION
       This command initiates the upgrade of the database of the Data Lake cluster.

SYNOPSIS
       start-database-upgrade
         --datalake <value>
         --target-version <value>
         [--cli-input-json <value>]
         [--generate-cli-skeleton]

OPTIONS
       --datalake (string)
          The name or CRN of the Data Lake.

       --target-version (string)
          The database engine major version to upgrade to.

          Possible values:
          o VERSION_11
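As a minimal sketch of how the command above might be assembled in a script, the Python helper below builds the argument list from the two required options shown in the help output (`--datalake` and `--target-version`). It does not invoke the CLI. The function name `datalake_db_upgrade_argv` and the Data Lake name `my-datalake` are hypothetical and used for illustration only.

```python
import shlex

# The only value listed under "Possible values" in the help output above.
SUPPORTED_TARGET_VERSIONS = ("VERSION_11",)


def datalake_db_upgrade_argv(datalake, target_version="VERSION_11"):
    """Build the argv for `cdp datalake start-database-upgrade` (hypothetical helper)."""
    if not datalake:
        raise ValueError("datalake must be a non-empty name or CRN")
    if target_version not in SUPPORTED_TARGET_VERSIONS:
        raise ValueError(f"unsupported target version: {target_version}")
    return [
        "cdp", "datalake", "start-database-upgrade",
        "--datalake", datalake,
        "--target-version", target_version,
    ]


if __name__ == "__main__":
    # "my-datalake" is a placeholder; substitute your Data Lake name or CRN.
    print(shlex.join(datalake_db_upgrade_argv("my-datalake")))
```

Validating the target version before building the command keeps a typo from reaching the CLI. The list of accepted values here mirrors the help output and would need updating if further versions become available.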
Data Hub Database upgrade:
You can perform the Data Hub database upgrade using the datahub start-database-upgrade CLI command.
cdp datahub start-database-upgrade --help --form-factor public

NAME
       start-database-upgrade - Upgrades the database of the Data Hub cluster.

DESCRIPTION
       This command initiates the upgrade of the database of the Data Hub cluster.

SYNOPSIS
       start-database-upgrade
         --datahub <value>
         --target-version <value>
         [--cli-input-json <value>]
         [--generate-cli-skeleton]

OPTIONS
       --datahub (string)
          The name or CRN of the Data Hub.

       --target-version (string)
          The database engine major version to upgrade to.

          Possible values:
          o VERSION_11
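The order suggested earlier (the Data Lake first, then each attached Data Hub that shows an upgrade notification) can be sketched as a small planning helper. This is an illustrative sketch only: `plan_database_upgrades` and the cluster names are hypothetical. The helper only produces command strings from the options listed in the help output. It does not check whether a cluster is eligible or whether a previous upgrade has finished.

```python
import shlex


def plan_database_upgrades(datalake, datahubs, target_version="VERSION_11"):
    """Return CLI command strings: the Data Lake first, then each Data Hub.

    Hypothetical helper; it only formats commands and never runs them.
    """
    commands = [
        ["cdp", "datalake", "start-database-upgrade",
         "--datalake", datalake, "--target-version", target_version],
    ]
    for datahub in datahubs:
        commands.append(
            ["cdp", "datahub", "start-database-upgrade",
             "--datahub", datahub, "--target-version", target_version]
        )
    # shlex.join quotes any argument that contains shell-special characters.
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    # All cluster names below are placeholders.
    for line in plan_database_upgrades("my-datalake", ["etl-hub", "streaming-hub"]):
        print(line)
```

In practice, each command would be run only after the previous upgrade has completed, which can be confirmed on the Event History page of the respective service.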
The progress of the upgrade can be tracked on the respective service’s Event History page. Once the upgrade is complete, Cloudera recommends verifying your workloads before attempting an additional Runtime or OS upgrade.