Upgrading Cloudera Data Hub clusters

You can upgrade a Cloudera Data Hub cluster in one of three ways: Cloudera Runtime and Cloudera Manager major/minor version upgrades, service pack upgrades, and OS upgrades.

Cloudera Runtime and Cloudera Manager major/minor version upgrades

A Cloudera environment consists of both Data Lake and Cloudera Data Hub clusters, and currently these clusters should run the same major/minor versions of Cloudera Runtime and Cloudera Manager. This major/minor version is the first three digits of the Platform Version displayed along with the Cloudera Manager information:

In the image above, the Cloudera Runtime major/minor version is 7.2.7.
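
The "first three digits" rule can be sketched in a few lines of Python. This is only an illustration: the helper name `major_minor` and the longer sample string with a build suffix are assumptions, not a format confirmed by this page. Returning a tuple of integers makes later comparisons numeric, so 7.2.10 sorts after 7.2.7.

```python
import re


def major_minor(platform_version: str) -> tuple:
    """Return the first three numeric components of a platform version.

    Hypothetical helper: any text after the third number (for example,
    a build suffix) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", platform_version.strip())
    if match is None:
        raise ValueError(f"unrecognized platform version: {platform_version!r}")
    return tuple(int(part) for part in match.groups())


# The version from the example above.
print(major_minor("7.2.7"))  # (7, 2, 7)

# Integer tuples compare numerically, unlike the raw strings.
print(major_minor("7.2.10") > major_minor("7.2.7"))  # True
```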

A Cloudera Data Hub major/minor version upgrade initiates an upgrade of the major/minor Cloudera Manager and Cloudera Runtime versions, as well as the required additional parcels (Spark3, Flink, Profiler, and Cloudera Flow Management). The root volumes, additional volumes, and cluster state are retained. A major/minor version upgrade is an express upgrade, so the services are not available during the process.

A major/minor version upgrade is not an OS upgrade, as the OS/VM packages are not updated and VMs are not replaced. You can initiate an OS upgrade in a separate step. Major/minor upgrades do not update other Cloudera packages such as Salt.

A major/minor version upgrade does not redistribute the services according to the cluster template of the newer version, nor does it add new services from that template. It also does not apply any new service configurations that the new cluster template introduces. These additional configurations have to be applied manually.

This is not a zero-downtime upgrade, and there will be service outages during the upgrade.

When a major/minor version upgrade is available, you will be able to select the target version for upgrade from the Upgrade tab at the bottom of the Data Hub details page:

For supported versions and templates, see Support matrix for major/minor upgrades.

Service pack upgrades

The service pack upgrade process checks to see if a new Cloudera Runtime or Cloudera Manager service pack is available, and then upgrades the Cloudera Data Hub cluster to the newest builds. Service pack upgrades do not upgrade to a new major/minor version of Cloudera Runtime and Cloudera Manager; they only upgrade to the latest build of a service pack version.

A service pack upgrade can be conducted on a single Cloudera Data Hub cluster in an environment, or multiple Cloudera Data Hub clusters. You can perform a service pack upgrade independent of a Data Lake upgrade.

When a service pack upgrade is available, you will be able to select the target version (which is the same as the current version) for upgrade from the Upgrade tab at the bottom of the Data Hub details page. When you select the target version, note that the Cloudera Manager/Cloudera Runtime versions are the same, but the build numbers differ:

Service pack upgrades are available from Cloudera Runtime 7.2.7 onward for RAZ and non-RAZ Cloudera Data Hub clusters.
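
The difference between a major/minor version upgrade and a service pack upgrade can be sketched as a small classifier. The function name `classify_upgrade`, the integer build numbers, and the rule that a newer build has a larger number are all assumptions made for illustration; the page only states that the versions match and the build numbers differ.

```python
def classify_upgrade(current_version: str, current_build: int,
                     target_version: str, target_build: int) -> str:
    """Classify an upgrade candidate (hypothetical helper).

    Returns "major/minor" when the target version is newer,
    "service pack" when the version is unchanged but the build is newer
    (assumed to mean a larger build number), and "none" otherwise.
    """
    current = tuple(int(part) for part in current_version.split("."))
    target = tuple(int(part) for part in target_version.split("."))
    if target > current:
        return "major/minor"
    if target == current and target_build > current_build:
        return "service pack"
    return "none"


# Same Cloudera Runtime version, newer build: a service pack upgrade.
print(classify_upgrade("7.2.7", 1000, "7.2.7", 2000))  # service pack
```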

OS upgrades

An OS upgrade updates the OS and VM packages to those available in the latest pre-warmed image. This is done by replacing the VM, re-attaching the attached volumes, and restarting the services. In the process, the data on the root volume (for example, parcels and service logs) is lost. On larger clusters, test the OS upgrade in a development environment first, as the upgrade may exceed some of the cloud resource limits.

An OS upgrade does not upgrade the platform version (Cloudera Manager, Cloudera Runtime, and additional parcels). You can perform an OS upgrade independent of a Cloudera Data Hub cluster upgrade. The OS upgrade triggers the execution of any pre-service-deployment, post-cluster-manager-start, or post-service-deployment recipes.

If an OS upgrade is available, it will appear in the Upgrade Data Hub menu with “(OS Upgrade, OS: <target-OS>)” when you select a Cloudera Runtime version:

Rolling upgrades

Certain Cloudera Data Hub upgrades can be performed in a rolling fashion, depending on the Cloudera Data Hub template, Cloudera Data Hub OS, and the Cloudera Runtime version you are upgrading to and from. For more information, see Cloudera Data Hub rolling upgrades.

Limitations

  • With the exception of rolling upgrades, Cloudera Data Hub upgrades are not zero-downtime upgrades, and service outages will occur.
  • Ranger Authorization: Cloudera Data Hub service pack upgrades with RAZ are supported only for Cloudera Runtime versions 7.2.7+. Major/minor version upgrades with RAZ are supported only from Cloudera Runtime versions 7.2.10 through 7.2.12 to versions 7.2.14+.
  • During an OS upgrade, any data on the root volume (parcels, service logs, custom software) will be lost. For older clusters (created before March 26, 2021), OS upgrade is not available when the embedded Cloudera Manager database is on the root volume.
  • Cloudera Operational Database cannot be upgraded through the Cloudera Data Hub user interface and must be upgraded through the Cloudera beta CLI. For more information, see Upgrading Cloudera Operational Database.
  • When upgrading the Data Lake from Cloudera Runtime 7.2.17 to 7.2.18 or 7.3.1, if Iceberg metadata is required to be captured in Atlas, then do not use mixed versions (7.2.18 or 7.3.1 Data Lake with 7.2.17 Cloudera Data Hub), as the mismatch in the Iceberg models between 7.2.17 and 7.2.18 or 7.3.1 can cause problems.
  • A Cloudera Data Hub cluster must be using Cloudera Runtime 7.2.17 in order to be eligible for CentOS to RHEL upgrade. If you are not seeing the option to upgrade to RHEL, ensure that your cluster is running Cloudera Runtime 7.2.17.
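
The RAZ version limits listed above can be expressed as two small checks. This is a sketch that only encodes the version ranges stated in the list; the function names are hypothetical and the checks are not a substitute for the support matrix.

```python
def _parse(version: str) -> tuple:
    """Turn a version such as "7.2.12" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def raz_service_pack_supported(runtime: str) -> bool:
    """Service pack upgrades with RAZ: Cloudera Runtime 7.2.7 or later."""
    return _parse(runtime) >= (7, 2, 7)


def raz_major_minor_supported(source: str, target: str) -> bool:
    """Major/minor upgrades with RAZ: from 7.2.10 through 7.2.12
    to 7.2.14 or later."""
    source_ok = (7, 2, 10) <= _parse(source) <= (7, 2, 12)
    target_ok = _parse(target) >= (7, 2, 14)
    return source_ok and target_ok


print(raz_major_minor_supported("7.2.11", "7.2.14"))  # True
print(raz_major_minor_supported("7.2.9", "7.2.14"))   # False
```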

Prerequisites

  • There is required downtime of the environment during upgrades, so plan the upgrade accordingly.
  • Using the Support matrix for Cloudera Data Hub upgrades, verify that all Cloudera Data Hub clusters in the environment are of a type and version supported for major/minor upgrades.
  • Test your applications against the new platform version in a separate environment before the Cloudera Runtime/Cloudera Manager upgrade to ensure application compatibility with the new platform version.
  • The Data Lake and all the Cloudera Data Hub clusters in an environment must be upgraded to the same major/minor version, with the exception of Cloudera Data Hub clusters on Cloudera Runtime version 7.2.16+, which are compatible with newer versions of the Data Lake (7.2.17+, because the Data Lake must always run a higher Cloudera Runtime version). If you plan to upgrade your Cloudera Data Hub clusters to a later major/minor version, you must first back up and then upgrade the Data Lake.
  • Verify that any Experiences you use are running the latest version available.
  • Before you can upgrade to Cloudera Runtime 7.3.1, you need to upgrade the Data Lakes and all Cloudera Data Hub databases to PostgreSQL 14.
  • If the upgrade involves upgrading from CentOS to RHEL, review the Prerequisites for upgrading from CentOS to RHEL.
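
The version-matching prerequisite above can be sketched as a compatibility check between a Data Lake and a Cloudera Data Hub cluster. The function name is hypothetical, and the rule is one reading of the prerequisite: versions must match, unless the Cloudera Data Hub cluster runs 7.2.16 or later and the Data Lake is newer. Treat it as an assumption to confirm against the support matrix.

```python
def _parse(version: str) -> tuple:
    """Turn a version such as "7.2.17" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def versions_compatible(data_lake: str, data_hub: str) -> bool:
    """Check a Data Lake / Data Hub version pair (hypothetical helper).

    Compatible when both run the same major/minor version, or when the
    Data Hub runs 7.2.16+ and the Data Lake runs a newer version.
    """
    lake, hub = _parse(data_lake), _parse(data_hub)
    if lake == hub:
        return True
    return hub >= (7, 2, 16) and lake > hub


print(versions_compatible("7.2.17", "7.2.16"))  # True
print(versions_compatible("7.2.17", "7.2.15"))  # False
```

Because the Data Lake must run the higher version in a mixed pair, the check is not symmetric: swapping the two arguments can change the result.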