Upgrading Data Hubs

You can upgrade a Data Hub cluster in one of three ways: Runtime and Cloudera Manager major/minor version upgrades, service pack upgrades, and OS upgrades.

Cloudera Runtime and Cloudera Manager Major/Minor Version Upgrades

A CDP environment consists of both Data Lake and Data Hub clusters, and currently these clusters should run the same major/minor versions of Cloudera Runtime and Cloudera Manager. The major/minor version is the first three dot-separated numbers of the Platform Version displayed along with the Cloudera Manager information:

In the image above, the Runtime major/minor version is 7.2.7.
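
As an illustration of how the major/minor version relates to the Platform Version, the following minimal Python sketch extracts the first three numeric components from a version string. The function name and the longer example string are hypothetical and are not part of any Cloudera tool. The sketch assumes the Platform Version always begins with three dot-separated numbers.

```python
import re
from typing import Tuple


def runtime_major_minor(platform_version: str) -> Tuple[int, int, int]:
    """Return the first three numeric components of a Platform Version string.

    Hypothetical helper for illustration only; it assumes the string starts
    with three dot-separated numbers such as "7.2.7".
    """
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", platform_version)
    if match is None:
        raise ValueError(f"unrecognized platform version: {platform_version!r}")
    return tuple(int(part) for part in match.groups())


print(runtime_major_minor("7.2.7"))  # (7, 2, 7)

# Comparing tuples of integers orders versions numerically, which plain
# string comparison does not do once a component has two digits.
print(runtime_major_minor("7.2.7") < runtime_major_minor("7.2.17"))  # True
print("7.2.7" < "7.2.17")  # False
```

Returning a tuple of integers rather than a string keeps later comparisons (for example, "7.2.7 or later") numerically correct.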

A Data Hub major/minor version upgrade initiates an upgrade of the major/minor Cloudera Manager and Runtime versions, as well as the required additional parcels (Spark3, Flink, Profiler, and Cloudera Flow Management). The root volumes, additional volumes, and the cluster state are retained. A major/minor version upgrade is an express upgrade, during which the services are not available.

A major/minor version upgrade is not an OS upgrade, as the OS/VM packages are not updated and VMs are not replaced. You can initiate an OS upgrade in a separate step. Major/minor upgrades do not update other CDP packages such as Salt.

A major/minor version upgrade does not redistribute the services according to the cluster template of the newer version, nor does it add new services defined in the new cluster template. It also does not apply the new service configurations introduced in the new cluster template; these additional configurations have to be applied manually.

This is not a zero-downtime upgrade, and there will be service outages during the upgrade.

When a major/minor version upgrade is available, you will be able to select the target version for upgrade from the Upgrade tab at the bottom of the Data Hub details page:

For supported versions and templates, see Support matrix for major/minor upgrades.

Service pack upgrades

The service pack upgrade process checks to see if a new Cloudera Runtime or Cloudera Manager service pack is available, and then upgrades the Data Hub to the newest builds. Service pack upgrades do not upgrade to a new major/minor version of Runtime and CM; they only upgrade to the latest build of a service pack version.

A service pack upgrade can be conducted on a single Data Hub cluster in an environment, or multiple Data Hub clusters. You can perform a service pack upgrade independent of a Data Lake upgrade.

When a service pack upgrade is available, you will be able to select the target version (which is the same as the current version) for upgrade from the Upgrade tab at the bottom of the Data Hub details page. When you select the target version, note that the CM/CDP versions are the same, but the build numbers differ:

Service pack upgrades are available from Runtime 7.2.7 onward for RAZ and non-RAZ Data Hubs.
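
The difference between a major/minor version upgrade and a service pack upgrade can be sketched as a comparison of version and build. The Python example below is only an illustration: the `PlatformBuild` type, the `classify_upgrade` function, and the idea that a build is a single increasing integer are assumptions made for the example, not a description of how CDP represents builds.

```python
from typing import NamedTuple, Tuple


class PlatformBuild(NamedTuple):
    """Hypothetical pairing of a Runtime version with a build identifier."""
    version: Tuple[int, int, int]  # major/minor version, e.g. (7, 2, 7)
    build: int                     # assumed to increase with newer builds


def classify_upgrade(current: PlatformBuild, target: PlatformBuild) -> str:
    """Classify a candidate upgrade as described in the sections above."""
    if target.version > current.version:
        # The major/minor version changes.
        return "major/minor"
    if target.version == current.version and target.build > current.build:
        # Same major/minor version, newer build only.
        return "service pack"
    return "none"


current = PlatformBuild(version=(7, 2, 7), build=100)
print(classify_upgrade(current, PlatformBuild((7, 2, 7), 120)))  # service pack
print(classify_upgrade(current, PlatformBuild((7, 2, 8), 10)))   # major/minor
```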

OS Upgrades

An OS upgrade updates the OS and VM packages to those available in the latest pre-warmed image. This is done by replacing the VM, re-attaching the attached volumes, and restarting the services. In the process, the data on the root volume (for example, parcels and service logs) is lost. On larger clusters, test the OS upgrade in a development environment first, as the upgrade may exceed some of the cloud resource limits.

An OS upgrade does not upgrade the platform version (CM, Runtime, and additional parcels). You can perform an OS upgrade independent of a Data Hub upgrade. The OS upgrade triggers the execution of any pre-service-deployment, post-cluster-manager-start, or post-service-deployment recipes.

If an OS upgrade is available, it will appear in the Upgrade Data Hub menu with “(OS Upgrade, OS: <target-OS>)” when you select a Runtime version:

Rolling upgrades

Certain Data Hub upgrades can be performed in a rolling fashion, depending on the Data Hub template, Data Hub OS, and the Runtime version you are upgrading to and from. For more information, see Data Hub rolling upgrades.

Limitations

  • With the exception of rolling upgrades, Data Hub upgrades are not zero-downtime upgrades, and service outages will occur.
  • Ranger Authorization: Data Hub service pack upgrades with RAZ are supported only for Runtime versions 7.2.7+. Major/minor version upgrades with RAZ are supported only for Runtime versions 7.2.10-7.2.12 to versions 7.2.14+.
  • During an OS upgrade, any data on the root volume (parcels, service logs, custom software) will be lost. For older clusters (created before March 26, 2021), OS upgrade is not available when the embedded Cloudera Manager database is on the root volume.
  • Cloudera Operational Database cannot be upgraded through the Data Hub user interface and must be upgraded through the CDP beta CLI. For more information, see Upgrading Cloudera Operational Database.
  • A Data Hub must be using Runtime 7.2.17 in order to be eligible for CentOS to RHEL upgrade. If you are not seeing the option to upgrade to RHEL, ensure that your cluster is running Runtime 7.2.17.
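
The version-based limitations above can be expressed as simple checks. The following Python sketch is a reading of the rules in this list, not an official validation tool. All function names are hypothetical, and the RHEL check assumes that "must be using Runtime 7.2.17" means exactly 7.2.17.

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse the leading 'X.Y.Z' of a Runtime version into integers."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Runtime version: {version!r}")
    return tuple(int(part) for part in match.groups())


def raz_service_pack_supported(current: str) -> bool:
    """Service pack upgrades with RAZ: Runtime 7.2.7 and later."""
    return parse(current) >= (7, 2, 7)


def raz_major_minor_supported(current: str, target: str) -> bool:
    """Major/minor upgrades with RAZ: from 7.2.10-7.2.12 to 7.2.14 or later."""
    return (7, 2, 10) <= parse(current) <= (7, 2, 12) and parse(target) >= (7, 2, 14)


def rhel_upgrade_eligible(current: str) -> bool:
    """CentOS to RHEL upgrade: assumed here to require exactly 7.2.17."""
    return parse(current) == (7, 2, 17)


print(raz_major_minor_supported("7.2.11", "7.2.14"))  # True
print(raz_major_minor_supported("7.2.9", "7.2.14"))   # False
print(rhel_upgrade_eligible("7.2.16"))                # False
```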

Prerequisites

  • There is required downtime of the environment during upgrades, so plan the upgrade accordingly.
  • Using the Support matrix for Data Hub upgrades, verify that all Data Hub clusters in the environment are of a type and version supported for major/minor upgrades.
  • Test your applications against the new platform version in a separate environment before the Runtime/CM upgrade to ensure application compatibility with the new platform version.
  • The Data Lake and all the Data Hubs in an environment must be upgraded to the same major/minor version, with the exception of Data Hubs on Runtime version 7.2.16+, which are compatible with newer versions of the Data Lake (7.2.17+, because the Data Lake must always run a higher Runtime version). If you plan to upgrade your Data Hub clusters to a later major/minor version, you must first back up and then upgrade the Data Lake.
  • Verify that any Experiences you use are running the latest version available.
  • If the upgrade involves upgrading from CentOS to RHEL, review the Prerequisites for upgrading from CentOS to RHEL.
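
The Data Lake and Data Hub version rule from the prerequisites can be sketched as a compatibility check to run before planning an upgrade. This Python example is a minimal interpretation of that rule: versions must match, except that a Data Hub on 7.2.16 or later may run under a newer Data Lake. The function names are hypothetical.

```python
import re
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' of a Runtime version into integers."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Runtime version: {version!r}")
    return tuple(int(part) for part in match.groups())


def data_hub_compatible(data_lake: str, data_hub: str) -> bool:
    """Check a Data Hub version against its Data Lake version."""
    lake, hub = parse(data_lake), parse(data_hub)
    if lake == hub:
        # Same major/minor version is always acceptable.
        return True
    # Exception: a Data Hub on 7.2.16+ may run under a newer Data Lake.
    # A Data Hub newer than its Data Lake is never acceptable.
    return hub >= (7, 2, 16) and lake > hub


print(data_hub_compatible("7.2.17", "7.2.16"))  # True
print(data_hub_compatible("7.2.16", "7.2.15"))  # False
print(data_hub_compatible("7.2.16", "7.2.17"))  # False
```

The last case shows why the Data Lake must be upgraded first: a Data Hub that runs ahead of its Data Lake fails the check.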