Before you begin
Before you begin a Data Lake upgrade, note the requirements and limitations listed below.
Requirements
- Required role to perform a Data Lake upgrade: EnvironmentAdmin or Owner of the environment.
- The Data Lake must be running and in a healthy state.
- You should stop any running Cloudera Data Hub clusters and data services (such as Cloudera Data Warehouse or Cloudera Data Engineering). For the Cloudera Data Warehouse data service, stop any running Virtual Warehouses before beginning any upgrade or backup/restore process. Stopping data services is not required for service pack upgrades, but any Cloudera Data Hub clusters or data services that are not stopped will error out during the upgrade process.
- If you use a custom image catalog and no upgrades are shown as available, you may need to update your custom image catalog with new images.
- If the upgrade involves upgrading from CentOS to RHEL, review the Prerequisites for upgrading from CentOS to RHEL.
- Expect at least two hours of downtime while the upgrade completes. Plan the upgrade during a time of low activity.
- Optionally, you can take a backup of the Data Lake. The Data Lake upgrade process will automatically take a backup before the upgrade procedure begins, but you have the option of disabling the automatic backup if you would prefer to do this step separately. For instructions on performing a backup and restore, see Backup and restore for the Data Lake. If the upgrade fails for any reason, you can restore the Data Lake from the backup.
- The upgrade requires 27 GB of free space on the Cloudera Manager server node and 20 GB on every other instance. If there is insufficient space on your Data Lake, the upgrade will not be permitted.
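The disk-space requirement can be checked before you start. The following Python sketch is a minimal illustration, assuming you have already gathered each instance's free space (for example, by running `local_free_gb` on each host against the volume you care about). The `Node` record, the `insufficient_nodes` and `local_free_gb` helpers, and the default root path are hypothetical, not part of any Cloudera tool; the sketch only compares numbers against the 27 GB and 20 GB thresholds stated above.

```python
import shutil
from dataclasses import dataclass

# Free-space thresholds from the requirement above, in GB.
CM_SERVER_REQUIRED_GB = 27
OTHER_NODE_REQUIRED_GB = 20


@dataclass
class Node:
    # Hypothetical inventory record; how you collect these values is up to you.
    hostname: str
    free_gb: float
    is_cm_server: bool = False


def local_free_gb(path: str = "/") -> float:
    """Free space on the local volume holding `path`, in GB.

    Assumption: the relevant volume is mounted at `path`; adjust as needed.
    """
    return shutil.disk_usage(path).free / 1024**3


def insufficient_nodes(nodes: list[Node]) -> list[str]:
    """Return the hostnames that lack the free space the upgrade needs."""
    short = []
    for node in nodes:
        required = CM_SERVER_REQUIRED_GB if node.is_cm_server else OTHER_NODE_REQUIRED_GB
        if node.free_gb < required:
            short.append(node.hostname)
    return short


if __name__ == "__main__":
    inventory = [
        Node("master0", free_gb=30, is_cm_server=True),
        Node("core0", free_gb=18),
        Node("core1", free_gb=25),
    ]
    print(insufficient_nodes(inventory))  # ['core0']
```

An empty result means every listed instance meets the threshold; the upgrade itself still performs its own space check.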
Limitations
Note the following limitations for the Data Lake upgrade:
- The Data Lake upgrade does not upgrade the FreeIPA software or the operating system on the instance(s) running FreeIPA. To upgrade FreeIPA, see Upgrade FreeIPA.
- Data Lake resizing (for example, moving from a light duty to a medium duty Data Lake) during an upgrade is not supported.
- Before you can upgrade to Cloudera Runtime 7.3.1, you must upgrade the Data Lake and all Cloudera Data Hub databases to PostgreSQL 14.
- When upgrading the Data Lake from Cloudera Runtime 7.2.17 to 7.2.18 or 7.3.1, if Iceberg metadata must be captured in Atlas, do not use mixed versions (a 7.2.18 or 7.3.1 Data Lake with 7.2.17 Cloudera Data Hub clusters), because the mismatch between the Iceberg models in 7.2.17 and those in 7.2.18 or 7.3.1 can cause problems.
- If a Data Lake has attached Cloudera Data Hub clusters that are not eligible for upgrade, the Data Lake itself is not eligible for upgrade. You must delete any Cloudera Data Hub clusters that are ineligible for upgrade before proceeding with the Data Lake upgrade. See Cloudera Data Hub Upgrade for more information about which Cloudera Data Hub clusters are eligible for upgrade.
- Service pack upgrades for RAZ-enabled Data Lakes are available only for Cloudera Runtime versions 7.2.7+.
- Major/minor version upgrades for RAZ-enabled Data Lakes are available only for Cloudera Runtime versions 7.2.12+.
- A Data Lake must be using Cloudera Runtime 7.2.17 to be eligible for the CentOS to RHEL upgrade. If you do not see the option to upgrade from CentOS to RHEL, ensure that your Data Lake is using Cloudera Runtime 7.2.17.
- Cloudera Runtime 7.2.18 and newer do not support the Medium Duty Data Lake shape. A Medium Duty Data Lake cannot be upgraded from 7.2.17 to 7.2.18 unless you resize it before upgrading to 7.2.18.
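Several of the version-related limitations above can be expressed as simple rules. The Python sketch below is a hypothetical pre-flight check, not a Cloudera API: the `DataLake` fields, the shape labels, and the `blocking_reasons` function are illustrative assumptions. It covers only the RAZ, CentOS to RHEL, PostgreSQL 14, and Medium Duty limitations, and it treats the PostgreSQL 14 requirement as also applying to targets after 7.3.1, which is an assumption.

```python
from dataclasses import dataclass

Version = tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse the first three components of a version such as '7.2.17'."""
    return tuple(int(part) for part in text.split(".")[:3])


@dataclass
class DataLake:
    # Hypothetical summary of a Data Lake; field names are illustrative only.
    runtime: str
    postgres_major: int
    raz_enabled: bool = False
    shape: str = "LIGHT_DUTY"  # hypothetical labels, e.g. "LIGHT_DUTY", "MEDIUM_DUTY"


def blocking_reasons(lake: DataLake, target: str, *,
                     service_pack: bool = False,
                     centos_to_rhel: bool = False) -> list[str]:
    """Return the listed limitations that would block this upgrade."""
    current = parse_version(lake.runtime)
    goal = parse_version(target)
    reasons = []

    if lake.raz_enabled:
        # Service pack upgrades need 7.2.7+; major/minor upgrades need 7.2.12+.
        minimum = (7, 2, 7) if service_pack else (7, 2, 12)
        if current < minimum:
            reasons.append("RAZ-enabled Data Lake runtime too old for this upgrade type")

    if centos_to_rhel and current != (7, 2, 17):
        reasons.append("CentOS to RHEL upgrade requires Cloudera Runtime 7.2.17")

    # Assumption: the PostgreSQL 14 requirement also holds for targets after 7.3.1.
    if goal >= (7, 3, 1) and lake.postgres_major < 14:
        reasons.append("upgrade databases to PostgreSQL 14 first")

    # 7.2.18 and newer do not support the Medium Duty shape.
    if goal >= (7, 2, 18) and lake.shape == "MEDIUM_DUTY":
        reasons.append("resize the Medium Duty Data Lake before upgrading")

    return reasons


if __name__ == "__main__":
    lake = DataLake(runtime="7.2.17", postgres_major=11, shape="MEDIUM_DUTY")
    print(blocking_reasons(lake, "7.3.1"))
```

For the example Data Lake, the check reports both the PostgreSQL 14 and the Medium Duty limitations. Rules such as attached ineligible Cloudera Data Hub clusters or mixed Iceberg model versions depend on the wider environment and are left out of this sketch.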