Upgrading a Data Lake

If a new Cloudera Runtime/Cloudera Manager version or build is available for the Data Lake, you can initiate an upgrade from the Management Console. An OS upgrade may also be available.

In most cases, you do not need to destroy and recreate the Cloudera Data Hub clusters attached to the Data Lake. For major/minor version upgrades, you must upgrade the Cloudera Data Hub clusters after you upgrade the Data Lake, unless they run Cloudera Runtime version 7.2.16 or later. A Cloudera Data Hub cluster on Cloudera Runtime version 7.2.16 or later is compatible with a Data Lake on a newer Cloudera Runtime version (7.2.17+). You can upgrade such clusters independently at a later time, but you are not required to.

Any Cloudera Data Hub clusters or data services that are not stopped during a Data Lake upgrade will error out during the upgrade process.
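
As a pre-upgrade check, the requirement above can be expressed as a small filter over an inventory of attached workloads. This is a minimal sketch only: the `Workload` record, the `kind` labels, and the status strings such as "STOPPED" are hypothetical placeholders, not values taken from any Cloudera API or CLI. Populate the inventory from whatever source you already use to track your clusters.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str    # hypothetical label, e.g. "datahub" or "data-service"
    status: str  # hypothetical status string, e.g. "STOPPED" or "RUNNING"


def blocking_workloads(workloads):
    """Return the names of workloads that are not stopped."""
    return [w.name for w in workloads if w.status.upper() != "STOPPED"]


# Example inventory with made-up names and statuses.
inventory = [
    Workload("etl-hub", "datahub", "RUNNING"),
    Workload("reporting-hub", "datahub", "STOPPED"),
    Workload("warehouse-01", "data-service", "stopped"),
]

still_running = blocking_workloads(inventory)
if still_running:
    print("Stop these before upgrading the Data Lake:", ", ".join(still_running))
else:
    print("All attached workloads are stopped.")
```

An empty result means nothing in the inventory should block the upgrade. The check is only as accurate as the inventory you pass in.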

Required role: EnvironmentAdmin or Owner over the environment

  1. Stop all Cloudera Data Hub clusters attached to the environment.
  2. From the Management Console, click Data Lakes > <Environment Name>, scroll to the bottom of the Data Lake details page, and click the Upgrade tab.
  3. Click the Target Runtime Version drop-down menu to see any available upgrades for a given Cloudera Runtime version.
    If a new build is available for the selected version, the UI displays the current and target versions and build numbers. If only an OS upgrade is available, the UI displays “(OS upgrade only).”

    When a major/minor version upgrade is available, you can select a new Cloudera Runtime version.

    If a rolling upgrade is available and you want to use it, select the Perform rolling upgrade checkbox. The availability of a rolling upgrade depends on the current and target Cloudera Runtime versions, the Data Lake shape, and the Data Lake OS. See Data Lake rolling upgrades for more information.

  4. If you want to skip the automatic backup that is taken before the upgrade, clear the Automatic backup checkbox. For more information on what is backed up during a Data Lake backup, see Data Lake backup and restore.
  5. Click Validate and Prepare to check for any configuration issues and begin the Cloudera Runtime parcel download and distribution. Validate and Prepare does not require downtime, does not make any changes to your cluster, and can be run independently of the upgrade itself. You can begin the upgrade without running it first, but running it makes the process smoother and shortens the maintenance window.
  6. Click Upgrade to initiate the upgrade.
  7. Click the Event History tab to monitor the upgrade process and verify that it completes successfully.
    If the upgrade fails for any reason, check the Data Lake logs through Cloudera Manager for troubleshooting information and retry the upgrade. If you cannot fix the problem manually, you may be able to recover the Data Lake cluster after a failed upgrade. For more information, see Recovering from failed upgrades.

For major/minor upgrades, if the upgrade is successful, you can proceed to upgrading your attached Cloudera Data Hub clusters if required. Cloudera Data Hub clusters must run the same Cloudera Runtime version as the Data Lake, unless they run Cloudera Runtime version 7.2.16 or later. A Cloudera Data Hub cluster on Cloudera Runtime version 7.2.16 or later is compatible with a Data Lake on a newer Cloudera Runtime version (7.2.17+). You can upgrade such clusters independently at a later time, but you are not required to.

For service pack and OS upgrades, you can restart your Cloudera Data Hub clusters, data services, and any stopped Virtual Warehouses.
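
The version compatibility rule described above can be sketched as a short function, which may help when deciding which Cloudera Data Hub clusters need an upgrade after a major/minor Data Lake upgrade. This is an illustrative sketch, not an official tool. The function names are hypothetical, and it assumes plain dotted numeric versions such as "7.2.17" with no build suffix. The case where a Cloudera Data Hub cluster is newer than the Data Lake is not covered by the text above, so the sketch conservatively reports it as incompatible.

```python
# First Cloudera Data Hub runtime version that may lag behind a newer Data Lake.
MIN_INDEPENDENT_VERSION = (7, 2, 16)


def parse_version(version):
    """Turn a dotted version string such as '7.2.17' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def datahub_compatible(datahub_version, datalake_version):
    """Return True if the Data Hub may keep running against the Data Lake."""
    datahub = parse_version(datahub_version)
    datalake = parse_version(datalake_version)
    if datahub == datalake:
        return True  # same Cloudera Runtime version is always acceptable
    if datahub > datalake:
        return False  # assumption: this case is not described in the text
    # The Data Lake is newer: allowed only for Data Hubs on 7.2.16 or later.
    return datahub >= MIN_INDEPENDENT_VERSION


# Example: Data Lake upgraded to 7.2.17, Data Hubs on assorted versions.
for hub_version in ("7.2.15", "7.2.16", "7.2.17"):
    ok = datahub_compatible(hub_version, "7.2.17")
    action = "no upgrade required" if ok else "upgrade required"
    print(f"Data Hub {hub_version} with Data Lake 7.2.17: {action}")
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which "7.2.9" sorts after "7.2.16".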