Upgrading to Cloudera Data Lake 7.3.1 with Cloudera Data Engineering

Cloudera Data Engineering 1.23.1 and higher versions support Cloudera Data Lake 7.3.1.

Upgrading to Cloudera Data Lake 7.3.1 is only supported for specific Cloudera Data Lake versions. For more information on the supported Cloudera Data Lake versions, see Upgrading to Cloudera Runtime 7.3.1.

Before you upgrade to Cloudera Data Lake 7.3.1 with Cloudera Data Engineering 1.23.1 or higher, refactor and migrate all your Spark jobs to Spark 3.5. For more information, see Migrating Spark applications.

Steps

  1. Create a Cloudera Data Engineering Service of version 1.22 or 1.23, or upgrade an existing service to either version, with Data Lake version 7.2.18.

    Data Lake 7.2.18 in Cloudera Data Engineering 1.22 and 1.23 supports multiple Spark versions:

    • Spark 2.4.8
    • Spark 3.2
    • Spark 3.3
    • Spark 3.5

    For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

  2. In the Cloudera Data Engineering Service, do the following:
    1. Create new Virtual Clusters (VCs) with Spark 3.5.
    2. Refactor your current Spark jobs to Spark 3.5.

    For Spark application migration instructions, see Migrating Spark applications.

  3. Run the Spark 3.5 jobs to test them.
  4. After ensuring that everything works as expected, delete the old VCs with lower Spark versions and keep the new VCs.
  5. Upgrade your Cloudera Data Lake version to 7.3.1 in Cloudera Management Console. For more information, see Upgrade Data Lake Cloudera Runtime version to 7.3.1 and OS to RHEL 8.10.
  6. Perform an in-place upgrade of your Cloudera Data Engineering Service to version 1.23.1 or higher.
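
If you are starting with Cloudera Data Engineering 1.23.1 or higher, follow these steps instead:

  1. Create a Cloudera Data Engineering Service of version 1.23.1 or higher, or upgrade an existing service to such a version, with Data Lake version 7.2.18.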

    Data Lake 7.2.18 in Cloudera Data Engineering 1.23.1 supports multiple Spark versions:

    • Spark 2.4.8
    • Spark 3.2
    • Spark 3.3
    • Spark 3.5

    For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

  2. In the Cloudera Data Engineering Service, do the following:
    1. Create new Virtual Clusters (VCs) with Spark 3.5.
    2. Refactor your current Spark jobs to Spark 3.5.

    For Spark application migration instructions, see Migrating Spark applications.

  3. Run the Spark 3.5 jobs to test them.
  4. After ensuring that everything works as expected, delete the old VCs with lower Spark versions and keep the new VCs.
  5. Upgrade your Cloudera Data Lake version to 7.3.1 in Cloudera Management Console. For more information, see Upgrade Data Lake Cloudera Runtime version to 7.3.1 and OS to RHEL 8.10.
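
The steps above require that every Spark job runs on Spark 3.5 and that the older Virtual Clusters are deleted before you upgrade the Data Lake. As a minimal sketch of a readiness check for that requirement, the following Python snippet flags Virtual Clusters that still use an older Spark version. It does not call any Cloudera API: the VirtualCluster class, the cluster names, and the inventory list are hypothetical, and you would need to fill the inventory by hand or from your own tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

# From the procedure: all jobs must be on Spark 3.5 before the Data Lake upgrade.
REQUIRED_SPARK = (3, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.5' or '2.4.8' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


@dataclass(frozen=True)
class VirtualCluster:
    """Hypothetical inventory record; not a Cloudera API object."""
    name: str
    spark_version: str  # for example "2.4.8", "3.2", "3.3", or "3.5"


def blocking_clusters(clusters: List[VirtualCluster]) -> List[VirtualCluster]:
    """Return the Virtual Clusters whose Spark major.minor version is not 3.5."""
    return [
        vc for vc in clusters
        if parse_version(vc.spark_version)[:2] != REQUIRED_SPARK
    ]


def ready_for_data_lake_upgrade(clusters: List[VirtualCluster]) -> bool:
    """True when no Virtual Cluster in the inventory blocks the upgrade."""
    return not blocking_clusters(clusters)


if __name__ == "__main__":
    # Hypothetical inventory, entered manually for illustration.
    inventory = [
        VirtualCluster("etl-legacy", "2.4.8"),
        VirtualCluster("etl-new", "3.5"),
    ]
    for vc in blocking_clusters(inventory):
        print(f"Still on Spark {vc.spark_version}: {vc.name}")
    print("Ready for Data Lake upgrade:", ready_for_data_lake_upgrade(inventory))
```

Parsing versions into integer tuples avoids the string-comparison pitfall where "1.9" sorts after "1.23.1". The check only compares the major and minor parts, so a patch release such as 3.5.1 still counts as Spark 3.5.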