Migrating Virtual Clusters to Spark 3.5 after upgrading Cloudera Data Lake to 7.3.1

Learn how to migrate your Cloudera Data Engineering Virtual Clusters (VCs) to Spark 3.5 if you upgraded your Cloudera Data Lake to 7.3.1.

In Cloudera Data Engineering, Cloudera Data Lake 7.3.1 supports only Apache Spark 3.5. If you upgraded Cloudera Data Services on premises to 1.5.5 CHF1 or a higher version while your Cloudera Data Lake version is lower than 7.3.1, your Cloudera Data Engineering VCs are still running on a Spark version lower than 3.5. After you upgrade Cloudera Data Lake to 7.3.1, VCs running on Spark versions lower than 3.5 might not work. For a seamless experience, migrate your Cloudera Data Engineering VCs to Spark 3.5.

Upgrading to Cloudera Data Lake 7.3.1 is only supported for specific Cloudera Data Lake versions. For more information on the supported Cloudera Data Lake versions, see Upgrading Cloudera Base on premises to a higher version.

  • All your Cloudera Data Engineering jobs running on any Spark 3.x version lower than 3.5 must be compatible with Spark 3.5.
  • If you plan to upgrade to Cloudera Data Lake 7.3.1 with Cloudera Data Engineering 1.5.5 CHF1 or a higher version, you must first refactor and migrate all your Spark jobs to Spark 3.5. For more information, see Migrating Spark applications.
  1. Back up all the artifacts in the Virtual Clusters (VCs). For instructions, see the CDE CLI section in Backing up Cloudera Data Engineering jobs on local storage.
  2. Upgrade Cloudera Data Lake to 7.3.1. For instructions, see Upgrading Cloudera Base on premises to a higher version.
  3. Create all the corresponding Cloudera Data Engineering VCs with Spark 3.5. For instructions on creating Cloudera Data Engineering VCs, see Creating virtual clusters.
  4. Restore artifacts of those VCs by performing one of the following actions:
    • If your VCs were running on a Spark 3.x version lower than 3.5 before you upgraded Cloudera Data Lake to 7.3.1, restore the artifacts for the corresponding VCs. For instructions on restoring artifacts, see Restoring Cloudera Data Engineering jobs from backup.
    • If your VCs were running on Spark 2.x before you upgraded Cloudera Data Lake to 7.3.1, create the artifacts manually.
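
The choice in step 4 depends only on the Spark version each VC ran on before the upgrade. The following is a minimal planning sketch of that decision, not part of any Cloudera tooling. The function names, the returned labels, and the sample VC names are hypothetical, and treating a VC that is already on Spark 3.5 or higher as needing no action is an assumption.

```python
# Hypothetical planning helper; it does not call any Cloudera API or CLI.
TARGET_SPARK = (3, 5)  # Spark version supported by Cloudera Data Lake 7.3.1


def parse_version(version: str) -> tuple[int, int]:
    """Turn a version string such as '3.3.2' into a (major, minor) tuple."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(parts[0]), int(parts[1])


def restore_action(old_spark_version: str) -> str:
    """Return the step 4 action for a VC that ran on old_spark_version."""
    major, minor = parse_version(old_spark_version)
    if (major, minor) >= TARGET_SPARK:
        # Assumption: a VC already on Spark 3.5 or higher needs no migration.
        return "none"
    if major == 3:
        # Spark 3.x lower than 3.5: restore the artifacts from the backup.
        return "restore-from-backup"
    if major == 2:
        # Spark 2.x: the artifacts must be created manually.
        return "create-manually"
    raise ValueError(f"unsupported Spark version: {old_spark_version!r}")


if __name__ == "__main__":
    # Hypothetical inventory: VC name -> Spark version before the upgrade.
    inventory = {"vc-etl": "3.3.2", "vc-legacy": "2.4.8", "vc-new": "3.5.1"}
    for name, version in inventory.items():
        print(f"{name}: Spark {version} -> {restore_action(version)}")
```

Running a check like this against an inventory of your VCs before step 1 shows which ones can be restored from the backup and which ones need their artifacts recreated by hand.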