Upgrading Spark 2 to Spark 3 for Cloudera Public Cloud 7.3.1

Upgrading Apache Spark from version 2 to version 3 in Cloudera Public Cloud is a process that involves:

  1. Intermediate in-place cluster upgrade tasks, because support for the Spark 3 versions of connectors differs between Cloudera versions.
  2. Intermediate Spark application migration tasks, due to minor or maintenance Spark version changes.
  3. Sidecar migration tasks for Data Hub clusters, because existing Data Hub clusters cannot be adjusted, and an in-place upgrade to 7.3.1 is only possible on clusters where Spark 2 is no longer present.
  4. Application migration from Spark 2 to Spark 3, due to major Spark version changes.
  5. Post-application migration tasks.
  6. In-place cluster upgrade tasks.
  7. Spark application migration tasks, due to minor or maintenance Spark version changes.

The table below provides links to the upgrade guides, based on the version of Cloudera Public Cloud you are using and the source versions of Spark in your environment.

Each upgrade guide contains all the steps needed to upgrade Spark 2 to Spark 3 and to upgrade to Cloudera Public Cloud version 7.3.1.

| Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Data Hub template | Guide |
|---|---|---|---|---|
| 7.2.18 SP2 | 2.4.8 | None | Data Hub was created with a custom template. | Upgrade guide |
| 7.2.18 SP2 | 2.4.8 | 3.4.1 (bundled) | Data Hub was created with a custom template. | Upgrade guide |
| 7.2.18 SP2 | None | 3.4.1 (bundled) | Data Hub was created with the 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie or a custom template. | Upgrade guide |
| 7.2.17 | 2.4.8 | None | Data Hub was created with the 7.2.17 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie or a custom template. | Upgrade guide; Upgrade guide (with connectors) |
| 7.2.17 | 2.4.8 | 3.3.2 (bundled) | Data Hub was created with a custom template. | Upgrade guide; Upgrade guide (with connectors) |
| 7.2.17 | None | 3.3.2 (bundled) | Data Hub was created with the 7.2.17 - Data Engineering: Apache Spark3 or a custom template. | Upgrade guide |
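
To make the selection logic concrete, the table above can be read as a lookup keyed on the source cluster version and the Spark 2 and Spark 3 versions present on the cluster. The following is a minimal sketch of that reading, not a Cloudera tool: the names `UPGRADE_GUIDES`, `find_upgrade_guides`, and `can_upgrade_in_place` are hypothetical, and the guide names are plain labels taken from the table rather than real URLs.

```python
from typing import Dict, List, Optional, Tuple

# Key: (source cluster version, Spark 2 version, Spark 3 version).
# None means that Spark major version is not installed on the cluster.
# Values are the guide labels from the table; this mapping is illustrative only.
GuideKey = Tuple[str, Optional[str], Optional[str]]

UPGRADE_GUIDES: Dict[GuideKey, List[str]] = {
    ("7.2.18 SP2", "2.4.8", None): ["Upgrade guide"],
    ("7.2.18 SP2", "2.4.8", "3.4.1"): ["Upgrade guide"],
    ("7.2.18 SP2", None, "3.4.1"): ["Upgrade guide"],
    ("7.2.17", "2.4.8", None): ["Upgrade guide", "Upgrade guide (with connectors)"],
    ("7.2.17", "2.4.8", "3.3.2"): ["Upgrade guide", "Upgrade guide (with connectors)"],
    ("7.2.17", None, "3.3.2"): ["Upgrade guide"],
}


def find_upgrade_guides(
    cluster_version: str,
    spark2_version: Optional[str],
    spark3_version: Optional[str],
) -> List[str]:
    """Return the guide labels listed in the table for a source configuration."""
    key = (cluster_version, spark2_version, spark3_version)
    if key not in UPGRADE_GUIDES:
        raise ValueError(f"No upgrade guide listed for configuration {key!r}")
    # Return a copy so callers cannot mutate the lookup table.
    return list(UPGRADE_GUIDES[key])


def can_upgrade_in_place(spark2_version: Optional[str]) -> bool:
    """An in-place 7.3.1 upgrade requires that Spark 2 is no longer present."""
    return spark2_version is None


if __name__ == "__main__":
    print(find_upgrade_guides("7.2.17", "2.4.8", None))
    print(can_upgrade_in_place("2.4.8"))
```

A configuration that does not appear in the table raises `ValueError` rather than returning a guess, which mirrors the fact that the table only covers the listed source versions.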