Upgrading Apache Spark 2 to Spark 3

Upgrading Apache Spark from version 2 to version 3 in Cloudera on cloud is a process that involves:

  1. Intermediate in-place cluster upgrade tasks, because support for the Spark 3 versions of connectors differs between Cloudera versions.
  2. Intermediate Spark application migration tasks, due to minor or maintenance Spark version changes.
  3. Sidecar migration tasks for Data Hub clusters, because existing Data Hub clusters cannot be adjusted, and an in-place upgrade to 7.3.1 is only possible on clusters where Spark 2 is no longer present.
  4. Application migration from Spark 2 to Spark 3, due to major Spark version changes.
  5. Post-application migration tasks.
  6. In-place cluster upgrade tasks.
  7. Spark application migration tasks, due to minor or maintenance Spark version changes.

The table below provides links to the upgrade guides based on the version of Cloudera on cloud you are using and the source versions of Spark in your environment.

Each upgrade guide contains all the steps needed to upgrade Spark 2 to Spark 3 and to upgrade to Cloudera on cloud version 7.3.1.

| Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Data Hub template | Documentation |
|---|---|---|---|---|
| 7.2.18 SP2 | 2.4.8 | None | Data Hub was created with a custom template. | Upgrade guide |
| 7.2.18 SP2 | 2.4.8 | 3.4.1 (bundled) | | Upgrade guide |
| 7.2.18 SP2 | None | 3.4.1 (bundled) | Data Hub was created with the 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie or a custom template. | Upgrade guide |
| 7.2.17 | 2.4.8 | None | Data Hub was created with the 7.2.17 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie or a custom template. | Upgrade guide; Upgrade guide (with connectors) |
| 7.2.17 | 2.4.8 | 3.3.2 (bundled) | Data Hub was created with a custom template. | Upgrade guide; Upgrade guide (with connectors) |
| 7.2.17 | None | 3.3.2 (bundled) | Data Hub was created with the 7.2.17 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie or a custom template. | Upgrade guide |
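
If you inventory several clusters, the table above can be treated as a simple lookup keyed on the source cluster version and the installed Spark versions. The following is a minimal sketch of that idea, not a Cloudera tool: the names `SourceCluster`, `GUIDES`, and `find_guides` are hypothetical, and the returned strings only stand in for the linked guides, since the table text carries no URLs.

```python
from typing import List, NamedTuple, Optional


class SourceCluster(NamedTuple):
    """Hypothetical key type: one row of the upgrade-guide table."""
    cluster_version: str            # "7.2.18 SP2" or "7.2.17"
    spark2_version: Optional[str]   # None when Spark 2 is not present
    spark3_version: Optional[str]   # None when Spark 3 is not present


# Rows transcribed from the table; values are guide labels, not URLs.
GUIDES = {
    SourceCluster("7.2.18 SP2", "2.4.8", None): ["Upgrade guide"],
    SourceCluster("7.2.18 SP2", "2.4.8", "3.4.1"): ["Upgrade guide"],
    SourceCluster("7.2.18 SP2", None, "3.4.1"): ["Upgrade guide"],
    SourceCluster("7.2.17", "2.4.8", None): [
        "Upgrade guide",
        "Upgrade guide (with connectors)",
    ],
    SourceCluster("7.2.17", "2.4.8", "3.3.2"): [
        "Upgrade guide",
        "Upgrade guide (with connectors)",
    ],
    SourceCluster("7.2.17", None, "3.3.2"): ["Upgrade guide"],
}


def find_guides(cluster_version: str,
                spark2_version: Optional[str],
                spark3_version: Optional[str]) -> List[str]:
    """Return the guide labels listed for a source cluster combination."""
    key = SourceCluster(cluster_version, spark2_version, spark3_version)
    if key not in GUIDES:
        # The table does not cover this combination.
        raise KeyError(f"No upgrade guide listed for {key}")
    return GUIDES[key]


if __name__ == "__main__":
    print(find_guides("7.2.17", "2.4.8", None))
```

A combination that does not appear in the table raises `KeyError` instead of returning a guess, which keeps unsupported source clusters visible during planning.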

Spark 3 minor upgrade to Apache Spark 3.5.4

Apache Spark 3.5.4 is included in Cloudera Runtime 7.3.1.100 and higher, and 7.3.2.0. Use these guides when you already run Spark 3 and need to move to Spark 3.5.4.

| Source Spark 3 version | Target Cloudera Runtime (Spark 3.5.4) | Documentation |
|---|---|---|
| 3.4.1 (bundled) | 7.3.1.100 and higher, or 7.3.2.0 | Upgrade guide |

| Source cluster version | Source cluster Spark 3 version | Data Hub template | Target Cloudera Runtime (Spark 3.5.4) | Documentation |
|---|---|---|---|---|
| 7.2.17 | 3.3.2 (bundled) | Data Hub was created with a custom template or the 7.2.17 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template. | 7.3.1.100 and higher, or 7.3.2.0 | Upgrade guide |
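
The target runtime condition for Spark 3.5.4 can be checked programmatically when you validate a list of environments. The sketch below rests on one interpretation that this page does not spell out: "7.3.1.100 and higher" is read as any 7.3.1.x release whose fourth component is at least 100, and 7.3.2.0 is matched exactly. The function names are hypothetical, and the version strings are assumed to be dot-separated integers.

```python
from typing import Tuple


def parse_runtime_version(version: str) -> Tuple[int, ...]:
    """Split a dotted runtime version such as '7.3.1.100' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def includes_spark_3_5_4(version: str) -> bool:
    """Return True if the runtime version is one listed as shipping Spark 3.5.4.

    Assumption: '7.3.1.100 and higher' means 7.3.1.x with x >= 100.
    """
    parts = parse_runtime_version(version)
    # Pad short versions like '7.3.1' with zeros to four components.
    parts = parts + (0,) * (4 - len(parts))
    if parts[:3] == (7, 3, 1):
        return parts[3] >= 100
    return parts[:4] == (7, 3, 2, 0)


if __name__ == "__main__":
    for candidate in ("7.2.17.0", "7.3.1.0", "7.3.1.100", "7.3.2.0"):
        print(candidate, includes_spark_3_5_4(candidate))
```

If your release naming differs from this reading, adjust the comparison rather than relying on the sketch as written.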