Upgrading Apache Spark 2.4.8 on 7.2.18 SP2 to Spark 3 on 7.3.1

The following steps will help you upgrade from Apache Spark 2.4.8 on Cloudera Public Cloud 7.2.18 SP2 to Spark 3.4.1 on 7.3.1.

Source cluster version: 7.2.18 SP2
Source cluster Spark 2 version: 2.4.8
Source cluster Spark 3 version: none
Target cluster version: 7.3.1
Target cluster Spark 3 version: 3.4.1
Spark 2 used with connectors¹: no

Sidecar migration of Data Hub clusters

Sidecar migration tasks for Data Hub clusters

The new 7.2.18 Data Hub cluster needs to use Spark 3 and Livy 3 instead of Spark 2 and Livy 2.

Depending on the template you used for your existing Data Hub clusters, a new custom template might be needed that contains Spark 3 instead of Spark 2. Alternatively, the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template can be used, as it contains Spark 3 only.

  1. Check the current services in your template and determine whether the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template covers them.
  2. If the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template doesn't work, you can create a custom template. Replace all Spark 2 and Livy 2 references with Spark 3 and Livy 3, respectively.
  3. Add a new Spark 3-based 7.2.18 Data Hub cluster to the environment, using your custom template or the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template.
  4. Migrate all non-Spark workloads from the old Data Hub cluster to the new cluster.

Application migration tasks (Spark 2 to 3)

  1. Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.4.1.
    1. Check the supported Java versions.
    2. Check the supported Scala version.
    3. Check the supported Python versions.
    4. Account for changed or versioned Spark commands in your code (for example, spark-submit and pyspark).
    5. Check supported versions for Spark connectors.
    6. Check the logging library used in your code.
    7. Check the compatibility of third-party libraries used in your code.
    8. Check Spark behavior changes and refactor your code.
  2. Migrate all Spark 2 applications in the old Data Hub cluster to Spark 3 applications in the new cluster.
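The command-name change in step 4 above can be sketched as follows. On Cloudera clusters, the Spark 3 parcel typically installs versioned command names alongside any Spark 2 ones; the exact names below are an assumption, so verify them on a gateway node of your new cluster:

```shell
# Spark 2 submission (old cluster):
#   spark-submit --master yarn --deploy-mode cluster app.py
# Spark 3 submission (new cluster) -- versioned command name is an assumption;
# on a Spark 3-only cluster it may simply be spark-submit:
spark3-submit \
  --master yarn \
  --deploy-mode cluster \
  app.py

# Interactive shells follow the same assumed pattern:
#   pyspark      -> pyspark3
#   spark-shell  -> spark3-shell
```

Update any scripts, Oozie workflows, and scheduler definitions that hard-code the old command names.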

Post-application migration tasks

  1. Move the Spark 2 event logs to the Spark 3 event logs directory so that historical applications remain visible in the Spark History Server.
  2. Drop the old Data Hub cluster.
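Moving the event logs can be done with HDFS shell commands. The paths below are assumptions based on common Cloudera defaults; confirm the actual values of spark.eventLog.dir (and the History Server's log directory) for both services before copying:

```shell
# Assumed default locations -- verify spark.eventLog.dir in your configuration
SPARK2_LOGS=/user/spark/applicationHistory
SPARK3_LOGS=/user/spark/spark3ApplicationHistory

# Copy rather than move, so the originals survive until you have verified
# that the Spark 3 History Server can read them
hdfs dfs -cp "${SPARK2_LOGS}/*" "${SPARK3_LOGS}/"

# Spot-check that the files arrived
hdfs dfs -ls "${SPARK3_LOGS}"
```

Only drop the old cluster after confirming the migrated logs appear in the new History Server UI.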

In-place cluster upgrade

  1. Upgrade the Data Lake cluster to 7.3.1.
    1. Check the support matrix for Data Hub upgrades.
    2. Stop all Data Hubs attached to the environment.
    3. From the Management Console, click Data Lakes > Environment Name, scroll to the bottom of the Data Lake details page, and click the Upgrade tab.
    4. Click the Target Cloudera Runtime Version drop-down menu to see any available upgrades.
    5. If you want to skip the automatic backup that is taken before the upgrade, uncheck the Automatic backup box.
    6. Click Validate and Prepare to check for any configuration issues and begin the Cloudera Runtime parcel download and distribution.
    7. Click Upgrade to initiate the upgrade.
    8. Click the Event History tab to monitor the upgrade process and verify that it completes successfully.
    For more information, see Data Lake upgrade.
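The same Data Lake upgrade can also be driven from the CDP CLI instead of the Management Console. This is a sketch only: the Data Lake name is a placeholder, and you should confirm the exact flags with `cdp datalake upgrade-datalake help` before running it:

```shell
# List the runtime images available for upgrade (name is a placeholder)
cdp datalake upgrade-datalake \
  --datalake-name my-datalake \
  --show-available-images

# Trigger the upgrade to the target runtime
cdp datalake upgrade-datalake \
  --datalake-name my-datalake \
  --runtime 7.3.1
```

Monitor progress afterwards in the Event History tab, as in step 8 above.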
  2. Upgrade the new Data Hub cluster to 7.3.1.
    1. Check the support matrix for Data Hub upgrades.
    2. Start the cluster.
    3. Check the current version of Cloudera Runtime.
    4. If your cluster uses Streams Replication Manager, export or migrate aggregated metrics.
    5. If you use autoscaling, disable autoscaling on the cluster.
    6. Upgrade the cluster.
    7. Monitor the upgrade progress using the Data Hub Event History tab.
    8. When the upgrade is complete, verify the new version.
    9. If you disabled autoscaling on the cluster, you can re-enable it after upgrade.
    For more information, see Upgrading Data Hubs.
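The Data Hub upgrade can likewise be run from the CDP CLI. Again a sketch: the cluster name is a placeholder, and the flags should be confirmed with `cdp datahub upgrade-cluster help`:

```shell
# Check which runtime images the cluster can upgrade to (name is a placeholder)
cdp datahub upgrade-cluster \
  --cluster-name my-spark3-cluster \
  --show-available-images

# Trigger the upgrade to the target runtime
cdp datahub upgrade-cluster \
  --cluster-name my-spark3-cluster \
  --runtime 7.3.1
```

As with the console flow, verify the new runtime version once the upgrade completes, then re-enable autoscaling if you disabled it.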

Final steps

After the upgrade and application migration are complete:
  1. Check the status of your Data Lakes, Data Hubs, and clusters.
  2. Perform benchmark testing on your applications. See Spark Application Migration.
¹ Connectors: Oozie, Solr, Phoenix, Hive Warehouse Connector, Spark Schema Registry