Upgrading Apache Spark 2.4.8 (with 3.4.1 bundled) on 7.2.18 SP2 to Spark 3 on 7.3.1
Use the following steps to upgrade from Apache Spark 2.4.8 (with Spark 3.4.1 bundled) on Cloudera on cloud 7.2.18 SP2 to Spark 3.4.1 on 7.3.1.
Application migration tasks (Spark 2 to 3)
- Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.4.1.
- Check the supported Java versions.
- Check the supported Scala version.
- Check the supported Python versions.
- Account for changed or versioned Spark commands in your code (spark-submit, pyspark, etc.).
- Check the supported versions of Spark connectors.
- Check the logging library used in your code.
- Check the compatibility of 3rd-party libraries used in your code.
- Check Spark behavior changes and refactor your code.
- Finish migrating Spark 2 workloads to Spark 3 on the cluster before you continue.
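Several of the code-level checks above (versioned commands, the logging library, third-party libraries) can be partly automated. The following is a minimal Python sketch, not a Cloudera tool, that scans a project for lines worth reviewing. The patterns are assumptions: `spark3-submit`-style names are treated as possible versioned launchers, `org.apache.log4j` is the Log4j 1.x package (Spark 3.3 and later use Log4j 2), and `_2.11` suffixes mark Scala 2.11 artifacts, which Spark 3 does not support. `CHECKS`, `scan_text`, and `scan_tree` are hypothetical names.

```python
import re
from pathlib import Path

# Hypothetical review checks: (name, pattern, file suffixes the check applies to).
# The patterns are illustrative assumptions; extend them for your own code base.
CHECKS = [
    # Spark launcher commands, unversioned or versioned (e.g. spark3-submit).
    # Limited to shell scripts so Python imports such as "from pyspark.sql" are not flagged.
    ("spark-command",
     re.compile(r"(?<![\w.-])(?:spark3?-(?:submit|shell|sql)|pyspark3?)(?![\w.-])"),
     {".sh"}),
    # Log4j 1.x package; Spark 3.3 and later ship Log4j 2 (org.apache.logging.log4j).
    ("log4j1-api",
     re.compile(r"\borg\.apache\.log4j\b"),
     {".py", ".scala", ".java", ".properties", ".xml"}),
    # Scala 2.11 artifacts or scalaVersion strings; Spark 3 does not support Scala 2.11.
    ("scala-2.11-artifact",
     re.compile(r'_2\.11\b|"2\.11\.\d+"'),
     {".sbt", ".xml", ".gradle", ".scala"}),
]

# Only read files that at least one check applies to (skips jars and other binaries).
SCANNED_SUFFIXES = set().union(*(suffixes for _, _, suffixes in CHECKS))


def scan_text(text, suffix):
    """Return (line_number, check_name, stripped_line) for every check that matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern, suffixes in CHECKS:
            if suffix in suffixes and pattern.search(line):
                findings.append((lineno, name, line.strip()))
    return findings


def scan_tree(root):
    """Scan files under root and return {path: findings} for files with at least one hit."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SCANNED_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings = scan_text(text, path.suffix)
            if findings:
                results[str(path)] = findings
    return results
```

A hit is a prompt for review, not proof of breakage; for example, code using the Log4j 1.x API may still need only configuration changes. Run it with something like `for path, hits in scan_tree("path/to/project").items(): print(path, hits)`, where the path is a placeholder for your own project directory.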
Post-application migration tasks
- Stop the Livy (Livy for Spark 2) and Spark 2 (SPARK_ON_YARN) services.
- Delete the Spark 2 and Livy for Spark 2 services.
- Move Spark 2 event logs to the Spark 3 event logs directory.
- Continue after Spark 2 services and roles are fully removed from the cluster.
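The cleanup steps above can be planned as a dry run before anything is changed. The following minimal Python sketch builds Cloudera Manager REST API calls (`POST .../services/{name}/commands/stop`, then `DELETE .../services/{name}`) and an `hdfs dfs -mv` command for the event logs, without executing any of them. Assumptions: the service types `SPARK_ON_YARN` and `LIVY` identify Spark 2 and Livy for Spark 2 (verify against the `type` field in your own `GET .../clusters/{cluster}/services` response), and the API base URL, cluster name, and event log directories are placeholders you supply. `spark2_services` and `cleanup_plan` are hypothetical helpers.

```python
import json
import shlex
from urllib.parse import quote

# Assumed Cloudera Manager service types for Spark 2 and Livy for Spark 2.
SPARK2_TYPES = {"SPARK_ON_YARN", "LIVY"}


def spark2_services(services_json):
    """Return Spark 2-era service names from a GET /clusters/{cluster}/services response body."""
    items = json.loads(services_json).get("items", [])
    matches = [s for s in items if s.get("type") in SPARK2_TYPES]
    # Livy for Spark 2 depends on Spark 2, so list it first (the sort is stable).
    matches.sort(key=lambda s: s["type"] != "LIVY")
    return [s["name"] for s in matches]


def cleanup_plan(api_base, cluster, services_json, spark2_log_dir, spark3_log_dir):
    """Build a dry-run plan: (method, url) API calls plus an HDFS move command.

    Nothing is executed; review the plan and run it with your own tooling.
    """
    names = spark2_services(services_json)
    base = f"{api_base.rstrip('/')}/clusters/{quote(cluster, safe='')}/services"
    # Stop every Spark 2-era service before deleting any of them.
    calls = [("POST", f"{base}/{quote(n, safe='')}/commands/stop") for n in names]
    calls += [("DELETE", f"{base}/{quote(n, safe='')}") for n in names]
    # Quote the glob so the local shell leaves it alone and HDFS expands it instead.
    src = shlex.quote(spark2_log_dir.rstrip("/") + "/*")
    move = f"hdfs dfs -mv {src} {shlex.quote(spark3_log_dir)}"
    return calls, move
```

Take the two log directories from the `spark.eventLog.dir` setting of each Spark service rather than guessing them, and use Cloudera Manager's `GET /api/version` endpoint to find the API version segment for `api_base`. To confirm that the Spark 2 services and roles are gone before continuing, fetch the service list again and check that `spark2_services` returns an empty list.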
In-place cluster upgrade
Final steps
After the upgrade and application migration are complete:
- Check the status of your clusters.
- Perform benchmark testing on your applications. See Spark Application Migration.
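A minimal sketch of the comparison step, assuming you have wall-clock runtimes for the same job on the Spark 2 baseline and on Spark 3. The helper names and the 10% regression threshold are hypothetical; the median is used instead of the mean so that one slow run (for example, a cold cache or a busy YARN queue) does not skew the result.

```python
import time
from statistics import median


def time_runs(run, repeats=3):
    """Call run() `repeats` times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return durations


def compare_runtimes(baseline_secs, candidate_secs, threshold_pct=10.0):
    """Compare median runtimes; flag a regression if the candidate is slower by more than threshold_pct."""
    if not baseline_secs or not candidate_secs:
        raise ValueError("need at least one run for each version")
    base = median(baseline_secs)
    cand = median(candidate_secs)
    change_pct = (cand - base) * 100.0 / base
    return {
        "baseline": base,
        "candidate": cand,
        "change_pct": change_pct,
        "regressed": change_pct > threshold_pct,
    }
```

`run` can be any callable that launches one job end to end, for example a `subprocess.run([...], check=True)` call around your own submit command. Record the Spark 2 baseline before the services are removed, since it cannot be collected afterwards.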
