Upgrading Apache Spark 2.4.8 (with CDS 3.3.2) on 7.1.9 SP1 to Spark 3 on 7.3.1
The following steps help you upgrade from Apache Spark 2.4.8 on Cloudera Private Cloud 7.1.9 SP1 to Spark 3.4.1 on 7.3.1.
Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Target cluster version | Target cluster Spark 3 version | Spark 2 used with connectors (1) |
---|---|---|---|---|---|
7.1.9 SP1 | 2.4.8 | 3.3.2 (CDS) | 7.3.1 | 3.4.1 | no |
Application migration tasks from Spark 2 to Spark 3
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.3.2 (CDS) on the source cluster.
- Check the supported Java versions.
- Check the supported Scala version.
- Check the supported Python versions.
- Account for changed or versioned Spark commands in your code (spark-submit, pyspark, etc.).
- Check supported versions for Spark connectors.
- Check the logging library used in your code.
- Check the compatibility of 3rd-party libraries used in your code.
- Check Spark behavior changes and refactor your code.
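Parts of this checklist can be scripted. The following Python sketch is a heuristic pre-migration scan, not a Cloudera tool: it walks a project directory and flags lines worth reviewing for two of the items above. Spark 3 does not support Scala 2.11, so _2.11 artifacts and Scala 2.11 version settings are flagged; Apache Spark moved from Log4j 1.x to Log4j 2 in Spark 3.3, so org.apache.log4j references and log4j.properties files are flagged. The function name scan_project, the regular expressions, and the list of file suffixes are assumptions to adapt to your own code base.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions; extend them for your own code base).
# Scala 2.11 artifacts or settings: Spark 3 does not support Scala 2.11.
SCALA_211 = re.compile(
    r'_2\.11\b|scalaVersion\s*:=\s*"2\.11|<scala\.(?:binary\.)?version>2\.11'
)
# Log4j 1.x API package; Log4j 2 lives under org.apache.logging.log4j instead.
LOG4J1 = re.compile(r"\borg\.apache\.log4j\b")
# File types that are searched line by line (assumed list).
SUFFIXES = {".scala", ".java", ".py", ".sbt", ".xml", ".properties", ".conf"}


def scan_project(root):
    """Return (path, line_number, finding) tuples; line 0 means the whole file."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # A log4j.properties file is a Log4j 1.x configuration.
        if path.name == "log4j.properties":
            findings.append((str(path), 0, "Log4j 1.x config file (Spark 3.3+ uses Log4j 2)"))
        if path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SCALA_211.search(line):
                findings.append((str(path), lineno, "Scala 2.11 reference"))
            if LOG4J1.search(line):
                findings.append((str(path), lineno, "Log4j 1.x API (org.apache.log4j)"))
    return findings
```

For example, for finding in scan_project("my-spark-app"): print(*finding, sep=":") lists each hit as path:line:finding, where my-spark-app is a hypothetical project directory. Every hit is a candidate to review, not necessarily a defect: code that uses the Log4j 1.2 bridge API also references org.apache.log4j.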
Post-application migration tasks
- Stop the Livy (Livy for Spark 2) and Spark 2 (SPARK_ON_YARN) services.
- Delete the Spark 2 and Livy for Spark 2 services.
- Move Spark 2 event logs to the Spark 3 event logs directory.
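These post-migration steps can also be driven from a script. The sketch below assumes the Cloudera Manager REST API endpoints for stopping a service (POST .../services/{name}/commands/stop), deleting a service (DELETE .../services/{name}) and polling a command (GET .../commands/{id}), followed by an hdfs dfs -mv of the event logs. The host, API version, cluster name, service names, and both event log directories are hypothetical placeholders; check the spark.eventLog.dir setting of each Spark service and the API reference for your Cloudera Manager version before running it with dry_run=False. By default it only prints the planned steps.

```python
import base64
import json
import subprocess
import time
import urllib.request
from urllib.parse import quote

# Hypothetical values: replace them with the names used in your Cloudera Manager.
CM_BASE = "https://cm-host.example.com:7183/api/v51"  # hypothetical host and API version
CLUSTER = "Cluster 1"                                  # hypothetical cluster name
SPARK2_SERVICES = ["livy", "spark_on_yarn"]            # hypothetical service names, Livy first
SPARK2_EVENT_LOG_DIR = "/user/spark/applicationHistory"        # assumed Spark 2 spark.eventLog.dir
SPARK3_EVENT_LOG_DIR = "/user/spark/spark3ApplicationHistory"  # assumed Spark 3 spark.eventLog.dir


def build_plan(base, cluster, services, src_dir, dst_dir):
    """Return ordered steps: stop each service, delete each service, move event logs."""
    svc_url = f"{base}/clusters/{quote(cluster)}/services"
    plan = [("POST", f"{svc_url}/{quote(s)}/commands/stop") for s in services]
    plan += [("DELETE", f"{svc_url}/{quote(s)}") for s in services]
    # The glob is passed unexpanded; the HDFS client expands it itself.
    plan.append(("HDFS", ["hdfs", "dfs", "-mv", f"{src_dir}/*", dst_dir]))
    return plan


def _call(method, url, auth):
    """Send one Cloudera Manager API request and return the decoded JSON body."""
    req = urllib.request.Request(url, method=method, headers={"Authorization": auth})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def run(plan, base, user, password, dry_run=True):
    """Print the plan (default) or execute it step by step."""
    auth = "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()
    for method, target in plan:
        if dry_run:
            print(method, " ".join(target) if method == "HDFS" else target)
        elif method == "HDFS":
            subprocess.run(target, check=True)
        else:
            result = _call(method, target, auth)
            # Stop commands are asynchronous: poll until Cloudera Manager reports completion.
            while method == "POST" and result.get("active"):
                time.sleep(5)
                result = _call("GET", f"{base}/commands/{result['id']}", auth)
            if method == "POST" and result.get("success") is False:
                raise RuntimeError(f"Stop command failed: {result.get('resultMessage')}")


if __name__ == "__main__":
    run(build_plan(CM_BASE, CLUSTER, SPARK2_SERVICES,
                   SPARK2_EVENT_LOG_DIR, SPARK3_EVENT_LOG_DIR),
        CM_BASE, "admin", "change-me")
```

Running it as a script prints the five planned steps without contacting Cloudera Manager or HDFS. Stopping Livy before Spark 2 mirrors the order in the list above, and waiting for each stop command to finish before any DELETE avoids removing a service that is still shutting down.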
In-place cluster upgrade
Spark application migration (from Spark 3.x to Spark 3.4.1)
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 3.3.2 to 3.4.1.
- Refactor your Spark application code.
Final steps
After the upgrade and application migration are complete:
- Check the status of your clusters.
- Perform benchmark testing on your applications. See Spark Application Migration.
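For the benchmark step, one simple approach is to run each application several times on the source cluster and again on the upgraded 7.3.1 cluster, then compare median runtimes. The Python sketch below performs only that comparison; the input format (job name mapped to runtimes in seconds), the function name compare_runs, and the 10% tolerance are assumptions.

```python
from statistics import median


def compare_runs(baseline, candidate, tolerance=0.10):
    """Compare median runtimes per job between two benchmark runs.

    baseline and candidate map a job name to a list of wall-clock runtimes in
    seconds. A job is flagged "regressed" when its median runtime grew by more
    than `tolerance` (10% by default, an arbitrary threshold).
    """
    report = {}
    for job, before in baseline.items():
        after = candidate.get(job)
        if not after:
            # No runs recorded on the upgraded cluster for this job.
            report[job] = ("missing", None)
            continue
        ratio = median(after) / median(before)
        status = "regressed" if ratio > 1 + tolerance else "ok"
        report[job] = (status, round(ratio, 3))
    return report
```

For example, a hypothetical job whose median runtime goes from 100 s to 130 s is reported as ("regressed", 1.3). Using the median rather than the mean keeps a single unusually slow run, such as one delayed by YARN queueing, from dominating the result.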
(1) Connectors: Oozie, Solr, Phoenix, Hive Warehouse Connector, Spark Schema Registry