Upgrading Apache Spark 2.4.8 on Cloudera Private Cloud 7.1.9 SP1 to Spark 3.4.1 on 7.3.1
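Follow these steps to upgrade from Apache Spark 2.4.8 on Cloudera Private Cloud 7.1.9 SP1 to Spark 3.4.1 on Cloudera Private Cloud 7.3.1. First you install Spark 3 alongside Spark 2 on the source cluster and migrate your applications to Spark 3.3.2. Then you upgrade the cluster in place and migrate the applications again, this time to Spark 3.4.1.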
Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Target cluster version | Target cluster Spark 3 version |
---|---|---|---|---|
7.1.9 SP1 | 2.4.8 | none | 7.3.1 | 3.4.1 |
Pre-application migration tasks
Install the CDS parcel and Spark 3 services, as described in the CDS parcel documentation. A short overview of the process is as follows:
- Check that all the software prerequisites are satisfied.
- In the Cloudera Manager Admin Console, add the CDS parcel repository to the Remote Parcel Repository URLs in Parcel Settings.
- Download the CDS parcel, distribute it to the hosts in your cluster, and activate it.
- Add the Spark 3 service to your cluster.
- Return to the Home page.
- Click the stale configuration icon to launch the Stale Configuration wizard and restart the necessary services.
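If you prefer to script the parcel steps instead of using the Admin Console, the Cloudera Manager REST API exposes download, distribute, and activate commands for each parcel. The following Python sketch only plans the next call from a parcel's current stage and sends no requests. The API version (`v51`), the cluster name, the product name (`SPARK3`), and the parcel version are placeholders assumed for illustration, so confirm the real values in your Cloudera Manager before you use them.

```python
"""Plan Cloudera Manager API calls that move a parcel to ACTIVATED.

Placeholders (assumptions): API_VERSION, cluster name, product name, and
parcel version. This sketch builds (method, path) pairs only; it does not
send any HTTP requests.
"""
from typing import Optional, Tuple

API_VERSION = "v51"  # hypothetical; use the version your Cloudera Manager supports

# Stage -> command that advances the parcel toward ACTIVATED
# (download, then distribute, then activate).
NEXT_COMMAND = {
    "AVAILABLE_REMOTELY": "startDownload",
    "DOWNLOADED": "startDistribution",
    "DISTRIBUTED": "activate",
}

# Transient stages: poll again later instead of issuing a new command.
IN_PROGRESS = {"DOWNLOADING", "DISTRIBUTING", "UNDISTRIBUTING", "ACTIVATING"}


def parcel_path(cluster: str, product: str, version: str) -> str:
    """Return the API path of one parcel on one cluster."""
    return (f"/api/{API_VERSION}/clusters/{cluster}"
            f"/parcels/products/{product}/versions/{version}")


def next_call(stage: str, cluster: str, product: str,
              version: str) -> Optional[Tuple[str, str]]:
    """Return the (method, path) to issue for a parcel in `stage`, or None
    when the parcel is already active or an operation is still running."""
    if stage == "ACTIVATED" or stage in IN_PROGRESS:
        return None
    command = NEXT_COMMAND.get(stage)
    if command is None:
        raise ValueError(f"unexpected parcel stage: {stage}")
    return ("POST", f"{parcel_path(cluster, product, version)}/commands/{command}")


if __name__ == "__main__":
    for stage in ("AVAILABLE_REMOTELY", "DOWNLOADING", "DOWNLOADED",
                  "DISTRIBUTED", "ACTIVATED"):
        print(stage, "->", next_call(stage, "Cluster1", "SPARK3", "3.3.2-example"))
```

A caller would typically read the parcel's current `stage` from a GET on `parcel_path(...)`, issue the returned call, and keep polling until the stage reaches `ACTIVATED`.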
Spark application migration (from Spark 2 to Spark 3)
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.3.2.
- Check the supported Java versions.
- Check the supported Scala version.
- Check the supported Python versions.
- Account for changed or versioned Spark commands in your code (`spark-submit`, `pyspark`, etc.).
- Check supported versions for Spark connectors.
- Check the logging library used in your code.
- Check the compatibility of 3rd-party libraries used in your code.
- Check Spark behavior changes and refactor your code.
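To triage this checklist, you can scan the code base for patterns that usually need attention. The sketch below is a hypothetical helper, not a Cloudera utility, and its patterns are heuristics. It relies on three assumptions to verify against the migration documentation: Spark 2.4.8 artifacts use the Scala 2.11 suffix (`_2.11`) while Spark 3 needs Scala 2.12 builds; Spark 3.3 and later ship Log4j 2, so `org.apache.log4j` references point to Log4j 1 code; and the plain launcher names (`spark-submit`, `spark-shell`, `pyspark`) may need their versioned Spark 3 equivalents on the source cluster.

```python
"""Flag common Spark 2 -> Spark 3 migration hotspots in source text.

Hypothetical helper: the patterns are heuristics, not an official or
exhaustive list. Each finding is (line_number, category, matched_text).
"""
import re

# Dependency coordinates built for Scala 2.11 (assumed for Spark 2.4 builds).
SCALA_211 = re.compile(r"\b[\w.-]+_2\.11\b")
# Log4j 1.x API references; Spark 3.3+ ships Log4j 2.
LOG4J1 = re.compile(r"\borg\.apache\.log4j\.[\w.]+")
# Unversioned launcher commands; "spark3-submit" or "pyspark3" do not match.
LAUNCHERS = re.compile(r"(?<![\w.-])(spark-submit|spark-shell|pyspark)(?![\w.-])")

PATTERNS = [("scala-2.11", SCALA_211), ("log4j1", LOG4J1), ("launcher", LAUNCHERS)]


def scan(text: str):
    """Return (line_number, category, match) tuples, with 1-based line numbers."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        is_import = line.lstrip().startswith(("import ", "from "))
        for category, pattern in PATTERNS:
            if category == "launcher" and is_import:
                continue  # "import pyspark" is a module import, not a launcher call
            for match in pattern.finditer(line):
                findings.append((lineno, category, match.group(0)))
    return findings
```

Dependencies declared with sbt's `%%` operator carry no explicit suffix and are not caught, so treat an empty result as "nothing obvious found" rather than "nothing to change".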
Post-application migration tasks
- Stop the Livy (Livy for Spark 2) and Spark 2 (`SPARK_ON_YARN`) services.
- Delete the Spark 2 and Livy for Spark 2 services.
- Move Spark 2 event logs to the Spark 3 event logs directory.
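Presumably the point of moving the event logs is to let the Spark 3 History Server list past Spark 2 applications. The sketch below builds `hdfs dfs -mv` commands and executes them only when `dry_run=False`. Both directory paths are assumed defaults, not values taken from this page; read the actual event log directory (`spark.eventLog.dir`) of each service in Cloudera Manager first. Files ending in `.inprogress` belong to applications that are still running and are skipped.

```python
"""Move Spark 2 event logs into the Spark 3 event log directory on HDFS.

Assumption: both directories below are typical defaults, not confirmed
values; check spark.eventLog.dir for each service in Cloudera Manager.
"""
import shlex
import subprocess
from typing import List

SPARK2_EVENT_LOG_DIR = "/user/spark/applicationHistory"        # assumed default
SPARK3_EVENT_LOG_DIR = "/user/spark/spark3ApplicationHistory"  # assumed default


def list_event_logs(src: str = SPARK2_EVENT_LOG_DIR) -> List[str]:
    """List paths in `src` with `hdfs dfs -ls -C` (prints paths only)."""
    result = subprocess.run(["hdfs", "dfs", "-ls", "-C", src],
                            check=True, capture_output=True, text=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def move_commands(paths: List[str],
                  dst: str = SPARK3_EVENT_LOG_DIR) -> List[List[str]]:
    """Build one `hdfs dfs -mv` command per finished event log.

    Paths ending in `.inprogress` belong to running applications and are
    skipped."""
    return [["hdfs", "dfs", "-mv", path, dst]
            for path in paths if not path.endswith(".inprogress")]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Typical use is `run(move_commands(list_event_logs()))`: review the printed commands, then rerun with `dry_run=False` as a user allowed to write both directories.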
In-place cluster upgrade
Upgrade your cluster in place from Cloudera Private Cloud 7.1.9 SP1 to 7.3.1.
Spark application migration (from Spark 3.x to Spark 3.4.1)
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 3.3.2 to 3.4.1.
- Refactor your Spark application code.
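While you refactor applications, a job updated for Spark 3.4.1 might still be submitted to a Spark 3.3.2 runtime by mistake. A small guard can catch this early. The hypothetical helper below reads the leading `major.minor.patch` of a version string, such as the value of `SparkSession.version` in PySpark, and ignores any vendor build suffix that follows.

```python
"""Fail fast when a job refactored for Spark 3.4.1 runs on an older Spark."""
import re
from typing import Tuple

MINIMUM_SPARK = (3, 4, 1)


def parse_spark_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from the start of a version string.

    Vendor builds may append extra fields after the patch number, so only
    the first three numeric components are read."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def require_spark(version: str,
                  minimum: Tuple[int, int, int] = MINIMUM_SPARK) -> None:
    """Raise RuntimeError if `version` is older than `minimum`."""
    if parse_spark_version(version) < minimum:
        wanted = ".".join(map(str, minimum))
        raise RuntimeError(f"Spark {wanted} or later required, found {version}")
```

In a PySpark job you would call `require_spark(spark.version)` immediately after creating the session, so that a mis-targeted submission fails before any data is read.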
Final steps
After the upgrade and application migration are complete:
- Check the status of your clusters.
- Perform benchmark testing on your applications. See Spark Application Migration.
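One simple way to compare benchmark runs is to take the median duration of several runs per application on the old and new cluster and flag large slowdowns. The application names, durations, and 10% threshold in this sketch are illustrative assumptions. You could collect the durations from your own timing or from the Spark History Server.

```python
"""Compare benchmark durations before and after the upgrade."""
from statistics import median
from typing import Dict, List, Tuple

Report = Dict[str, Tuple[float, float, float, bool]]


def compare(before: Dict[str, List[float]], after: Dict[str, List[float]],
            threshold: float = 0.10) -> Report:
    """Map each application measured on both clusters to
    (median_before, median_after, relative_change, regressed).

    Durations are in seconds and must be positive. An application counts
    as regressed when its median slows down by more than `threshold`."""
    report: Report = {}
    for app in sorted(before.keys() & after.keys()):
        old, new = median(before[app]), median(after[app])
        change = (new - old) / old
        report[app] = (old, new, change, change > threshold)
    return report


if __name__ == "__main__":
    # Illustrative numbers only.
    runs_old = {"daily_etl": [100.0, 110.0, 105.0]}
    runs_new = {"daily_etl": [120.0, 118.0, 125.0]}
    for app, (old, new, change, regressed) in compare(runs_old, runs_new).items():
        print(f"{app}: {old:.0f}s -> {new:.0f}s ({change:+.1%}), regressed={regressed}")
```

Medians are used instead of means so that a single unusually slow run, for example one that waited on a busy YARN queue, does not dominate the comparison.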