Upgrading Apache Spark 2.4.8 on 7.1.8 to Spark 3 on 7.3.1
The following steps help you upgrade from Apache Spark 2.4.8 on Cloudera Private Cloud 7.1.8 to Spark 3.4.1 on 7.3.1.
Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Target cluster version | Target cluster Spark 3 version | Spark 2 used with connectors¹ |
---|---|---|---|---|---|
7.1.8 | 2.4.8 | none | 7.3.1 | 3.4.1 | no |
Pre-application migration tasks
Install the CDS parcel and Spark 3 services, as described in the CDS parcel documentation. A short overview of the process is as follows:
- Check that all the software prerequisites are satisfied.
- In the Cloudera Manager Admin Console, add the CDS parcel repository to the Remote Parcel Repository URLs in Parcel Settings.
- Download the CDS parcel, distribute it to the hosts in your cluster, and activate it.
- Add the Spark 3 service to your cluster.
- Return to the Home page.
- Click the stale configuration icon to launch the Stale Configuration wizard and restart the necessary services.
Application migration tasks (Spark 2 to Spark 3)
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.3.0.
- Check the supported Java versions.
- Check the supported Scala version.
- Check the supported Python versions.
- Account for changed or versioned Spark commands in your code (spark-submit, pyspark, etc.).
- Check supported versions for Spark connectors.
- Check the logging library used in your code.
- Check the compatibility of 3rd-party libraries used in your code.
- Check Spark behavior changes and refactor your code.
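
As an illustration of the "changed or versioned Spark commands" check above, the following is a minimal sketch that scans launcher scripts for unversioned Spark 2 command names. The Spark 3 command names in the mapping (`spark3-submit`, `spark3-shell`, `pyspark3`) are an assumption about how the Spark 3 service exposes its commands while both versions are installed; confirm them against the CDS documentation for your release. The function name `find_unversioned_commands` is hypothetical.

```python
import re

# Assumed mapping from Spark 2 command names to the versioned Spark 3
# commands available while both services are installed side by side.
SPARK3_COMMANDS = {
    "spark-submit": "spark3-submit",
    "spark-shell": "spark3-shell",
    "pyspark": "pyspark3",
}

# Match a command only when it stands alone, so "pyspark3" and
# module paths such as "pyspark.sql" are not reported.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(cmd) for cmd in SPARK3_COMMANDS)
    + r")(?![\w.-])"
)


def find_unversioned_commands(script_text):
    """Return (line_number, command, suggested_replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            cmd = match.group(1)
            hits.append((lineno, cmd, SPARK3_COMMANDS[cmd]))
    return hits


if __name__ == "__main__":
    sample = "spark-submit --class Foo app.jar\nspark3-submit app.jar\npyspark --master yarn\n"
    for lineno, cmd, suggestion in find_unversioned_commands(sample):
        print(f"line {lineno}: {cmd} -> {suggestion}")
```

The scan is intended for shell scripts and workflow definitions. In Python sources a bare `import pyspark` would also be flagged, so review each hit by hand rather than replacing automatically.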
Post-application migration tasks
- Stop the Livy (Livy for Spark 2) and Spark 2 (SPARK_ON_YARN) services.
- Delete the Spark 2 and Livy for Spark 2 services.
- Move Spark 2 event logs to the Spark 3 event logs directory.
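
The event log move in the last step can be sketched as a small wrapper around `hdfs dfs -mv`. Both directory paths below are assumptions based on common defaults; check the `spark.eventLog.dir` setting of each service before running anything. The helper names `build_move_command` and `move_event_logs` are hypothetical, and the sketch defaults to a dry run that only prints the command.

```python
import subprocess

# Assumed default event log locations; verify spark.eventLog.dir for both
# the Spark 2 and Spark 3 services on your cluster.
SPARK2_EVENT_LOG_DIR = "/user/spark/applicationHistory"
SPARK3_EVENT_LOG_DIR = "/user/spark/spark3ApplicationHistory"


def build_move_command(src_dir, dst_dir):
    """Build the HDFS shell command that moves every entry under src_dir."""
    # The HDFS shell expands the trailing glob itself, so no local shell
    # is needed to interpret the "*".
    return ["hdfs", "dfs", "-mv", f"{src_dir.rstrip('/')}/*", dst_dir]


def move_event_logs(src_dir=SPARK2_EVENT_LOG_DIR,
                    dst_dir=SPARK3_EVENT_LOG_DIR,
                    dry_run=True):
    """Print the move command, and run it only when dry_run is False."""
    cmd = build_move_command(src_dir, dst_dir)
    print(" ".join(cmd))
    if not dry_run:
        # Raises CalledProcessError if the HDFS command fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    move_event_logs()  # dry run: prints the command without executing it
```

Run the real move as a user with write access to both directories, and consider checking ownership and permissions on the moved files afterwards.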
In-place cluster upgrade
Application migration tasks (Spark 3.x to Spark 3.4.1)
Follow the Spark application migration documentation to migrate your Apache Spark applications from version 3.3.x to 3.4.1.
- Refactor your Spark application code.
Final steps
After the upgrade and application migration are complete:
- Check the status of your clusters.
- Perform benchmark testing on your applications. See Spark Application Migration.
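
One simple way to read benchmark results is to compare per-job run times before and after the migration and flag the jobs that slowed down. The sketch below assumes you have already collected wall-clock durations in seconds for the same jobs on both versions; the function name `find_regressions`, the job names, and the 10% tolerance are hypothetical choices, not values from the Cloudera documentation.

```python
def find_regressions(before_seconds, after_seconds, tolerance=0.10):
    """Return {job: relative_slowdown} for jobs slower than the tolerance.

    before_seconds and after_seconds map job names to run times in seconds.
    Jobs missing from after_seconds, or with a non-positive baseline,
    are skipped.
    """
    regressions = {}
    for job, before in before_seconds.items():
        after = after_seconds.get(job)
        if after is None or before <= 0:
            continue
        change = (after - before) / before
        if change > tolerance:
            regressions[job] = round(change, 3)
    return regressions


if __name__ == "__main__":
    spark2_runs = {"etl": 100.0, "report": 50.0}   # hypothetical timings
    spark3_runs = {"etl": 125.0, "report": 51.0}   # hypothetical timings
    print(find_regressions(spark2_runs, spark3_runs))
```

A single run per job is noisy, so in practice it is reasonable to average several runs before comparing.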
¹ Oozie, Solr, Phoenix, Hive Warehouse Connector, Spark Schema Registry