Upgrading Apache Spark 2.4.8 on 7.2.18 SP2 to Spark 3 on 7.3.1
The following steps help you upgrade from Apache Spark 2.4.8 on Cloudera Public Cloud 7.2.18 SP2 to Spark 3.4.1 on 7.3.1.
Source cluster version | Source cluster Spark 2 version | Source cluster Spark 3 version | Target cluster version | Target cluster Spark 3 version | Spark 2 used with connectors¹ |
---|---|---|---|---|---|
7.2.18 SP2 | 2.4.8 | none | 7.3.1 | 3.4.1 | no |
Sidecar migration of Data Hub clusters
Sidecar migration tasks for Data Hub clusters
In a sidecar migration, you add a new 7.2.18 Data Hub cluster alongside the existing one. The new cluster must use Spark 3 and Livy 3 instead of Spark 2 and Livy 2.
Depending on the template used for your existing Data Hub clusters, you might need a new custom template that contains Spark 3 instead of Spark 2. Alternatively, you can use the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template, which contains Spark 3 only.
- Check the services in your current template and compare them with those in the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template.
- If the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template does not meet your requirements, create a custom template in which you replace all Spark 2 and Livy 2 references with Spark 3 and Livy 3, respectively.
- Add a new Spark 3-based 7.2.18 Data Hub cluster to the environment, using your custom template or the built-in 7.2.18 - Data Engineering: Apache Spark3, Apache Hive, Apache Oozie template.
- Migrate all non-Spark workloads from the old Data Hub cluster to the new cluster.
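If you build a custom template, the Spark 2 to Spark 3 substitution can be scripted. The following Python sketch rewrites type strings in a Data Hub cluster template exported as JSON. The type names in REPLACEMENTS (SPARK_ON_YARN, LIVY, SPARK3_ON_YARN, LIVY_FOR_SPARK3) and the helper names swap_types and rewrite_template are assumptions for illustration; compare them with the service definitions in the built-in 7.2.18 Spark 3 template, and extend the mapping with the corresponding role types before relying on the output.

```python
import json

# ASSUMPTION: illustrative Spark 2 -> Spark 3 type names. Verify them against the
# built-in 7.2.18 Spark 3 template and add role type mappings as needed.
REPLACEMENTS = {
    "SPARK_ON_YARN": "SPARK3_ON_YARN",
    "LIVY": "LIVY_FOR_SPARK3",
}


def swap_types(node):
    """Recursively replace string values that exactly match a key in REPLACEMENTS."""
    if isinstance(node, dict):
        return {key: swap_types(value) for key, value in node.items()}
    if isinstance(node, list):
        return [swap_types(item) for item in node]
    if isinstance(node, str):
        return REPLACEMENTS.get(node, node)
    return node


def rewrite_template(template_json: str) -> str:
    """Return a copy of the JSON template with Spark 2 / Livy 2 types swapped for Spark 3 / Livy 3."""
    return json.dumps(swap_types(json.loads(template_json)), indent=2)
```

Only exact string matches are replaced, so unrelated values such as HIVE_ON_TEZ pass through unchanged. Reference names and service configuration entries still need a manual review.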
Application migration tasks (Spark 2 to 3)
- Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.4.1.
- Check the supported Java versions.
- Check the supported Scala version.
- Check the supported Python versions.
- Account for changed or versioned Spark commands (spark-submit, pyspark, etc.) in your code.
- Check the supported versions of Spark connectors.
- Check the logging library used in your code.
- Check the compatibility of 3rd-party libraries used in your code.
- Check Spark behavior changes and refactor your code.
- Migrate all Spark 2 applications in the old Data Hub cluster to Spark 3 applications in the new cluster.
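For shell wrapper scripts that launch Spark jobs, renaming the launchers can be automated. This sketch assumes, and you should confirm on the new cluster, that the Spark 3 commands are named spark3-submit, spark3-shell, and pyspark3; the mapping and the helper rewrite_commands are hypothetical. It matches whole command names only, so a line that already uses pyspark3 is left alone.

```python
import re

# ASSUMPTION: hypothetical Spark 2 -> Spark 3 launcher names; confirm which
# commands are installed on the new cluster before rewriting scripts.
COMMAND_MAP = {
    "spark-submit": "spark3-submit",
    "spark-shell": "spark3-shell",
    "pyspark": "pyspark3",
}

# Match a command name only when it is not part of a longer word or
# hyphenated name, so "pyspark3" or "my-spark-submit" are left untouched.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(cmd) for cmd in sorted(COMMAND_MAP, key=len, reverse=True))
    + r")(?![\w-])"
)


def rewrite_commands(script_text: str) -> str:
    """Replace Spark 2 launcher names in a shell script with their Spark 3 names."""
    return _PATTERN.sub(lambda match: COMMAND_MAP[match.group(1)], script_text)
```

Apply it to shell scripts only: in Python source it would also rewrite module references such as import pyspark.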
Post-application migration tasks
- Move Spark 2 event logs to the Spark 3 event logs directory.
- Drop the old Data Hub cluster.
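A minimal sketch of the event log move, assuming both event log directories are reachable as local paths (for example through a mounted file system). The helper move_event_logs and its skip-if-present policy are illustrative assumptions. On HDFS or cloud storage, you would instead move the files between the locations configured as spark.eventLog.dir for the Spark 2 and Spark 3 services, for example with hdfs dfs -mv.

```python
import shutil
from pathlib import Path


def move_event_logs(spark2_dir, spark3_dir):
    """Move every Spark 2 event log (file or directory) into the Spark 3 event log directory.

    Entries whose names already exist in the target are skipped, not overwritten.
    Returns the names of the entries that were moved.
    """
    src = Path(spark2_dir)
    dst = Path(spark3_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if target.exists():
            # Keep the existing Spark 3 entry and leave the Spark 2 copy in place.
            continue
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

Skipping name collisions keeps the move safe to rerun; anything left in the Spark 2 directory afterwards needs a manual decision.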
In-place cluster upgrade
Final steps
After the upgrade and application migration are complete:
- Check the status of your Data Lakes, Data Hubs, and clusters.
- Perform benchmark testing on your applications. See Spark Application Migration.
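When benchmarking, a simple comparison against the Spark 2 baseline helps flag jobs that slowed down after migration. The function find_regressions and its 10% default tolerance are illustrative choices, not Cloudera guidance; runtimes are assumed to be wall-clock seconds collected the same way on both clusters.

```python
def find_regressions(spark2_seconds: dict[str, float],
                     spark3_seconds: dict[str, float],
                     tolerance: float = 0.10) -> dict[str, float]:
    """Return jobs whose Spark 3 runtime exceeds the Spark 2 baseline by more than `tolerance`.

    Both arguments map a job name to its runtime in seconds. The result maps each
    regressed job to its relative slowdown (0.2 means 20% slower). Jobs missing
    from the Spark 3 results, or with a non-positive baseline, are ignored.
    """
    regressions = {}
    for job, baseline in spark2_seconds.items():
        current = spark3_seconds.get(job)
        if current is None or baseline <= 0:
            continue
        change = (current - baseline) / baseline
        if change > tolerance:
            regressions[job] = change
    return regressions
```

For example, find_regressions({"etl": 100, "report": 50}, {"etl": 120, "report": 52}) reports only etl, with a 20% slowdown.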
¹ Oozie, Solr, Phoenix, Hive Warehouse Connector, Spark Schema Registry