Upgrading Apache Spark 2.4.8 on 7.1.8 to Spark 3 on 7.3.1

The following steps help you upgrade from Apache Spark 2.4.8 on Cloudera Private Cloud 7.1.8 to Spark 3.4.1 on 7.3.1.

Source cluster version:               7.1.8
Source cluster Spark 2 version:       2.4.8
Source cluster Spark 3 version:       none
Target cluster version:               7.3.1
Target cluster Spark 3 version:       3.4.1
Spark 2 used with connectors [1]:     no

Pre-application migration tasks

Install the CDS parcel and Spark 3 services, as described in the CDS parcel documentation. A short overview of the process is as follows:
  1. Check that all the software prerequisites are satisfied.
  2. In the Cloudera Manager Admin Console, add the CDS parcel repository to the Remote Parcel Repository URLs in Parcel Settings.
  3. Download the CDS parcel, distribute it to the hosts in your cluster, and activate it.
  4. Add the Spark 3 service to your cluster.
  5. Return to the Home page.
  6. Click the stale configuration icon to launch the Stale Configuration wizard and restart the necessary services.

Application migration tasks (Spark 2 to Spark 3)

Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.3.0:
  1. Check the supported Java versions.
  2. Check the supported Scala version.
  3. Check the supported Python versions.
  4. Account for changed or versioned Spark commands in your code (spark-submit, pyspark, and so on).
  5. Check supported versions for Spark connectors.
  6. Check the logging library used in your code.
  7. Check the compatibility of 3rd-party libraries used in your code.
  8. Check Spark behavior changes and refactor your code.
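
Checks 4 and 6 in the list above can be partly automated with a text scan. The following is a minimal sketch, not part of the official migration tooling. It assumes that the versioned Spark 3 commands are named spark3-submit, spark3-shell, and pyspark3, and that code referencing the Log4j 1.x package (org.apache.log4j) needs review because Spark 3.3 and later ship with Log4j 2. The function name find_migration_hints is hypothetical, and the scan is a heuristic that only flags lines for manual review.

```python
import re

# Assumed mapping from Spark 2 command names to the versioned Spark 3
# commands; confirm the names against your own installation.
VERSIONED_COMMANDS = {
    "spark-submit": "spark3-submit",
    "spark-shell": "spark3-shell",
    "pyspark": "pyspark3",
}

# Match a Spark 2 command used as a standalone word. The lookarounds skip
# Python imports ("import pyspark", "from pyspark.sql ...") and names that
# are already versioned ("pyspark3").
COMMAND_PATTERN = re.compile(
    r"(?<!import )(?<!from )(?<![\w.-])"
    r"(spark-submit|spark-shell|pyspark)"
    r"(?![\w.-])"
)

# Log4j 1.x classes live under org.apache.log4j; Log4j 2 uses
# org.apache.logging.log4j, which this pattern does not match.
LOG4J1_PATTERN = re.compile(r"org\.apache\.log4j\.")


def find_migration_hints(text):
    """Return (line_number, hint) tuples for lines that deserve a manual review."""
    hints = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in COMMAND_PATTERN.finditer(line):
            old = match.group(1)
            hints.append((number, f"{old} -> consider {VERSIONED_COMMANDS[old]}"))
        if LOG4J1_PATTERN.search(line):
            hints.append((number, "Log4j 1.x API reference"))
    return hints


if __name__ == "__main__":
    sample = "spark-submit --class com.example.Job job.jar\n"
    for line_number, hint in find_migration_hints(sample):
        print(f"line {line_number}: {hint}")
```

Run the function over the contents of each launcher script and source file, then review every reported line by hand. A hit does not prove that a change is required, and a clean scan does not replace the checks in the migration documentation.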

Post-application migration tasks

  1. Stop the Livy (Livy for Spark 2) and Spark 2 (SPARK_ON_YARN) services.
  2. Delete the Spark 2 and Livy for Spark 2 services.
  3. Move Spark 2 event logs to the Spark 3 event logs directory.
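
Step 3 above can be scripted. The following is a minimal sketch that wraps the hdfs dfs -mv command. The two directory paths are assumptions based on common defaults, not values stated in this document; check the spark.eventLog.dir setting of both the Spark 2 and the Spark 3 service before moving anything. The helper names build_move_command and move_event_logs are hypothetical. The sketch defaults to a dry run that only prints the command.

```python
import subprocess

# Assumed event log locations; verify spark.eventLog.dir for both services.
SPARK2_EVENT_LOG_DIR = "/user/spark/applicationHistory"
SPARK3_EVENT_LOG_DIR = "/user/spark/spark3ApplicationHistory"


def build_move_command(source_dir, target_dir):
    """Build the argument list that moves every entry in source_dir to target_dir."""
    # The glob is passed to the HDFS shell unexpanded, because no local
    # shell is involved when subprocess receives an argument list.
    return ["hdfs", "dfs", "-mv", f"{source_dir.rstrip('/')}/*", target_dir]


def move_event_logs(source_dir=SPARK2_EVENT_LOG_DIR,
                    target_dir=SPARK3_EVENT_LOG_DIR,
                    dry_run=True):
    """Print the move command, and run it only when dry_run is False."""
    command = build_move_command(source_dir, target_dir)
    if dry_run:
        print("Would run:", " ".join(command))
        return command
    # check=True raises CalledProcessError if the hdfs command fails.
    subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    move_event_logs()  # dry run; pass dry_run=False to move the files
```

Run the script as a user with write access to both directories, and start with the dry run to confirm the paths.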

In-place cluster upgrade

  1. Upgrade the Cloudera Manager Server to 7.13.1.0
    1. Back up the Cloudera Manager server databases, working directories, and several other entities. These backups can be used to restore your Cloudera Manager deployment if there are problems during the upgrade.
    2. Upgrade the Cloudera Manager server software on the Cloudera Manager host using package commands from the command line (for example, yum on RHEL systems). Cloudera Manager automates much of this process and is recommended for upgrading and managing your CDH/Cloudera Runtime clusters.
    3. Upgrade the Cloudera Manager agent software on all cluster hosts. The Cloudera Manager upgrade wizard can upgrade the agent software (and, optionally, the JDK), or you can install the agent and JDK software manually. The CDH or Cloudera Runtime software is not upgraded during this process.
    For more information, see Upgrading Cloudera Manager 7.
  2. Cloudera Manager handles deactivation of the SPARK3 parcel. The parcel is deactivated, not removed.

Application migration tasks (Spark 3.x to Spark 3.4.1)

Follow the Spark application migration documentation to migrate your Apache Spark applications from version 3.3.x to 3.4.1:
  1. Refactor your Spark application code.

Final steps

After the upgrade and application migration are complete:
  1. Check the status of your clusters.
  2. Perform benchmark testing on your applications. See Spark Application Migration.
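
One simple way to evaluate the benchmark runs from the final step is to compare median wall-clock times before and after the upgrade. The following is a minimal sketch. The function name compare_runtimes and the 10% tolerance are illustrative assumptions, not recommendations from the migration documentation.

```python
from statistics import median


def compare_runtimes(baseline_seconds, candidate_seconds, tolerance=0.10):
    """Compare median runtimes of the same job before and after the upgrade.

    baseline_seconds: runtimes measured on the source cluster (Spark 2).
    candidate_seconds: runtimes measured on the target cluster (Spark 3).
    tolerance: relative slowdown that is still accepted (0.10 means 10%).
    """
    base = median(baseline_seconds)
    cand = median(candidate_seconds)
    change = (cand - base) / base
    return {
        "baseline_median": base,
        "candidate_median": cand,
        "relative_change": change,
        "regression": change > tolerance,
    }


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    result = compare_runtimes([100, 110, 105], [120, 118, 125])
    print(result)
```

The median is used instead of the mean so that a single slow run, for example one delayed by cluster contention, does not dominate the comparison. Use several runs per application on both clusters.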

[1] Connectors: Oozie, Solr, Phoenix, Hive Warehouse Connector, Spark Schema Registry