Upgrading Apache Spark 2.4.8 (with 3.4.1 bundled) on 7.2.18 SP2 to Spark 3 on 7.3.1

The following steps help you upgrade from Apache Spark 2.4.8 (with 3.4.1 bundled) on Cloudera on cloud 7.2.18 SP2 to Spark 3.4.1 on 7.3.1.

Application migration tasks (Spark 2 to 3)

  1. Follow the Spark application migration documentation to migrate your Apache Spark applications from version 2.4.8 to 3.4.1:
    1. Check the supported Java versions.
    2. Check the supported Scala version.
    3. Check the supported Python versions.
    4. Account for changed or versioned Spark commands (spark-submit, pyspark, and so on) in your code.
    5. Check supported versions for Spark connectors.
    6. Check the logging library used in your code.
    7. Check the compatibility of 3rd-party libraries used in your code.
    8. Check Spark behavior changes and refactor your code.
  2. Finish migrating Spark 2 workloads to Spark 3 on the cluster before you continue.
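
As a starting point for the command check in the list above, the following minimal sketch scans script text for Spark launcher names so that each call site can be reviewed. The command list is an assumption for illustration, not an authoritative inventory, and `find_spark_commands` is a hypothetical helper. The scan is deliberately coarse: it also reports `pyspark` where it appears as a Python module name, so treat the hits as candidates for review.

```python
import re

# Assumed list of launcher names to look for; adjust it to the commands
# your scripts and schedulers actually call.
SPARK_COMMANDS = (
    "spark-submit", "spark3-submit",
    "spark-shell", "spark3-shell",
    "pyspark", "pyspark3",
)

# Match a command only as a whole token, so "my-spark-submit-wrapper"
# or "pyspark3x" are not reported. Longer names are tried first.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(c) for c in sorted(SPARK_COMMANDS, key=len, reverse=True))
    + r")(?![\w-])"
)


def find_spark_commands(text):
    """Return (line_number, command) pairs for every launcher name found in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits
```

Run it over each job script or workflow definition and compare the reported commands with the command names documented for the target release.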

Post-application migration tasks

  1. Stop the Livy (Livy for Spark 2) and Spark 2 (SPARK_ON_YARN) services.
  2. Delete the Spark 2 and Livy for Spark 2 services.
  3. Move Spark 2 event logs to the Spark 3 event logs directory.
  4. Continue after Spark 2 services and roles are fully removed from the cluster.
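
The event log move in the list above could be scripted roughly as follows. This is a sketch under stated assumptions: the two directory constants are example locations only and must be replaced with the `spark.eventLog.dir` values configured for Spark 2 and Spark 3 on your cluster, and `build_move_command` and `move_event_logs` are hypothetical helper names. The helper defaults to a dry run, so you can inspect the command before anything is moved.

```python
import subprocess

# Assumed example locations; replace them with the spark.eventLog.dir
# values configured for Spark 2 and Spark 3 on your cluster.
SPARK2_EVENT_LOG_DIR = "/user/spark/applicationHistory"
SPARK3_EVENT_LOG_DIR = "/user/spark/spark3ApplicationHistory"


def build_move_command(src_dir, dst_dir):
    """Return the `hdfs dfs -mv` argument list that moves every entry in src_dir to dst_dir."""
    # The Hadoop file system shell expands the trailing glob itself,
    # so the command does not need to go through a local shell.
    return ["hdfs", "dfs", "-mv", src_dir.rstrip("/") + "/*", dst_dir]


def move_event_logs(src_dir, dst_dir, dry_run=True):
    """Build the move command and run it only when dry_run is False."""
    command = build_move_command(src_dir, dst_dir)
    if not dry_run:
        # check=True raises CalledProcessError if the move fails.
        subprocess.run(command, check=True)
    return command
```

Call `move_event_logs(SPARK2_EVENT_LOG_DIR, SPARK3_EVENT_LOG_DIR)` first to review the command, then repeat the call with `dry_run=False` as a user that has write access to both directories.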

In-place cluster upgrade

  1. Upgrade the Cloudera Manager Server to 7.13.1.0.
    1. Back up the Cloudera Manager Server databases, working directories, and several other entities. You can use these backups to restore your deployment if there are problems during the upgrade.
    2. Upgrade the Cloudera Manager Server software on the Cloudera Manager host using package commands from the command line (for example, yum on RHEL systems). Cloudera Manager automates much of this process and is recommended for upgrading and managing your CDH/Cloudera Runtime clusters.
    3. Upgrade the Cloudera Manager agent software on all cluster hosts. The Cloudera Manager upgrade wizard can upgrade the agent software (and, optionally, the JDK), or you can install the agent and JDK software manually. The CDH or Cloudera Runtime software is not upgraded during this process.
    For more information, see Upgrading Cloudera Manager 7.
  2. Use Cloudera Manager to upgrade your clusters from the lower version to the target version for your Spark upgrade (for example, 7.3.1).
    For more information, see Upgrading to a higher version.
  3. The upgrade deactivates the SPARK3 parcel. The parcel itself is deactivated, not removed.

Final steps

After the upgrade and application migration are complete:
  1. Check the status of your clusters.
  2. Perform benchmark testing on your applications. See Spark Application Migration.
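
For the benchmark step, one simple way to compare results is the relative change in median run time between the Spark 2 baseline and the Spark 3 runs of the same application. The sketch below is an illustration only; `relative_change` is a hypothetical helper, and the timings are whatever wall-clock measurements you collect yourself.

```python
from statistics import median


def relative_change(baseline_seconds, candidate_seconds):
    """Return the fractional change in median run time from baseline to candidate.

    A negative result means the candidate runs were faster.
    """
    if not baseline_seconds or not candidate_seconds:
        raise ValueError("both runs need at least one timing")
    base = median(baseline_seconds)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (median(candidate_seconds) - base) / base
```

The median is used instead of the mean so that a single slow run, for example one delayed by cluster contention, does not dominate the comparison.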