Migration of Spark 2 applications

To migrate the Spark 2 application that underlies an Oozie Spark 3 action, follow the official Spark 2 to Spark 3 migration guide. Specifically, see Migrating Spark applications in the Cloudera Runtime documentation.

That guide provides detailed instructions and best practices for the migration. Following it helps you move your Spark 2 application to Spark 3 and use Spark 3 capabilities within Oozie.

Test your migrated Spark 3 applications directly on the Spark 3 runtime first. Running an application with spark3-submit before running it as an Oozie action helps you identify issues and find their root cause, because you can tell whether a problem originates in Oozie, in Spark 3 itself, or in the migrated application.
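The standalone test described above can be scripted. The following is a minimal sketch, not a definitive procedure: it assumes that spark3-submit accepts the standard spark-submit options --master and --deploy-mode, and the function names, the application path, and the arguments are hypothetical placeholders.

```python
import shlex
import subprocess


def build_spark3_submit_command(app_path, app_args=(), master="yarn",
                                deploy_mode="cluster"):
    """Return the spark3-submit command line as a list of arguments.

    Assumption: spark3-submit takes the same --master and --deploy-mode
    options as spark-submit.
    """
    command = [
        "spark3-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        app_path,
    ]
    command.extend(app_args)
    return command


def run_standalone(app_path, app_args=()):
    """Run the application outside Oozie and return its exit code."""
    command = build_spark3_submit_command(app_path, app_args)
    print("Running:", " ".join(shlex.quote(part) for part in command))
    return subprocess.run(command).returncode
```

For example, run_standalone("/path/to/app.py", ["--date", "2024-01-01"]) submits a hypothetical PySpark script. A non-zero exit code here points at Spark 3 or the application rather than at Oozie.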

Migrating Java applications

If you currently run Java applications with Oozie's Spark action, consider the following points and steps:

  • Recompile your application using Spark 3 dependencies instead of Spark 2 dependencies.

  • If you are using Scala, you might need to migrate from Scala 2.11 to Scala 2.12, as described in the official Spark migration guide.

  • If your application relies on the Scala module of the Jackson library, you might need to replace the Scala 2.11 flavor with the Scala 2.12 flavor.

  • Ensure that the versions of your third-party runtime dependencies match the versions used by Spark 3 and Oozie's Spark 3 action.

  • Because Spark 3 has moved from log4j to log4j2, you might need to adjust the logging library, the logging configuration, or both in your application. Also ensure that all required logging frameworks are on the classpath. Note that Oozie's Spark 3 action executor configures log4j2 instead of log4j, so by default the classpath contains only log4j2 libraries and no longer includes the log4j runtime libraries.

By following these guidelines, you can successfully migrate your Java applications to work seamlessly with Oozie's Spark 3 action.
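To find dependencies that are still built for Scala 2.11, such as the Scala module of Jackson mentioned above, you can scan your dependency list for the _2.11 artifact suffix. The following is a minimal sketch that assumes the dependencies are available as group:artifact:version strings. The function name and the sample coordinates are illustrative only, not taken from any particular application.

```python
import re

# Matches a Scala binary-version suffix such as "_2.11" at the end of an
# artifact name.
SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)$")


def find_scala_211_artifacts(coordinates):
    """Return the coordinates whose artifact name ends with _2.11."""
    flagged = []
    for coordinate in coordinates:
        parts = coordinate.split(":")
        if len(parts) < 2:
            continue  # not in group:artifact[:version] form
        match = SCALA_SUFFIX.search(parts[1])
        if match and match.group(1) == "2.11":
            flagged.append(coordinate)
    return flagged


# Illustrative sample input; replace with your application's dependencies.
dependencies = [
    "org.apache.spark:spark-sql_2.11:2.4.8",
    "com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7",
    "org.apache.spark:spark-core_2.12:3.3.2",
    "com.google.guava:guava:28.1-jre",
]

for coordinate in find_scala_211_artifacts(dependencies):
    print("Built for Scala 2.11:", coordinate)
```

Each flagged artifact needs its _2.12 counterpart in a version that matches the Spark 3 runtime. Artifacts without a Scala suffix are not flagged, but their versions might still need to be aligned.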

Migrating Python applications

When upgrading PySpark applications from Spark 2 to Spark 3, consider not only the framework migration but also whether your Python application is compatible with Python 3. If it is not, you also need to migrate the application from Python 2 to Python 3.
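A quick first check of Python 3 compatibility is to see whether a script even parses under a Python 3 interpreter. The following is a minimal sketch that uses the built-in compile() function. The function name and the sample snippets are hypothetical, and the check catches only syntax-level problems such as Python 2 print statements, not behavioral differences such as integer division or str versus bytes handling.

```python
def python3_syntax_error(source, filename="<pyspark-app>"):
    """Return None if source parses under the running Python 3 interpreter,
    otherwise a short description of the syntax error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as error:
        return "{}:{}: {}".format(filename, error.lineno, error.msg)
    return None


# Hypothetical one-line samples: a Python 2 print statement and its
# Python 3 equivalent.
python2_style = "print 'rows processed'\n"
python3_style = "print('rows processed')\n"

for label, source in (("python2_style", python2_style),
                      ("python3_style", python3_style)):
    problem = python3_syntax_error(source)
    print(label, "->", problem if problem else "parses under Python 3")
```

Run this with a Python 3 interpreter. A script that passes still needs functional testing before you switch its workflow to Python 3.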

Because Spark 2 is deprecated and Python 2 has reached its end of life, Cloudera strongly advises migrating your PySpark applications from Python 2 and Spark 2 at the same time. This approach ensures that your application runs on up-to-date frameworks and avoids potential security vulnerabilities.

However, if a simultaneous migration is not practical, for example because your applications are large or numerous, you might want to migrate in stages by taking the following steps:

  1. Follow the instructions in the Enable Spark actions section to re-enable Spark actions.

  2. Install Python 2 to a custom location so that it does not interfere with other services and is not used by them.

  3. Configure Oozie's Spark action globally to use this custom Python 2 installation. Refer to the Using Spark actions with a custom Python executable section for guidance.

  4. Additionally, install Python 3.7 to a custom location.

  5. Begin migrating your PySpark applications to ensure compatibility with Python 3.

  6. For workflows where you have already made the underlying PySpark application Python 3 compatible, enable them to run with the custom Python 3 installation instead of Python 2. Refer to the Using Spark actions with a custom Python executable section for detailed instructions.

  7. After all the Spark actions in a workflow are compatible with Python 3, start migrating the Spark actions to Spark 3 actions.

By following these steps, you can successfully migrate your PySpark applications from Python 2 and Spark 2 to Python 3 and Spark 3.
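While Python 2 and Python 3 installations coexist during the staged migration, a script can verify which interpreter it was started with. The following is a minimal sketch of such a guard. The function names are hypothetical, and the guard checks only the interpreter that runs the script itself.

```python
import sys


def interpreter_matches(expected_major, version_info=None):
    """Return True when the interpreter's major version equals
    expected_major. version_info defaults to the running interpreter."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] == expected_major


def require_python(expected_major):
    """Exit with a clear message if the script runs on an unexpected
    interpreter, for example Python 2 after the workflow was switched."""
    if not interpreter_matches(expected_major):
        sys.exit(
            "Expected Python {} but running {} ({})".format(
                expected_major, sys.version.split()[0], sys.executable
            )
        )


# Place near the top of a PySpark script that has been migrated to Python 3.
require_python(3)
```

If the workflow still points to the custom Python 2 installation, the script stops immediately with a message that names the interpreter path, which makes a misconfigured Python executable easier to spot.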