Spark 3 compatibility action executor

To ease the migration from Oozie's Spark action to the Spark 3 action, you can use the Spark 3 compatibility action executor. It lets you keep your existing Spark action definitions in your workflows while running them with Spark 3 instead of Spark 2.

When you use this executor, Oozie automatically converts each Spark action definition to a Spark 3 action definition before executing it. To configure the resulting Spark 3 action, Oozie takes the Spark action configuration properties (those prefixed with oozie.service.SparkConfigurationService) and converts them to their Spark 3 equivalents.

As a result, you can run your existing Spark actions on Spark 3 without modifying your workflow definitions.

Note that to run under the compatibility action executor, the Python or Java applications launched by your Spark actions must be runtime and binary compatible with Spark 3. For details, see the Migration of Spark 2 applications section of this document.

To enable Spark 3 compatibility mode for workflows that contain Spark (Spark 2) action definitions, set the oozie.action.spark.compatibility property to true.

You can set this property at the following levels:
  • Global configuration:
    1. Go to the Oozie configuration page in Cloudera Manager and search for Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml.
    2. Add a new key with the name oozie.action.spark.compatibility and set the value to true.
    3. Redeploy Oozie.
  • Workflow-level configuration

    Add the property, with the value true, to the <global> configuration section of your workflow.xml file. Alternatively, add it to your job.properties file.

  • Action-level configuration

    To enable compatibility mode for a single Spark action only, add the oozie.action.spark.compatibility property with the value true to that action's configuration.
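The following workflow.xml is a minimal sketch showing where the property goes at the workflow level and at the action level. The workflow name, action name, application class, JAR path, and the ${resourceManager} and ${nameNode} parameters are hypothetical placeholders. The schema versions (uri:oozie:workflow:1.0 and uri:oozie:spark-action:1.0) are assumptions; keep whichever versions your existing workflows already use. Both placements appear only for illustration, and either one on its own is enough. In a job.properties file, the equivalent entry is the line oozie.action.spark.compatibility=true.

```xml
<!-- Hypothetical example: names, paths, and the class are placeholders. -->
<workflow-app name="spark-compat-example" xmlns="uri:oozie:workflow:1.0">
    <!-- Workflow level: enables compatibility mode for every Spark action
         in this workflow. -->
    <global>
        <configuration>
            <property>
                <name>oozie.action.spark.compatibility</name>
                <value>true</value>
            </property>
        </configuration>
    </global>
    <start to="spark-node"/>
    <action name="spark-node">
        <!-- The action stays an ordinary Spark action definition. -->
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <!-- Action level: use this instead of the <global> section to
                 enable compatibility mode for this action only. -->
            <configuration>
                <property>
                    <name>oozie.action.spark.compatibility</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>yarn</master>
            <name>MySparkJob</name>
            <class>com.example.MySparkApp</class>
            <jar>${nameNode}/user/${wf:user()}/apps/my-spark-app.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Because the conversion happens inside the executor, nothing else in the <spark> element has to change. Note that the configuration must not include any of the ShareLib properties listed under Limitations.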

Limitations

Because Oozie's Spark and Spark 3 actions are highly customizable, compatibility mode has some restrictions. The currently known limitations are:
  • The following properties are not permitted in your action configuration at any level (global, workflow, or action):
    • oozie.action.sharelib.for.spark
    • oozie.action.sharelib.for.spark3
    • oozie.action.sharelib.for.spark.exclude
    • oozie.action.sharelib.for.spark3.exclude
  • The Spark 3 action uses log4j2, whereas the Spark action uses log4j, so the two must not be mixed in compatibility mode. Therefore, a spark-log4j.properties file is not allowed in the lib folder of your workflow or in the ShareLib.