Spark 3 compatibility action executor
To facilitate smooth transitions from Oozie's Spark actions to Spark 3 actions, you can use the Spark 3 compatibility action executor. The purpose of this executor is to allow you to retain your existing Spark action definitions in your workflows while executing them with Spark 3 instead of Spark 2.
When you use this executor, Oozie automatically converts a Spark action definition to a Spark 3 action definition before executing it. To configure the Spark 3 action, Oozie takes the Spark action configurations (prefixed with oozie.service.SparkConfigurationService) and converts them to Spark 3 configurations.
Your Python or Java Spark actions must also be runtime and binary compatible with Spark 3; otherwise they cannot be executed with the compatibility action executor. For details, see Migration of Spark 2 applications in this document.
To enable Spark 3 compatibility mode for Spark 2 action workflow definitions, use the oozie.action.spark.compatibility property. You can set it at any of the following levels:
- Global configuration:
  - Go to the Oozie configuration page in Cloudera Manager and search for Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml.
  - Add a new key with the name oozie.action.spark.compatibility and set the value to true.
  - Redeploy Oozie.
- Workflow-level configuration:
  Add the property to the global configuration section of your workflow.xml file. Alternatively, you can add it to your job.properties file.
- Action-level configuration:
  If you only want to enable compatibility mode for a specific Spark action in your workflow definition, add an action-level property with the name oozie.action.spark.compatibility and set the value to true.
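As a rough illustration of the workflow-level and action-level options above, the following workflow.xml sketch sets the property in both places. In practice you would normally pick one. The workflow name, action name, class, JAR path, and the ${resourceManager} and ${nameNode} parameters are hypothetical placeholders, and the schema versions shown are assumptions that may differ in your deployment.

```xml
<workflow-app xmlns="uri:oozie:workflow:1.0" name="spark-compat-example">
  <!-- Workflow-level: applies to every Spark action in this workflow -->
  <global>
    <configuration>
      <property>
        <name>oozie.action.spark.compatibility</name>
        <value>true</value>
      </property>
    </configuration>
  </global>

  <start to="spark-node"/>

  <!-- An existing Spark action definition, kept unchanged apart from the property -->
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:1.0">
      <resource-manager>${resourceManager}</resource-manager>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Action-level: enables compatibility mode for this action only -->
        <property>
          <name>oozie.action.spark.compatibility</name>
          <value>true</value>
        </property>
      </configuration>
      <master>yarn</master>
      <mode>cluster</mode>
      <name>ExampleSparkJob</name>
      <!-- Hypothetical class and JAR; the application must be compatible with Spark 3 -->
      <class>com.example.ExampleSparkJob</class>
      <jar>${nameNode}/user/${wf:user()}/apps/example/lib/example.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

If you set the property in job.properties instead, the equivalent entry would be a single line of the form oozie.action.spark.compatibility=true. For the global configuration, the safety valve entry uses the same name and value as the property element shown here.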
Limitations
- The following properties are not permitted in your action configuration, whether they are set at the global level, workflow level, or action level:
  - oozie.action.sharelib.for.spark
  - oozie.action.sharelib.for.spark3
  - oozie.action.sharelib.for.spark.exclude
  - oozie.action.sharelib.for.spark3.exclude
- The Spark 3 action uses log4j2 instead of log4j (which is used by the Spark action), and the two must not be mixed in compatibility mode. Therefore, having a spark-log4j.properties file in the lib folder of your workflow or in ShareLib is not allowed.