Differences between Spark and Spark 3 actions

Learn about the differences between Spark and Spark 3 action definitions, the logging frameworks that Spark and Spark 3 use, and action credentials.

A Spark action definition and a Spark 3 action definition differ in several ways. First, the XML element that represents the action is different: Spark actions use the <spark> element, whereas Spark 3 actions use the <spark3> element.

The supported tags also differ. The Spark 3 action definition does not support the <job-tracker> tag; it supports only the <resource-manager> tag.
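
As a rough sketch of the two points above, a Spark 3 action might be declared as follows. The namespace URI, the action name, the transition targets, and the ${resourceManager} and ${nameNode} parameters are assumptions made for illustration; verify them against the schema shipped with your Oozie version.

```xml
<action name="spark3-example">
  <!-- The element is spark3 rather than spark.
       The namespace URI below is an assumption, not taken from this page. -->
  <spark3 xmlns="uri:oozie:spark3-action:1.0">
    <!-- Only resource-manager is supported; job-tracker is not. -->
    <resource-manager>${resourceManager}</resource-manager>
    <name-node>${nameNode}</name-node>
    <!-- master, name, jar, and the remaining elements are omitted from this sketch -->
  </spark3>
  <ok to="end"/>
  <error to="fail"/>
</action>
```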

In Oozie's Spark action, you can omit the <mode> tag from the workflow definition and set the <master> tag to yarn-cluster or yarn-client. Spark 3 actions do not offer this shorthand. If a Spark action sets the <master> tag to yarn-cluster, the corresponding Spark 3 action must set the <master> tag to yarn and the <mode> tag to cluster. Similarly, if a Spark action sets the <master> tag to yarn-client, the Spark 3 action must set the <master> tag to yarn and the <mode> tag to client.
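
The mapping described above can be sketched as the following fragment, which shows only the affected tags for the cluster-mode case. The client-mode case is analogous, with yarn-client becoming yarn plus a <mode> of client.

```xml
<!-- Spark action: cluster mode expressed through master alone -->
<master>yarn-cluster</master>

<!-- Spark 3 action: the same intent split across two tags -->
<master>yarn</master>
<mode>cluster</mode>
```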

Differences in the actions’ configuration

Several configuration options are available for Spark actions. Their names are prefixed with oozie.service.SparkConfigurationService. For example:

  • oozie.service.SparkConfigurationService.spark.configurations.blacklist

  • oozie.service.SparkConfigurationService.hive2.configurations

To distinguish them from Spark action settings, properties for Spark 3 actions use the prefix oozie.service.Spark3ConfigurationService instead.

When you migrate from Spark actions to Spark 3 actions, change the prefix of any Spark action property that you set in Cloudera Manager through a safety valve.
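
As an illustration of the prefix change, a safety-valve entry might be adjusted along these lines. This sketch assumes that the safety valve accepts oozie-site.xml style property snippets and that only the prefix of the property name changes, as described above; the value shown is a placeholder for illustration, not a recommended setting.

```xml
<!-- Before: applies to Spark actions -->
<property>
  <name>oozie.service.SparkConfigurationService.spark.configurations.blacklist</name>
  <!-- Illustrative value only -->
  <value>spark.yarn.jar,spark.yarn.jars</value>
</property>

<!-- After: the same setting for Spark 3 actions, with only the prefix changed -->
<property>
  <name>oozie.service.Spark3ConfigurationService.spark.configurations.blacklist</name>
  <!-- Illustrative value only -->
  <value>spark.yarn.jar,spark.yarn.jars</value>
</property>
```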

Log4j vs. Log4j 2

Spark 2 and Spark 3 use different logging frameworks. Spark 2 relies on log4j or reload4j, whereas Spark 3 uses log4j2. As a result, Oozie's Spark 3 action also uses log4j2.

Oozie provides a logging properties file for both Spark and Spark 3 actions, but the file names differ: spark-log4j.properties for Spark actions and spark3-log4j2.properties for Spark 3 actions.

When you migrate from Spark to Spark 3 actions, if you have a custom spark-log4j.properties file in the lib folder of your workflow or in Oozie's ShareLib, rename it to spark3-log4j2.properties. Because Spark 3 uses log4j2, you might also need to update the contents of your custom logging configuration file so that it is compatible with log4j2.

Action credentials

Oozie's Spark 3 action supports the same credentials as the Spark action.