Differences between Spark and Spark 3 actions
Learn about the differences between a Spark action and a Spark 3 action definition, the difference in the logging frameworks used in Spark and Spark 3, and the action credentials.
There are several notable distinctions between a Spark action and a Spark 3 action definition. Firstly, the XML element representing the action differs between the two. For Spark actions, it is denoted as spark, whereas for Spark 3 actions, it is labeled as spark3.
Additionally, the support for certain tags also varies. Specifically, the Spark 3 action definition does not include support for the job-tracker tag, but instead exclusively employs the resource-manager tag.
With Oozie's Spark action, you can omit the <mode> tag in the workflow definition and set the <master> tag to either yarn-cluster or yarn-client. Spark 3 actions do not offer this flexibility. If you previously set the <master> tag to yarn-cluster in Spark actions, you must set the <master> tag to yarn and the <mode> tag to cluster in Spark 3 actions. Similarly, if you previously set the <master> tag to yarn-client in Spark actions, you must set the <master> tag to yarn and the <mode> tag to client in Spark 3 actions.
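The mapping described above can be sketched as a small helper for use in a migration script. This is a minimal illustration, not part of Oozie: the function name to_spark3_master_and_mode is hypothetical, and values other than yarn-cluster and yarn-client are assumed to pass through unchanged.

```python
# Hypothetical helper: translates the <master> value of a Spark action
# into the (<master>, <mode>) pair expected by a Spark 3 action.
LEGACY_TO_SPARK3 = {
    "yarn-cluster": ("yarn", "cluster"),
    "yarn-client": ("yarn", "client"),
}


def to_spark3_master_and_mode(legacy_master, legacy_mode=None):
    """Return the (master, mode) pair to use in the Spark 3 action."""
    if legacy_master in LEGACY_TO_SPARK3:
        return LEGACY_TO_SPARK3[legacy_master]
    # Assumption: any other combination is already valid and is kept as-is.
    return legacy_master, legacy_mode
```

For example, to_spark3_master_and_mode("yarn-client") yields the pair yarn and client, which correspond to the <master> and <mode> tags of the Spark 3 action.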
Differences in the actions’ configuration
To configure Spark actions, there are multiple configuration options available. These options are prefixed with oozie.service.SparkConfigurationService. For example:
- oozie.service.SparkConfigurationService.spark.configurations.blacklist
- oozie.service.SparkConfigurationService.hive2.configurations
In order to differentiate between configurations for Spark actions and Spark 3 actions, the prefix for properties related to Spark 3 actions is modified to oozie.service.Spark3ConfigurationService.
When you migrate from Spark actions to Spark 3 actions, adjust the prefix of any Spark action configuration that is set in Cloudera Manager through a safety valve.
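The prefix change can be illustrated with a short sketch that rewrites property names held in a dictionary. The function name migrate_property_names is hypothetical, and the sketch assumes that each property keeps the same suffix under the Spark 3 prefix; verify each migrated name before applying it.

```python
OLD_PREFIX = "oozie.service.SparkConfigurationService."
NEW_PREFIX = "oozie.service.Spark3ConfigurationService."


def migrate_property_names(properties):
    """Return a copy of properties with Spark prefixes changed to Spark 3."""
    migrated = {}
    for name, value in properties.items():
        if name.startswith(OLD_PREFIX):
            # Swap only the prefix; the rest of the name is assumed unchanged.
            name = NEW_PREFIX + name[len(OLD_PREFIX):]
        migrated[name] = value
    return migrated
```

Properties that do not start with the Spark action prefix are copied without modification, so the helper can be run over a full set of Oozie overrides.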
Log4j vs. Log4j 2
Spark 2 and Spark 3 use different logging frameworks. Spark 2 relies on log4j or reload4j, whereas Spark 3 uses log4j2. As a result, Oozie's Spark 3 action also uses log4j2.
Oozie provides a logging property file for both Spark and Spark 3 actions. However, there is a distinction in the naming conventions of these files. In the case of Spark actions, the file is named spark-log4j.properties, whereas for Spark 3 actions, it is named spark3-log4j2.properties.
During the migration process from Spark to Spark 3 actions, if you have a custom spark-log4j.properties file located in the lib folder of your workflow or within Oozie's ShareLib, you need to rename this file. Since Spark 3 uses log4j2, you might also need to modify your custom Spark logging configuration file to ensure compatibility with log4j2.
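As a rough sketch of the rename step, the following helper renames a custom logging file in a local working copy of a workflow's lib folder. It makes several assumptions: the function name rename_logging_file_for_spark3 is hypothetical, the target name is taken to be spark3-log4j2.properties (the Spark 3 file name given above), and the folder is on the local filesystem rather than on HDFS. It does not convert the file contents to log4j2 syntax; that still has to be reviewed manually.

```python
from pathlib import Path

SPARK_LOG_FILE = "spark-log4j.properties"
SPARK3_LOG_FILE = "spark3-log4j2.properties"  # assumed rename target


def rename_logging_file_for_spark3(lib_dir):
    """Rename a custom Spark logging file in lib_dir; return the new path or None."""
    source = Path(lib_dir) / SPARK_LOG_FILE
    target = Path(lib_dir) / SPARK3_LOG_FILE
    # Do nothing if there is no custom file or a Spark 3 file already exists.
    if not source.is_file() or target.exists():
        return None
    source.rename(target)
    return target
```

The helper returns None when there is nothing to rename, so it can be run repeatedly over the same folder without overwriting an existing Spark 3 logging file.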
Action credentials
The Spark 3 action in Oozie provides support for the same credentials as Spark actions.