Fixed Issues in Apache Oozie

Review the list of Oozie issues that are resolved in Cloudera Runtime 7.1.9.

CDPD-27164: Oozie should not rely on its LoadBalancer internally
Oozie no longer uses the LoadBalancer to issue callback notifications. Instead, it tries each available Oozie instance one by one and stops as soon as the callback succeeds against one of them.
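The try-each-instance behavior described above can be sketched as follows. This is a minimal illustration, not Oozie's implementation: the function name notify_first_available, the send_callback parameter, and the example host names are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def notify_first_available(
    instance_urls: Iterable[str],
    send_callback: Callable[[str], bool],
) -> Optional[str]:
    """Try each Oozie instance in order and stop at the first success.

    Hypothetical helper: send_callback stands in for whatever issues the
    callback and returns True when the instance accepted it.
    """
    for url in instance_urls:
        try:
            if send_callback(url):
                return url  # success: the remaining instances are not tried
        except OSError:
            # Treat a connection failure like an unsuccessful attempt.
            continue
    return None  # no instance accepted the callback


# Usage with a stand-in sender: the first instance is down, the second works.
attempted = []


def fake_send(url: str) -> bool:
    attempted.append(url)
    if url == "http://oozie-1.example.com:11000":
        raise ConnectionError("instance down")
    return url == "http://oozie-2.example.com:11000"


urls = [
    "http://oozie-1.example.com:11000",
    "http://oozie-2.example.com:11000",
    "http://oozie-3.example.com:11000",
]
reached = notify_first_available(urls, fake_send)
```

In this sketch the third instance is never contacted, because the second attempt already succeeded.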
CDPD-58538: Oozie should upload and use the config files from sqoop-conf/managers.d when available
Previously, Oozie did not honor Sqoop's managers.d configurations or extra connector Jars from the lib folder. Both are now automatically available in Oozie's Sqoop action, so users can use connectors such as the Sqoop Teradata connector without manually updating configurations or copying Jars to the Workflow's lib folder.
CDPD-50296: Improve Oozie's app state action checking
Oozie's action state checking was enhanced to query for running applications immediately after start-up.
CDPD-41425: LAST_ONLY and NONE execution modes
Possible OutOfMemoryError when there are too many coordinator actions to materialize.
If a coordinator job is defined with a per-minute frequency (for example, frequency="* * * * *"), its start time lies well in the past, and its execution mode is LAST_ONLY or NONE, too many CoordinatorActionBean instances can be kept on the JVM heap within CoordMaterializeTransitionXCommand#insertList, because those execution modes omit the check for the throttle value.
As a consequence, the log can contain several hundred thousand entries from attempts to grow CoordMaterializeTransitionXCommand#insertList:
[user@host ~]$ grep 'In storeToDB() coord action id' /var/log/oozie/oozie-HOSTNAME.log.out | wc -l
478408
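To illustrate the scale of the problem, the sketch below counts how many per-minute actions become due when the start time lies months in the past, and shows how a throttle value bounds a single batch. It is a simplified model under stated assumptions, not Oozie's materialization code: the function names, the dates, and the throttle value of 12 are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, List


def due_action_times(start: datetime, now: datetime, every: timedelta) -> Iterator[datetime]:
    """Yield the nominal time of every action due between start and now."""
    t = start
    while t <= now:
        yield t
        t += every


def materialize_batch(start: datetime, now: datetime, every: timedelta,
                      throttle: int) -> List[datetime]:
    """Collect at most `throttle` actions in one pass instead of all of them."""
    batch: List[datetime] = []
    for t in due_action_times(start, now, every):
        if len(batch) >= throttle:
            break  # the throttle check keeps the in-memory list small
        batch.append(t)
    return batch


# Hypothetical job: per-minute frequency, start time eleven months in the past.
start = datetime(2023, 1, 1)
now = datetime(2023, 12, 1)
per_minute = timedelta(minutes=1)

unbounded = sum(1 for _ in due_action_times(start, now, per_minute))
bounded = len(materialize_batch(start, now, per_minute, throttle=12))
```

Without the throttle check, every one of the due actions would have to be held in a single list; with it, the list never exceeds the throttle value.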

Apache Jira:

CDPD-43192: Extend Oozie Spark sharelib for HBase interaction
Additional HBase Jars were added to the Spark sharelib to support proper HBase interaction.
CDPD-43343: Oozie log streaming bug when log timestamps are the same on multiple Oozie servers
Fixed a bug in the Oozie log streaming mechanism.
If a log message on server "A" had the same timestamp as another log message on server "B", the `TimestampedMessageParser` for server "B" was overwritten by the parser for server "A" (because of the timestampMap.put(earliestParser.getLastTimestamp(), earliestParser) operation), so the log messages from server "B" were ignored from that point on.
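The overwrite can be reproduced with any map keyed by timestamp. The sketch below uses plain Python dictionaries as stand-ins for the timestamp map; the server names and timestamps are made up, and keeping a list of parsers per timestamp is shown only as one possible way to avoid the loss, not necessarily the fix that was applied in Oozie.

```python
from collections import defaultdict

# Hypothetical stand-ins: each "parser" is a server name paired with the
# timestamp of its next log line. Servers A and B share a timestamp.
parsers = [("server-B", 1000), ("server-A", 1000), ("server-C", 1005)]

# Keying a plain map by timestamp keeps only one parser per timestamp.
single = {}
for name, ts in parsers:
    single[ts] = name  # a later put with the same key replaces the earlier one

# Keeping a list per timestamp retains every parser.
multi = defaultdict(list)
for name, ts in parsers:
    multi[ts].append(name)

lost = {name for name, _ in parsers} - set(single.values())
```

Here `lost` contains server-B, because the entry for server-A replaced it, while `multi` still holds both parsers under the shared timestamp.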
CDPD-44209: SqoopMain's printArgs masks Sqoop command line option if preceding one contains "password"
Previously, Oozie on YARN masked some command-line arguments incorrectly because of mistaken password detection. Users can now set the "oozie.launcher.argumentMaskingExceptionList" configuration to specify exceptions to password masking. For details on how to use this configuration, see the documentation in oozie-default.xml.
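The kind of false positive described above can be modeled with a small sketch. This is not SqoopMain's code, and it does not reproduce the actual format or semantics of "oozie.launcher.argumentMaskingExceptionList"; it assumes, purely for illustration, that the exception list names arguments that should not cause the following argument to be masked. The function name mask_args and the example values are hypothetical.

```python
from typing import Iterable, List


def mask_args(args: List[str], exceptions: Iterable[str] = ()) -> List[str]:
    """Mask the argument that follows any argument containing "password".

    Assumption for this sketch: arguments listed in `exceptions` never
    trigger masking of the argument that follows them.
    """
    skip = set(exceptions)
    masked = []
    previous = ""
    for arg in args:
        if "password" in previous.lower() and previous not in skip:
            masked.append("********")
        else:
            masked.append(arg)
        previous = arg
    return masked


args = ["import", "--password-file", "/user/etl/db.pwd", "--table", "users"]

# Default behavior: the file path is masked although it is not a secret.
default_view = mask_args(args)

# With an exception, the path stays readable.
excepted_view = mask_args(args, exceptions=["--password-file"])

# A real password is still masked when its option is not in the exception list.
secret_view = mask_args(["--password", "s3cret"], exceptions=["--password-file"])
```

For the actual syntax of the exception list, the description in oozie-default.xml remains the authoritative source.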
CDPD-46049: SSH action fails when '' property contains double quotes
The SSH action's callback mechanism failed with "Invalid content-type" error when capture-output was used in the action definition.
CDPD-47821: Add missing Sqoop Atlas notification jars to Sqoop share lib
Previously, Atlas notification did not work in Oozie's Sqoop action because of missing Jars. Those Jars are now included in Oozie's Sqoop ShareLib, so Atlas notifications are expected to function correctly in Oozie's Sqoop action.
CDPD-56936: Oozie's db cli tool does not honor custom connection properties
The Oozie DB CLI tool did not respect the "ConnectionProperties" property set by the user through the "" configuration in Oozie.
OPSAPS-64457: Make CM provide Oozie the necessary configuration regarding CDPD-43396
HBase service and Sqoop client dependencies were added so that Oozie has access to their configurations.
OPSAPS-63816: Configure service hosts to Oozie
Cloudera Manager now provides the addresses of all Oozie server instances as a configuration to every Oozie instance. Oozie's callback mechanism uses these addresses in HA mode: instead of making the callback through the LoadBalancer, it attempts the callback against each Oozie instance and stops as soon as one of them succeeds. This removes the LoadBalancer as a middleman and makes the callback mechanism safer.
OPSAPS-67346: [oozie] Implement validator in CM for Oozie-Spark3 integration
A validator was added that checks that every Oozie node has a Spark3 role. If any Spark3 role is missing, a warning message listing the affected nodes appears on Oozie's CM page.
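Conceptually, the check amounts to a set difference between the hosts that run Oozie and the hosts that have a Spark3 role. The sketch below only illustrates that idea; it is not Cloudera Manager's validator, and the function name, host names, and warning text are hypothetical.

```python
from typing import Iterable, List


def hosts_missing_spark3(oozie_hosts: Iterable[str],
                         spark3_hosts: Iterable[str]) -> List[str]:
    """Return the Oozie hosts that have no Spark3 role, sorted for display."""
    return sorted(set(oozie_hosts) - set(spark3_hosts))


missing = hosts_missing_spark3(
    ["node1.example.com", "node2.example.com", "node3.example.com"],
    ["node1.example.com", "node3.example.com"],
)

# Build a warning only when at least one host lacks the role.
warning = f"Spark3 role missing on: {', '.join(missing)}" if missing else ""
```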

Apache patch information

  • OOZIE-3666
  • OOZIE-3254