Fixed Issues in Apache Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.1.9.

CDPD-42599: Spark - Update log4j1 to reload4j
Migrated from log4j 1.x to reload4j to avoid known CVEs.
CDPD-58080: Backport SPARK-32951 to Spark 2
Foldable expressions can now be propagated from the Aggregate operator.
CDPD-50679: Backport CDPD-47129 to 7.1.9
Spark now handles empty CSV fields when using OpenCSVSerde.
CDPD-50203: Backport SPARK-27254 to 7.1.9
ManifestFileCommitProtocol now cleans up output files that were written completely but become invalid when the job is aborted.
CDPD-50205: Backport SPARK-32638 to 7.1.9
Previously, the WidenSetOperationTypes rule could leave an attribute missing in a subquery.
CDPD-50206: Backport CDPD-43553 to 7.1.9
Jersey was upgraded to 2.36 to avoid known Common Vulnerabilities and Exposures (CVEs).
CDPD-50161: Backport CDPD-47449 to 7.1.9
Previously, Spark jobs failed with a NullPointerException (NPE) when kafka-log4j-appender was added to the classpath.
CDPD-50202: Backport SPARK-27210 to 7.1.9
ManifestFileCommitProtocol now cleans up incomplete output files at the task level if a task aborts.
CDPD-52721: Sqoop - Replace log4j 1.x with reload4j
log4j 1.x was replaced with reload4j in Sqoop.
CDPD-43434: Implement support for preventing incompatible log4j classes to be loaded in Sqoop
A safeguard was put in place to ensure that Sqoop always loads the correct logging-related JARs, regardless of the classpath order.

Apache patch information

  • SPARK-27210
  • SPARK-32638
  • SPARK-27254
  • SPARK-32951