Fixed Issues in Apache Spark
Review the list of Spark issues that are resolved in Cloudera Runtime 7.1.9.
- CDPD-42599: Spark - Update log4j1 to reload4j
- Migrated from log4j 1.x to reload4j to avoid common vulnerabilities and exposures (CVE).
- CDPD-58080: Backport SPARK-32951 to Spark 2
- Foldable expressions can now be propagated from Aggregate.
- CDPD-50679: Backport CDPD-47129 to 7.1.9
- Empty CSV fields are now handled when using OpenCSVSerde.
- CDPD-50203: Backport SPARK-27254 to 7.1.9
- ManifestFileCommitProtocol now cleans up output files that were completed but became invalid because the job was aborted.
- CDPD-50205: Backport SPARK-32638 to 7.1.9
- Previously, WidenSetOperationTypes in a subquery could result in a missing attribute.
- CDPD-50206: Backport CDPD-43553 to 7.1.9
- Jersey was upgraded to 2.36 to avoid common vulnerabilities and exposures (CVE).
- CDPD-50161: Backport CDPD-47449 to 7.1.9
- Previously, Spark jobs failed with a NullPointerException (NPE) when kafka-log4j-appender was added to the classpath.
- CDPD-50202: Backport SPARK-27210 to 7.1.9
- ManifestFileCommitProtocol now cleans up incomplete output files at the task level if a task aborts.
- CDPD-52721: Sqoop - Replace log4j 1.x with reload4j
- log4j 1.x was replaced with reload4j in Sqoop.
- CDPD-43434: Implement support for preventing incompatible log4j classes to be loaded in Sqoop
- A safeguard was put in place to ensure that Sqoop always loads the correct logging-related JARs, regardless of the classpath order.
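
Several of the fixes above concern which logging JAR ends up on the classpath. The following is a minimal diagnostic sketch, not the mechanism Sqoop or Spark uses: it lists, in classpath order, every JAR that contains a given class, so that duplicate providers (for example log4j 1.x alongside reload4j) can be spotted. The function name `jars_providing` and the example paths are hypothetical.

```python
import zipfile
from typing import Iterable, List


def jars_providing(class_name: str, classpath: Iterable[str]) -> List[str]:
    """Return the JARs on `classpath`, in order, that contain `class_name`.

    Hypothetical helper for illustration only. Entries that are not
    readable ZIP/JAR files (directories, missing paths) are skipped.
    """
    # A class such as org.apache.log4j.Logger is stored in a JAR
    # under the entry org/apache/log4j/Logger.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in classpath:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except (OSError, zipfile.BadZipFile):
            continue
    return hits


if __name__ == "__main__":
    # Example paths are placeholders; substitute the real classpath entries.
    example_classpath = ["/path/to/first.jar", "/path/to/second.jar"]
    for jar in jars_providing("org.apache.log4j.Logger", example_classpath):
        print(jar)
```

If more than one JAR is reported, which copy of the class is loaded typically depends on classpath order, which is the situation the safeguard is meant to make irrelevant.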
Apache patch information
- SPARK-27210
- SPARK-32638
- SPARK-27254
- SPARK-32951