Fixed Issues in Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.8.

CDPD-18938: Jobs disappear intermittently from the Spark History Server (SHS) under high load: SPARK-33841 has been backported to CDPD to fix this. This issue is now resolved.
CDPD-20434: SHS should be resilient to corrupted event log directories: SPARK-33146 has been backported to CDPD to make the SHS resilient to corrupted event log directories. This issue is now resolved.
CDPD-16010: Remove netty3 dependency: The netty3 dependency has been removed. This issue is now resolved.
CDPD-18652: Adapt the Spark Atlas Connector (SAC) to the new Machine Learning event listener in CDP Spark 2.4: This replaces an internal patch for Spark Machine Learning events with the community-based one. This issue is now resolved.
CDPD-16748: Improve LeftSemi SortMergeJoin right-side buffering: This issue is now resolved.
CDPD-17422: Improve null-safe equi-join key extraction: This issue is now resolved.
CDPD-18458: When the pyspark.sql.functions.lit() function is used with DataFrame caching, it returns an incorrect result: This issue is now resolved.
CDPD-1138: Spark Atlas Connector tracks column-level lineage: This issue is now resolved.
CDPD-14906: Spark incorrectly reads or writes TIMESTAMP values that precede the start of the Gregorian calendar. This happens when Spark is:

- Using dynamic partition inserts.
- Reading from or writing to an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
- Reading from or writing to an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
- Reading from or writing to a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).

This issue is now resolved.
CDPD-15385: Delegation token support for Spark DStreams was not available: Kafka delegation token support has been added for DStreams in Spark 2.4.5. This issue is now resolved.
CDPD-15735: Oozie Spark actions are failing because Spark and Kafka are using different Scala versions.: This issue is now resolved.
CDPD-10532: Update log4j to address CVE-2019-17571: log4j has been replaced with an internal version to fix CVE-2019-17571.
CDPD-10515: Incorrect version of jackson-mapper-asl: An internal version of jackson-mapper-asl is now used to address CVE-2017-7525.
CDPD-7882: If an INSERT statement specifies partitions both statically and dynamically, data loss can occur: To prevent data loss, Spark now throws an exception when partitions are specified both statically and dynamically. Follow the workarounds provided in the error message.
CDPD-15773: In previous versions, applications that shared a SparkSession across multiple threads could deadlock when accessing the Hive Metastore (HMS): This issue is now resolved.
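The conditions listed for CDPD-14906 can be expressed as a small check over a job's configuration, which may help when deciding whether data written by earlier releases deserves a second look. This is a minimal sketch, not part of Spark: the function names `is_pre_gregorian` and `cdpd_14906_paths` are hypothetical, the configuration is assumed to be available as a plain dictionary of string values, and missing keys fall back to the defaults stated in the entry above. The cutoff is 1582-10-15, the first day of the Gregorian calendar.

```python
from datetime import datetime

# The Gregorian calendar starts on 1582-10-15; TIMESTAMP values before this
# date are the ones CDPD-14906 concerns.
GREGORIAN_START = datetime(1582, 10, 15)


def is_pre_gregorian(ts: datetime) -> bool:
    """Return True if a TIMESTAMP value falls before the Gregorian calendar start."""
    return ts < GREGORIAN_START


def cdpd_14906_paths(conf: dict, table_format: str,
                     uses_dynamic_partition_insert: bool = False) -> list:
    """Hypothetical helper: list the affected code paths a read or write would take.

    `conf` maps Spark configuration keys to string values; missing keys fall
    back to the defaults stated in the release note.
    """
    convert_orc = conf.get("spark.sql.hive.convertMetastoreOrc", "true").lower() == "true"
    orc_impl = conf.get("spark.sql.orc.impl", "native").lower()
    convert_parquet = conf.get("spark.sql.hive.convertMetastoreParquet", "true").lower() == "true"

    paths = []
    if uses_dynamic_partition_insert:
        paths.append("dynamic partition insert")
    fmt = table_format.lower()
    if fmt == "orc":
        if not convert_orc:
            paths.append("ORC with convertMetastoreOrc=false")
        elif orc_impl == "hive":
            paths.append("ORC with convertMetastoreOrc=true and orc.impl=hive")
    elif fmt == "parquet" and not convert_parquet:
        paths.append("Parquet with convertMetastoreParquet=false")
    return paths


# Example usage
print(cdpd_14906_paths({}, "orc"))  # default settings: []
print(cdpd_14906_paths({"spark.sql.orc.impl": "hive"}, "orc"))
print(is_pre_gregorian(datetime(1500, 1, 1)))  # True
```

With the default settings none of the listed paths apply, which matches the entry: the affected cases require a non-default configuration or a dynamic partition insert.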
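For CDPD-7882, the statement that is now rejected is one whose PARTITION clause gives some columns literal values (static) while leaving others to be filled from the query (dynamic), for example PARTITION (year='2020', month). The following is a hypothetical client-side pre-flight check, not Spark's implementation: the name `check_partition_spec` and the list-of-pairs representation are assumptions made for illustration, and the error text is not Spark's actual message.

```python
from typing import Optional, Sequence, Tuple

# One PARTITION clause as (column, value) pairs. A value of None marks a
# dynamic partition column, so PARTITION (year='2020', month) becomes
# [("year", "2020"), ("month", None)].
PartitionSpec = Sequence[Tuple[str, Optional[str]]]


def check_partition_spec(spec: PartitionSpec) -> None:
    """Hypothetical check: raise if static and dynamic partitions are mixed."""
    static = [col for col, value in spec if value is not None]
    dynamic = [col for col, value in spec if value is None]
    if static and dynamic:
        # Illustrative message only; it is not Spark's actual error text.
        raise ValueError(
            f"partitions specified both statically {static} and dynamically {dynamic}"
        )


# Example usage
check_partition_spec([("year", "2020"), ("month", "01")])  # all static: passes
check_partition_spec([("year", None), ("month", None)])    # all dynamic: passes
try:
    check_partition_spec([("year", "2020"), ("month", None)])
except ValueError as err:
    print(err)
```

A check like this only surfaces the problem before the job is submitted; Spark itself raises its own exception, and its message lists the supported workarounds.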

Apache patch information

Apache patches in this release. These patches do not have an associated Cloudera bug ID.

SPARK-17875
SPARK-33841
