Fixed Issues in Spark
Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.8.
- CDPD-18938: Jobs disappear intermittently from the Spark History Server (SHS) under high load.
- SPARK-33841 has been back-ported to CDPD in order to fix the issue with jobs disappearing intermittently from the SHS under high load. This issue is now resolved.
- CDPD-20434: SHS should be resilient to corrupted event log directories.
- SPARK-33146 has been back-ported to CDPD in order to make SHS resilient to corrupted event log directories. This issue is now resolved.
- CDPD-16010: Removed netty3 dependency.
- This issue is now resolved.
- CDPD-18652: Adapt SAC to new Machine Learning event listener in CDP Spark 2.4
- This replaces an internal patch for Spark Machine Learning events with the community-based one. This issue is now resolved.
- CDPD-16748: Improve LeftSemi SortMergeJoin right side buffering.
- This issue is now resolved.
- CDPD-17422: Improve null-safe equi-join key extraction.
- This issue is now resolved.
- CDPD-18458: When the pyspark.sql.functions.lit() function is used with a cached DataFrame, it returns incorrect results.
- This issue is now resolved.
- CDPD-1138: Spark Atlas Connector tracks column-level lineage
- This issue is now resolved.
- CDPD-14906: Spark reads or writes TIMESTAMP data incorrectly for values before the start of the Gregorian calendar. This happens when Spark is:
- Using dynamic partition inserts.
- Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
- Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
- Reading or writing from a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).
- CDPD-15385: Delegation token support for Spark DStreams was not available.
- Added Kafka delegation token support for DStreams in Spark 2.4.5. This issue is now resolved.
- CDPD-15735: Oozie Spark actions are failing because Spark and Kafka are using different Scala versions.
- This issue is now resolved.
- CDPD-10532: Update log4j to address CVE-2019-17571
- Replaced log4j with an internal version to fix CVE-2019-17571.
- CDPD-10515: Incorrect version of jackson-mapper-asl
- Switched to an internal version of jackson-mapper-asl to address CVE-2017-7525.
- CDPD-7882: If an insert statement specifies partitions both statically and dynamically, there is a potential for data loss
- To prevent data loss, this fix throws an exception if partitions are specified both statically and dynamically. You can follow the workarounds provided in the error message.
- CDPD-15773: In previous versions, applications that shared a Spark Session across multiple threads could deadlock when accessing the HMS.
- This issue is now resolved.
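
The CDPD-14906 entry above lists several configuration conditions, and it can be hard to tell at a glance which of them a given job matches. The following is a minimal sketch that encodes those conditions in plain Python. The helper name `matched_conditions` and its parameters are hypothetical and are not part of Spark or Cloudera Runtime; the property names and defaults are taken from the entry itself.

```python
# Keys and defaults exactly as listed in the CDPD-14906 entry above.
DEFAULTS = {
    "spark.sql.hive.convertMetastoreOrc": "true",
    "spark.sql.orc.impl": "native",
    "spark.sql.hive.convertMetastoreParquet": "true",
}


def matched_conditions(conf, table_format, dynamic_partition_insert=False):
    """Return the CDPD-14906 conditions that a job configuration matches.

    `conf` maps Spark property names to values; missing keys fall back to
    DEFAULTS. This helper is hypothetical and not part of Spark.
    """
    def setting(key):
        # Normalize so that "False", "false", and False compare the same way.
        return str(conf.get(key, DEFAULTS[key])).strip().lower()

    fmt = table_format.strip().lower()
    matched = []

    if dynamic_partition_insert:
        matched.append("dynamic partition insert")

    if fmt == "orc":
        if setting("spark.sql.hive.convertMetastoreOrc") == "false":
            matched.append("spark.sql.hive.convertMetastoreOrc=false")
        elif setting("spark.sql.orc.impl") == "hive":
            # Only relevant while convertMetastoreOrc is still true.
            matched.append("spark.sql.orc.impl=hive")
    elif fmt == "parquet":
        if setting("spark.sql.hive.convertMetastoreParquet") == "false":
            matched.append("spark.sql.hive.convertMetastoreParquet=false")

    return matched


# With default settings, an ORC table matches none of the listed conditions.
print(matched_conditions({}, "orc"))
# Forcing the Hive ORC implementation matches the spark.sql.orc.impl=hive condition.
print(matched_conditions({"spark.sql.orc.impl": "hive"}, "orc"))
```

An empty list means the configuration matches none of the conditions named in the entry. The sketch only mirrors the release note and does not inspect a running Spark session.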
Apache patch information
The following Apache patches are included in this release. These patches do not have an associated Cloudera bug ID.
- SPARK-17875
- SPARK-33841