Fixed Issues in Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.7.

CDPD-16010: Removed netty3 dependency
The netty3 dependency has been removed from Spark. This issue is now resolved.
CDPD-18652: Adapt the Spark Atlas Connector (SAC) to the new Machine Learning event listener in CDP Spark 2.4
This replaces an internal patch of Spark Machine Learning events with the community-based implementation. This issue is now resolved.
CDPD-16748: Improve LeftSemi SortMergeJoin right side buffering.
This issue is now resolved.
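A left semi join only needs to know whether at least one matching right-side row exists, so a merge over sorted inputs does not have to buffer the full group of matching right-side rows. The plain-Python sketch below illustrates that idea; it is not Spark's SortMergeJoin implementation, and the function name and the `(key, payload)` row shape are assumptions made for illustration. It also assumes non-null keys, since NULL keys never match in an ordinary equi-join.

```python
from typing import Any, Callable, List, Sequence


def sort_merge_left_semi_join(
    left: Sequence[Any],
    right: Sequence[Any],
    key: Callable[[Any], Any] = lambda row: row[0],
) -> List[Any]:
    """Return the rows of `left` that have at least one key match in `right`.

    Hypothetical helper. Both inputs must already be sorted by `key`, and
    keys must be non-null and mutually comparable.
    """
    result: List[Any] = []
    j = 0  # cursor into the right side; it only ever moves forward
    for row in left:
        k = key(row)
        # Skip right-side rows whose key is smaller than the current left key.
        while j < len(right) and key(right[j]) < k:
            j += 1
        # A single matching right row is enough; no group of matches is buffered.
        if j < len(right) and key(right[j]) == k:
            result.append(row)
    return result
```

For example, with `left = [(1, "a"), (1, "b"), (2, "c"), (4, "d")]` and `right = [(1, "x"), (1, "y"), (3, "z"), (4, "w")]`, the call returns `[(1, "a"), (1, "b"), (4, "d")]`: each left row appears at most once, regardless of how many right rows share its key.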
CDPD-17422: Improve null-safe equi-join key extraction.
This issue is now resolved.
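A null-safe equi-join compares keys with Spark SQL's `<=>` operator, under which `NULL <=> NULL` is true and `NULL <=> x` is false. The sketch below shows those semantics and one way to turn a nullable key into an ordinary equality key by pairing an is-null flag with the value (or a default), so that a hash join can be used. It is a plain-Python illustration under that assumption, not Spark's key-extraction code; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple


def null_safe_eq(a: Optional[Any], b: Optional[Any]) -> bool:
    """Semantics of `a <=> b`: two NULLs are equal, NULL never equals a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def null_safe_key(value: Optional[Any], default: Any = 0) -> Tuple[bool, Any]:
    """Map a nullable value to a plain equality key.

    The (is_null, value-or-default) pair makes two NULLs compare equal while
    keeping NULL distinct from the default value itself.
    """
    return (value is None, default if value is None else value)


def null_safe_hash_join(
    left: Sequence[Tuple[Any, Any]], right: Sequence[Tuple[Any, Any]]
) -> List[Tuple[Any, Any, Any]]:
    """Inner-join (key, payload) pairs on `left.key <=> right.key`."""
    buckets: Dict[Tuple[bool, Any], List[Any]] = defaultdict(list)
    for k, payload in right:
        buckets[null_safe_key(k)].append(payload)
    # .get() does not insert into the defaultdict, so unmatched keys stay out.
    return [(k, lp, rp) for k, lp in left for rp in buckets.get(null_safe_key(k), [])]
```

Joining `[(1, "a"), (None, "b"), (0, "c")]` with `[(1, "x"), (None, "y"), (2, "z")]` pairs the two NULL keys, matches key `1`, and leaves key `0` unmatched even though NULL is mapped to the default `0` internally.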
CDPD-18458: When the pyspark.sql.functions.lit() function is used with a cached DataFrame, it returns an incorrect result.
This issue is now resolved.
CDPD-1138: Spark Atlas Connector tracks column-level lineage
This issue is now resolved.
CDPD-14906: Spark reads or writes TIMESTAMP data incorrectly for values before the start of the Gregorian calendar. This happens when Spark is:
  • Using dynamic partition inserts.
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
  • Reading or writing from a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).
This issue is now resolved.
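The conditions listed above can be restated as a small check, for example to identify which jobs exercised the affected code paths before this fix. The helper below is hypothetical and written only for illustration: it encodes the settings and defaults stated in this note and does not call any Spark API. It assumes configuration values are supplied as strings, as they are in spark-defaults.conf.

```python
from typing import Dict

# Defaults as stated in this release note.
DEFAULTS: Dict[str, str] = {
    "spark.sql.hive.convertMetastoreOrc": "true",
    "spark.sql.orc.impl": "native",
    "spark.sql.hive.convertMetastoreParquet": "true",
}


def hits_pre_gregorian_timestamp_path(
    conf: Dict[str, str], table_format: str, dynamic_partition_insert: bool = False
) -> bool:
    """Return True if a job matches one of the conditions listed for CDPD-14906."""
    settings = {**DEFAULTS, **conf}  # explicit settings override the defaults

    def is_true(name: str) -> bool:
        return settings[name].strip().lower() == "true"

    # Condition 1: dynamic partition inserts, regardless of format.
    if dynamic_partition_insert:
        return True
    fmt = table_format.lower()
    if fmt == "orc":
        # Conditions 2 and 3: conversion disabled, or conversion enabled
        # with the Hive ORC implementation selected.
        return (not is_true("spark.sql.hive.convertMetastoreOrc")
                or settings["spark.sql.orc.impl"].strip().lower() == "hive")
    if fmt == "parquet":
        # Condition 4: Parquet metastore conversion disabled.
        return not is_true("spark.sql.hive.convertMetastoreParquet")
    return False
```

With all defaults in place, ORC and Parquet reads and writes do not match any listed condition; setting spark.sql.orc.impl=hive alone is enough to match condition 3.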
CDPD-15385: Previously, delegation token support for Spark DStreams was not available.
Added Kafka delegation token support for DStreams in Spark 2.4.5. This issue is now resolved.
CDPD-15735: Oozie Spark actions failed because Spark and Kafka were using different Scala versions.
This issue is now resolved.
CDPD-10532: Update log4j to address CVE-2019-17571
Replaced log4j with an internal version to fix CVE-2019-17571.
CDPD-10515: Incorrect version of jackson-mapper-asl
Replaced jackson-mapper-asl with an internal version to address CVE-2017-7525.
CDPD-7882: If an insert statement specifies partitions both statically and dynamically, there is a potential for data loss
To prevent data loss, this fix throws an exception if partitions are specified both statically and dynamically. You can follow the workarounds provided in the error message.
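In Hive-style SQL, an entry in a PARTITION clause written as col=value is a static partition, and a bare column name is a dynamic partition; an insert such as INSERT INTO t PARTITION (year=2020, month) SELECT ... mixes the two. Assuming that "both statically and dynamically" refers to a clause of this shape, the hypothetical helper below classifies a PARTITION clause body. It is a naive illustration that does not handle commas or equals signs inside quoted values; whether a given statement is rejected is decided by Spark's own error, not by this sketch.

```python
def classify_partition_spec(spec: str) -> str:
    """Classify a PARTITION clause body such as "year=2020, month".

    Returns "static", "dynamic", or "mixed". Hypothetical helper.
    """
    entries = [e.strip() for e in spec.split(",") if e.strip()]
    if not entries:
        raise ValueError("empty partition spec")
    # Static entries assign a value (col=value); dynamic entries name only the column.
    static = [e for e in entries if "=" in e]
    if len(static) == len(entries):
        return "static"
    if not static:
        return "dynamic"
    return "mixed"
```

For example, "year=2020, month=1" is static, "year, month" is dynamic, and "year=2020, month" is mixed.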
CDPD-15773: In previous versions, applications that shared a SparkSession across multiple threads could deadlock when accessing the Hive Metastore (HMS).
This issue is now resolved.

Apache patch information

The following Apache patches are included in this release. These patches do not have an associated Cloudera bug ID.

  • SPARK-17875