Fixed Issues in Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.9.

CDPD-21614: Spark SQL TRUNCATE table not permitted on external purge tables.
To retain the legacy Hive1/Hive2 behavior of managed non-ACID tables, the migration process instructs you to convert those tables to external tables with the external.table.purge=true table property. Previously, the TRUNCATE TABLE operation could not be performed through Spark SQL on those tables. Spark now allows you to TRUNCATE an external table if external.table.purge is set to true in the table properties. This issue is now resolved.
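As an illustration of the behavior described for CDPD-21614, the following minimal sketch marks a table as an external purge table and then truncates it through Spark SQL. The helper name truncate_external_purge_table and the table name sales_db.events are hypothetical, and spark is assumed to be an existing SparkSession (the helper only relies on its sql() method).

```python
def truncate_external_purge_table(spark, table):
    """Set external.table.purge=true on `table`, then truncate it.

    `spark` is assumed to be an existing SparkSession; only its sql()
    method is used. Returns the list of statements that were issued.
    """
    statements = [
        # Table property named in the release note above.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('external.table.purge'='true')",
        # Per the note, permitted on external tables once the property is true.
        f"TRUNCATE TABLE {table}",
    ]
    for statement in statements:
        spark.sql(statement)
    return statements
```

With a live session, the call would look like truncate_external_purge_table(spark, "sales_db.events"). Tables converted by the migration process already carry the property, so the ALTER TABLE step is shown only for completeness.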
CDPD-18938: Jobs disappear intermittently from the SHS under high load.
SPARK-33841 has been back-ported to CDPD in order to fix the issue with jobs disappearing intermittently from the SHS under high load. This issue is now resolved.
CDPD-20434: SHS should be resilient to corrupted event log directories.
SPARK-33146 has been back-ported to CDPD in order to make SHS resilient to corrupted event log directories. This issue is now resolved.
CDPD-16010: Removed netty3 dependency.
This issue is now resolved.
CDPD-18652: Adapt SAC to new Machine Learning event listener in CDP Spark 2.4.
This fix replaces the internal patch for Spark Machine Learning events with the community-based one. This issue is now resolved.
CDPD-16748: Improve LeftSemi SortMergeJoin right side buffering.
This issue is now resolved.
CDPD-17422: Improve null-safe equi-join key extraction.
This issue is now resolved.
CDPD-18458: When the pyspark.sql.functions.lit() function is used with a cached DataFrame, it returns a wrong result.
This issue is now resolved.
CDPD-1138: Spark Atlas Connector tracks column-level lineage
This issue is now resolved.
CDPD-14906: Spark incorrectly reads or writes TIMESTAMP data for values before the start of the Gregorian calendar. This happens when Spark is:
  • Using dynamic partition inserts.
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
  • Reading or writing from an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
  • Reading or writing from a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).
This issue is now resolved.
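The conditions listed for CDPD-14906 can be summarized as a small check. The following is only a sketch in plain Python: the function name matches_listed_conditions and its arguments are hypothetical, the default values are the ones stated in the list, and the logic restates that list rather than reproducing anything Spark itself does.

```python
# Defaults as stated in the CDPD-14906 list.
DEFAULTS = {
    "spark.sql.hive.convertMetastoreOrc": "true",
    "spark.sql.orc.impl": "native",
    "spark.sql.hive.convertMetastoreParquet": "true",
}


def matches_listed_conditions(file_format, conf=None, dynamic_partition_insert=False):
    """Return True if a read or write falls under one of the listed conditions.

    `file_format` is "orc" or "parquet" (case-insensitive); `conf` holds any
    Spark settings that differ from the defaults above.
    """
    settings = {**DEFAULTS, **(conf or {})}
    # Normalize values so that True, "TRUE" and "true" compare equal.
    settings = {key: str(value).lower() for key, value in settings.items()}

    # First bullet: dynamic partition inserts.
    if dynamic_partition_insert:
        return True

    fmt = file_format.lower()
    if fmt == "orc":
        # Second and third bullets: conversion disabled, or the hive ORC reader.
        return (
            settings["spark.sql.hive.convertMetastoreOrc"] == "false"
            or settings["spark.sql.orc.impl"] == "hive"
        )
    if fmt == "parquet":
        # Fourth bullet: Parquet conversion disabled.
        return settings["spark.sql.hive.convertMetastoreParquet"] == "false"
    return False
```

For example, matches_listed_conditions("orc", {"spark.sql.orc.impl": "hive"}) evaluates to True, while the same call with default settings evaluates to False.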
CDPD-15385: Delegation token support for Spark DStreams is not available.
Added Kafka delegation token support for DStreams in Spark 2.4.5. This issue is now resolved.
CDPD-15735: Oozie Spark actions are failing because Spark and Kafka are using different Scala versions.
This issue is now resolved.
CDPD-10532: Update log4j to address CVE-2019-17571
Replaced log4j with an internal version to fix CVE-2019-17571.
CDPD-10515: Incorrect version of jackson-mapper-asl
Use an internal version of jackson-mapper-asl to address CVE-2017-7525.
CDPD-7882: If an insert statement specifies partitions both statically and dynamically, there is a potential for data loss
To prevent data loss, this fix throws an exception if partitions are specified both statically and dynamically. You can follow the workarounds provided in the error message.
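To make the CDPD-7882 case concrete, the sketch below classifies the PARTITION clause of an INSERT statement as static, dynamic, or mixed. It is a simplified, hypothetical helper (classify_partition_spec) for spotting statements that would now raise an exception. It is not the check Spark performs, and it assumes partition values contain no commas or parentheses.

```python
import re


def classify_partition_spec(insert_sql):
    """Classify the PARTITION clause of an INSERT statement.

    Returns "none", "static", "dynamic", or "mixed". A column written as
    name=value counts as static; a bare column name counts as dynamic.
    """
    match = re.search(r"\bPARTITION\s*\(([^)]*)\)", insert_sql, flags=re.IGNORECASE)
    if match is None:
        return "none"
    columns = [part.strip() for part in match.group(1).split(",") if part.strip()]
    if not columns:
        return "none"
    static = [column for column in columns if "=" in column]
    dynamic = [column for column in columns if "=" not in column]
    if static and dynamic:
        # The combination that this fix rejects with an exception.
        return "mixed"
    return "static" if static else "dynamic"
```

A statement such as INSERT OVERWRITE TABLE t PARTITION (year=2020, month) SELECT ... is classified as mixed, because year is given a value and month is left to be resolved from the query output.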
CDPD-15773: In previous versions, applications that shared a Spark Session across multiple threads experienced a deadlock when accessing the HMS.
This issue is now resolved.

Apache patch information

The following Apache patches are included in this release. These patches do not have an associated Cloudera bug ID.

  • SPARK-17875
  • SPARK-33841