Fixed Issues in Spark
Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.2.
- CDPD-1138: Spark Atlas Connector tracks column-level lineage
- This issue is now resolved.
- CDPD-14906: Spark incorrectly reads or writes TIMESTAMP values that fall before the start of the Gregorian calendar. This happens when Spark is (the relevant settings are shown together in the sketch after this list):
- Using dynamic partition inserts.
- Reading from or writing to an ORC table when spark.sql.hive.convertMetastoreOrc=false (the default is true).
- Reading from or writing to an ORC table when spark.sql.hive.convertMetastoreOrc=true but spark.sql.orc.impl=hive (the default is native).
- Reading from or writing to a Parquet table when spark.sql.hive.convertMetastoreParquet=false (the default is true).
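The configuration keys named above can be set on the SparkSession builder. The following is a minimal, hypothetical sketch that pins them to the stated defaults; the application name is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: the configurations referenced in this item, set to the
// default values noted above. The app name is illustrative only.
val spark = SparkSession.builder()
  .appName("timestamp-behavior-example")
  // Convert Hive ORC tables to Spark's native data source (the default).
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  // Use the native ORC implementation; the "hive" value triggers the issue.
  .config("spark.sql.orc.impl", "native")
  // Convert Hive Parquet tables to Spark's native data source (the default).
  .config("spark.sql.hive.convertMetastoreParquet", "true")
  .enableHiveSupport()
  .getOrCreate()
```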
- CDPD-15385: Previously, delegation token support for Spark DStreams was not available.
- Kafka delegation token support was added for DStreams in Spark 2.4.5, as illustrated in the sketch below. This issue is now resolved.
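As an illustration, a direct Kafka DStream consumer of the kind that can now rely on delegation tokens (rather than distributing keytabs to executors) looks roughly like the following sketch. The bootstrap server, topic, group id, and security settings are assumptions about a SASL_SSL-secured cluster, not part of the release note.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("dstream-kafka-example")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder connection settings; a secured cluster is assumed here.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker-1:9093",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  // Kafka delegation tokens are used over a SASL-secured connection.
  "security.protocol" -> "SASL_SSL"
)

// Direct DStream from the spark-streaming-kafka-0-10 integration.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))

stream.map(record => record.value).print()
ssc.start()
ssc.awaitTermination()
```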
- CDPD-15735: Oozie Spark actions were failing because Spark and Kafka were using different Scala versions.
- This issue is now resolved.
- CDPD-10532: Update log4j to address CVE-2019-17571
- Replaced log4j with an internal version to fix CVE-2019-17571.
- CDPD-10515: Incorrect version of jackson-mapper-asl
- Replaced jackson-mapper-asl with an internal version to address CVE-2017-7525.
- CDPD-7882: If an insert statement specifies partitions both statically and dynamically, there is a potential for data loss
- To prevent data loss, this fix throws an exception if partitions are specified both statically and dynamically; an example of the rejected pattern is sketched below. You can follow the workarounds provided in the error message.
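For illustration only, a statement of the kind this fix now rejects is sketched here. The table and column names are hypothetical, and `spark` is an existing SparkSession.

```scala
// Hypothetical mixed partition spec: `country` is static, `state` is dynamic.
// With this fix, Spark raises an exception for this pattern instead of
// risking data loss; the error message describes the workarounds.
spark.sql("""
  INSERT OVERWRITE TABLE sales PARTITION (country = 'US', state)
  SELECT id, amount, state FROM staging_sales
""")
```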
- CDPD-15773: In previous versions, applications that shared a SparkSession across multiple threads could deadlock when accessing the Hive Metastore (HMS).
- This issue is now resolved.
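The affected pattern looks roughly like the sketch below: several threads issuing metastore-backed queries through one shared SparkSession. The table names and thread-pool size are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shared-session-example")
  .enableHiveSupport()
  .getOrCreate()

// A fixed pool of worker threads, all sharing the single SparkSession.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

// Each future triggers Hive Metastore access through the shared session;
// prior to this fix, concurrent access like this could deadlock.
val counts = Seq("db.table_a", "db.table_b", "db.table_c").map { name =>
  Future { name -> spark.table(name).count() }
}
counts.foreach(f => println(Await.result(f, Duration.Inf)))
```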
Technical Service Bulletins
- TSB 2021-441: CDP Powered by Apache Spark may incorrectly read/write pre-Gregorian timestamps
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-441: Spark may incorrectly read/write pre-Gregorian timestamps