Fixed Issues in Apache Spark

This section lists the issues in Apache Spark that are fixed in the Cloudera Runtime 7.3.1 release, its service packs, and its cumulative hotfixes.

Cloudera Runtime 7.3.1.400 SP2

CDPD-75091: Backport SPARK-47217 and related changes
7.3.1.400 SP2
Backports upstream Apache Spark improvements to enable reading Parquet files with mixed or widened types without precision loss or failures.
Apache Jira: SPARK-47217
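Type widening means reading a column that was written with a narrower type (for example, a decimal with fewer digits) as a wider type. The Python sketch below is a rough illustration of when a decimal widening is lossless: neither the integer digits (precision minus scale) nor the scale may shrink. The function names are hypothetical, and the rule is the general arithmetic condition. It is not necessarily the exact set of promotions that the backported reader accepts.

```python
from decimal import Decimal


def is_lossless_decimal_widening(p1: int, s1: int, p2: int, s2: int) -> bool:
    """True if every DECIMAL(p1, s1) value fits exactly in DECIMAL(p2, s2).

    Both the integer digits (precision - scale) and the fractional digits
    (scale) must not shrink.
    """
    return s2 >= s1 and (p2 - s2) >= (p1 - s1)


def widen(value: Decimal, p1: int, s1: int, p2: int, s2: int) -> Decimal:
    """Re-express a DECIMAL(p1, s1) value with scale s2 (hypothetical helper)."""
    if not is_lossless_decimal_widening(p1, s1, p2, s2):
        raise ValueError(f"DECIMAL({p1},{s1}) -> DECIMAL({p2},{s2}) may lose data")
    # Because s2 >= s1, quantize only appends trailing zeros.
    return value.quantize(Decimal(1).scaleb(-s2))
```

For example, widening Decimal("123.45") from DECIMAL(5,2) to DECIMAL(10,4) yields Decimal("123.4500"), which has the same value. DECIMAL(10,2) to DECIMAL(10,4) is rejected because the integer digits would drop from 8 to 6.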

Cloudera Runtime 7.3.1.300 SP1 CHF 1

CDPD-79763: Fix clobbering of files across epochs in Spark Structured streaming with Iceberg
7.3.1.300 SP1 CHF1
Backporting an upstream fix for a bug in Spark Structured Streaming that caused files in Iceberg tables to be clobbered across epochs.

Cloudera Runtime 7.3.1.200 SP1

CDPD-79251: Spark - Timestamp read/write performance degradation
7.3.1.200 SP1
Fixing an issue where conversion between Spark's internal timestamp representation and Hive's Timestamp representation was slower in Spark 3 than in Spark 2.
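Spark stores TimestampType values internally as a count of microseconds since the Unix epoch (1970-01-01T00:00:00Z), while Hive works with a date-and-time object. The Python sketch below shows both directions of that conversion. It uses a UTC datetime as a stand-in for Hive's type and is only meant to illustrate the two representations. It is not the Cloudera fix, and the helper names are hypothetical. Integer arithmetic on timedelta is used to avoid the floating-point rounding that datetime.timestamp() can introduce.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Spark-style internal value (microseconds since epoch) -> UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


def datetime_to_micros(dt: datetime) -> int:
    """Timezone-aware datetime -> microseconds since epoch, using exact integer math."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

For example, 1_700_000_000_123_456 converts to 2023-11-14 22:13:20.123456+00:00 and back to the same integer. Pre-epoch values such as -1 also round-trip exactly.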
CDPD-76849: Backport SPARK-40876 and related changes
7.3.1.200 SP1
Backporting SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, and SPARK-48603.
Apache Jira: SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, SPARK-48603
CDPD-70233: Rebase CDP 7.3.x Spark3 on Apache Spark 3.5.4
7.3.1.200 SP1
Upgrading Spark from 3.4.1 to 3.5.4. For more information, refer to Migrating Spark applications.

Cloudera Runtime 7.3.1.100 CHF 1

CDPD-76229: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data
7.3.1.100 CHF1

Restoring performance of some queries in Spark 3.4.1 to match other versions (3.3.x, 3.5.x) of Spark.

Optimized the processing speed of BinaryArithmetic#dataType when processing multi-column data.

Apache Jira: SPARK-45071
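A toy model of a deep arithmetic expression such as c0 + c1 + ... + cn over decimal columns helps show why caching the computed type speeds things up. In the Python sketch below, each addition node derives its result type from its children. For illustration, the sketch assumes that an uncached node consults each child's type twice. Under that assumption, a left-deep chain of n additions performs 2^n - 1 type computations, while caching each node's result, which is the general idea behind the optimization, performs only n. The classes are hypothetical, and the decimal rule omits Spark's 38-digit precision cap.

```python
class Column:
    """Leaf node: a column with a fixed DECIMAL(precision, scale) type."""

    def __init__(self, precision: int, scale: int):
        self.data_type = (precision, scale)


def add_result_type(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """Result type of decimal addition (Spark's cap at 38 digits is omitted)."""
    (p1, s1), (p2, s2) = left, right
    scale = max(s1, s2)
    # Wider integer part + scale + one digit for a possible carry.
    return (max(p1 - s1, p2 - s2) + scale + 1, scale)


class Add:
    """Addition node whose data_type is optionally cached after first use."""

    def __init__(self, left, right, stats: dict, cache: bool):
        self.left, self.right = left, right
        self.stats, self.cache = stats, cache
        self._cached = None

    @property
    def data_type(self) -> tuple[int, int]:
        if self.cache and self._cached is not None:
            return self._cached
        self.stats["computations"] += 1
        # Assumption for illustration: one pass to validate the inputs...
        _ = (self.left.data_type, self.right.data_type)
        # ...and a second pass to compute the result type.
        result = add_result_type(self.left.data_type, self.right.data_type)
        if self.cache:
            self._cached = result
        return result


def build_chain(n: int, cache: bool):
    """Build ((c0 + c1) + c2) + ... + cn over DECIMAL(10, 2) columns."""
    stats = {"computations": 0}
    expr = Column(10, 2)
    for _ in range(n):
        expr = Add(expr, Column(10, 2), stats, cache)
    return expr, stats
```

With n = 15, the uncached chain performs 32,767 type computations to reach the same DECIMAL(25, 2) result that the cached chain obtains with 15.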
CDPD-75926: Backport SPARK-44653
7.3.1.100 CHF1
Backported SPARK-44653 to fix cache breaking with non-trivial DataFrame unions.
Apache Jira: SPARK-44653
CDPD-75755: [ENCODER_NOT_FOUND] Not found an encoder of the type T to Spark SQL internal representation when using Parameterized Bean
7.3.1.100 CHF1
Fixed an upstream regression that caused an encoder exception for parameterized classes.
Apache Jira: SPARK-46679
CDPD-75622: Backport upstream fixes for handling nested beans and generic type beans while creating Spark encoders.
7.3.1.100 CHF1

Backporting upstream fixes from Spark 3.4 to fix the following issues:

  • Starting with Spark 3.4.x, Encoders.bean raised an exception when the passed class contained a field whose type is a nested bean with type arguments.
  • Starting with Spark 3.4.x, Encoders.bean raised an exception when the passed bean had read-only properties.
  • The bean encoder did not support beans whose superclass has generic type arguments.
Apache Jira: SPARK-44634, SPARK-45081, SPARK-44910
CDPD-75353: CHAR and VARCHAR handling in Spark 3 is incompatible with Spark 2
7.3.1.100 CHF1

Adding a new configuration property, spark.cloudera.legacy.charVarcharLegacyPadding, which defaults to false in Spark 3. Setting it to true, together with spark.sql.legacy.charVarcharAsString=true, restores the Spark 2 behavior.

For more information, refer to Migrating Spark applications.
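To enable the Spark 2-compatible behavior, both properties named above must be set to true. The sketch below collects them in one place and renders them as --conf arguments for spark-submit. The dictionary and helper function names are hypothetical. Only the two property names come from this note.

```python
# Property names from the release note; "true" enables the Spark 2-compatible
# CHAR/VARCHAR handling.
LEGACY_CHAR_VARCHAR_CONFS = {
    "spark.cloudera.legacy.charVarcharLegacyPadding": "true",
    "spark.sql.legacy.charVarcharAsString": "true",
}


def to_spark_submit_args(confs: dict[str, str]) -> list[str]:
    """Render properties as spark-submit arguments: --conf key=value pairs."""
    args: list[str] = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(LEGACY_CHAR_VARCHAR_CONFS)))
```

In a PySpark application, the same key and value pairs can likely also be supplied through SparkSession.builder.config(key, value) before getOrCreate(). Verify this against Migrating Spark applications.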

CDPD-75286: Spark History UI - StreamConstraintsException: String length exceeds the maximum length
7.3.1.100 CHF1
Fixing an issue with Jackson so that Spark event logs can contain JSON strings of unlimited length.
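On releases without this fix, the exception is raised when a single JSON string in an event log exceeds Jackson's maximum string length. The default limit is assumed here to be 20,000,000 characters, which applies to Jackson 2.15.1 and later; check the Jackson version you actually run. Uncompressed Spark event logs store one JSON event per line, so a small Python sketch like the following can locate the offending events. The function names and the threshold are assumptions, and Python's len() counts code points, so the result only approximates Jackson's own count.

```python
import json
from typing import Iterable, Iterator

# Assumed limit: Jackson's default maximum string length since 2.15.1.
DEFAULT_MAX_STRING_LENGTH = 20_000_000


def _long_strings(node, limit: int) -> Iterator[int]:
    """Yield the length of every string (key or value) longer than limit."""
    if isinstance(node, str):
        if len(node) > limit:
            yield len(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from _long_strings(key, limit)
            yield from _long_strings(value, limit)
    elif isinstance(node, list):
        for item in node:
            yield from _long_strings(item, limit)


def find_oversized_events(lines: Iterable[str], limit: int = DEFAULT_MAX_STRING_LENGTH):
    """Return (line_number, event_type, longest_string) for each offending line."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        event = json.loads(line)
        lengths = list(_long_strings(event, limit))
        if lengths:
            hits.append((line_no, event.get("Event"), max(lengths)))
    return hits
```

For example, passing an open file object for an uncompressed event log to find_oversized_events lists the line number, the value of the "Event" field, and the longest string length for each event that would exceed the limit.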
CDPD-59617: Spark - Upgrade Okio to 1.17.6 due to CVE-2023-3635
7.3.1.100 CHF1
Updating okio from version 1.15.0 to 1.17.6 to address the security vulnerability CVE-2023-3635.
CDPD-74730: Backport SPARK-46239: Hide the Jetty server's version
7.3.1.100 CHF1
The Jetty server's version is now hidden.
Apache Jira: SPARK-46239
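One way to confirm this change on a running cluster is to inspect the HTTP Server response header returned by a Spark web UI. The following Python sketch uses only the standard library and flags a Server header that carries a Jetty version number. The helper names are hypothetical. The exact header sent once the version is hidden is an assumption: it may be omitted entirely or sent without a version, so the check accepts both cases.

```python
import re
import urllib.request
from collections.abc import Mapping

# Matches "Jetty(9.4...)" or "Jetty/9.4...": the word Jetty followed by a digit.
_VERSIONED_JETTY = re.compile(r"jetty\W*\d", re.IGNORECASE)


def reveals_jetty_version(headers: Mapping[str, str]) -> bool:
    """True if a Server header exposes a Jetty version number."""
    for name, value in headers.items():
        if name.lower() == "server" and _VERSIONED_JETTY.search(value):
            return True
    return False


def fetch_headers(url: str, timeout: float = 10.0) -> dict[str, str]:
    """Fetch a URL and return its response headers."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return dict(response.getheaders())
```

For example, reveals_jetty_version(fetch_headers("http://<driver-host>:4040/")) is expected to return False once the fix is in place. Here, <driver-host> is a placeholder and 4040 is the default port of the Spark application UI. UIs behind authentication or TLS may need additional client setup.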
CDPD-73233: Encoder not found of the type T to Spark SQL internal representation
7.3.1.100 CHF1
Fixing an upstream regression that raised an encoder exception (org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]) for generic types.
Apache Jira: SPARK-49789

Cloudera Runtime 7.3.1

CDPD-74697: Spark Iceberg vectorized Parquet read of decimal column is incorrect
7.3.1
CDPD-72774: Use common versions of commons-dbcp2 and commons-pool2
7.3.1
CDPD-70114: Redirect spark-submit, spark-shell etc. scripts to their Spark 3 counterparts
7.3.1
CDPD-58844: Spark - Upgrade Janino to 3.1.10 due to CVE-2023-33546
7.3.1
CDPD-48171: Spark3 - Upgrade snakeyaml due to CVE-2022-1471
7.3.1