This section lists the Apache Spark issues that are fixed in the Cloudera Runtime 7.3.1 release and in its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.1.500 SP3
There are no new fixed issues in this release.
Cloudera Runtime 7.3.1.400 SP2
- CDPD-75091: Backport SPARK-47217 and related
changes
- 7.3.1.400 SP2
- Backports upstream Apache Spark improvements to enable
reading Parquet files with mixed or widened types without precision loss or
failures.
- Apache Jira: SPARK-47217
Cloudera Runtime 7.3.1.300 SP1 CHF 1
- CDPD-79763: Fix clobbering of files across epochs in Spark Structured Streaming with Iceberg
- 7.3.1.300 SP1 CHF1
- Backporting an upstream fix for a bug in Spark Structured Streaming that caused files in Iceberg tables to be clobbered across epochs.
Cloudera Runtime 7.3.1.200 SP1
- CDPD-79251: Spark - Timestamp read/write performance
degradation
- 7.3.1.200 SP1
- Fixing an issue where conversion between Spark's internal timestamp representation and Hive's Timestamp representation was slower on Spark 3 than on Spark 2.
- CDPD-76849: Backport SPARK-40876 and related
changes
- 7.3.1.200 SP1
- Backporting SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, and SPARK-48603.
- Apache Jira: SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, SPARK-48603
- CDPD-70233: Rebase CDP 7.3.x Spark3 on Apache Spark
3.5.4
- 7.3.1.200 SP1
- Upgrading Spark from 3.4.1 to 3.5.4. For more
information, refer to Migrating Spark
applications.
Cloudera Runtime 7.3.1.100 CHF 1
- CDPD-76229: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data
- 7.3.1.100 CHF1
- Restoring the performance of some queries in Spark 3.4.1 to match other Spark versions (3.3.x and 3.5.x) by optimizing the processing speed of BinaryArithmetic#dataType when processing multi-column data.
- Apache Jira: SPARK-45071
- CDPD-75926: Backport SPARK-44653
- 7.3.1.100 CHF1
- Backported SPARK-44653 to fix cache breaking with
non-trivial DataFrame unions.
- Apache Jira: SPARK-44653
- CDPD-75755: [ENCODER_NOT_FOUND] Not found an encoder of the type T to Spark SQL internal representation when using Parameterized Bean
- 7.3.1.100 CHF1
- Fixed an upstream regression that caused an encoder exception for a parameterized class.
- Apache Jira: SPARK-46679
- CDPD-75622: Backport upstream fixes for handling
nested beans and generic type beans while creating Spark encoders.
- 7.3.1.100 CHF1
- Backporting upstream fixes from Spark 3.4 for the following issues:
- Starting from Spark 3.4.x, Encoders.bean raised an exception when the passed class contained a field whose type is a nested bean with type arguments.
- Starting from Spark 3.4.x, Encoders.bean raised an exception when it was called with a bean that has read-only properties.
- The bean encoder did not support beans whose superclass has generic type arguments.
- Apache Jira: SPARK-44634, SPARK-45081, SPARK-44910
- CDPD-75353: CHAR and VARCHAR handling in Spark 3 is incompatible with Spark 2
- 7.3.1.100 CHF1
- Adding a new configuration property, spark.cloudera.legacy.charVarcharLegacyPadding, which is set to false by default in Spark 3. When it is set to true together with spark.sql.legacy.charVarcharAsString=true, it restores compatibility with the Spark 2 behavior. For more information, refer to Migrating Spark applications.
- CDPD-75286: Spark History UI -
StreamConstraintsException: String length exceeds the maximum length
- 7.3.1.100 CHF1
- Fixing an issue with Jackson so that Spark event logs allow JSON strings of unlimited length.
- CDPD-59617: Spark - Upgrade Okio to 1.17.6 due to
CVE-2023-3635
- 7.3.1.100 CHF1
- Updating okio from version 1.15.0 to 1.17.6 to address the security vulnerability CVE-2023-3635.
- CDPD-74730: Backport SPARK-46239: Hide the Jetty
server's version
- 7.3.1.100 CHF1
- The Jetty server's version is now hidden.
- Apache Jira: SPARK-46239
- CDPD-73233: Encoder not found of the type T to Spark
SQL internal representation
- 7.3.1.100 CHF1
- Fixing an upstream regression that caused an encoder exception (org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]) for generic types.
- Apache Jira: SPARK-49789
Cloudera Runtime 7.3.1
- CDPD-74697: Spark Iceberg vectorized Parquet read of decimal column is incorrect
- 7.3.1
- CDPD-72774: Use common versions of commons-dbcp2 and commons-pool2
- 7.3.1
- CDPD-70114: Redirect spark-submit, spark-shell etc. scripts to their Spark 3 counterparts
- 7.3.1
- CDPD-58844: Spark - Upgrade Janino to 3.1.10 due to CVE-2023-33546
- 7.3.1
- CDPD-48171: Spark3 - Upgrade snakeyaml due to CVE-2022-1471
- 7.3.1