This section lists the Apache Spark issues that are fixed in the Cloudera Runtime 7.3.1 release, its service packs, and its cumulative hotfixes.
Cloudera Runtime 7.3.1.400 SP2
- CDPD-75091: Backport SPARK-47217 and related changes
- 7.3.1.400 SP2
- Backports upstream Apache Spark improvements to enable reading Parquet files with mixed or widened types without precision loss or failures.
- Apache Jira: SPARK-47217
Cloudera Runtime 7.3.1.300 SP1 CHF 1
- CDPD-79763: Fix clobbering of files across epochs in Spark Structured Streaming with Iceberg
- 7.3.1.300 SP1 CHF1
- Backporting an upstream fix for a bug in Structured Streaming that caused files in Iceberg tables to be clobbered across epochs.
Cloudera Runtime 7.3.1.200 SP1
- CDPD-79251: Spark - Timestamp read/write performance degradation
- 7.3.1.200 SP1
- Fixing an issue where conversion between Spark's internal timestamp representation and Hive's Timestamp representation was slower on Spark 3 than on Spark 2.
- CDPD-76849: Backport SPARK-40876 and related changes
- 7.3.1.200 SP1
- Backporting SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, and SPARK-48603.
- Apache Jira: SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, SPARK-48603
- CDPD-70233: Rebase CDP 7.3.x Spark3 on Apache Spark 3.5.4
- 7.3.1.200 SP1
- Upgrading Spark from 3.4.1 to 3.5.4. For more information, refer to Migrating Spark applications.
Cloudera Runtime 7.3.1.100 CHF 1
- CDPD-76229: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data
- 7.3.1.100 CHF1
- Restoring the performance of some queries in Spark 3.4.1 to match other Spark versions (3.3.x and 3.5.x) by optimizing the processing speed of BinaryArithmetic#dataType when processing multi-column data.
- Apache Jira: SPARK-45071
- CDPD-75926: Backport SPARK-44653
- 7.3.1.100 CHF1
- Backported SPARK-44653 so that non-trivial DataFrame unions no longer break caching.
- Apache Jira: SPARK-44653
- CDPD-75755: [ENCODER_NOT_FOUND] Not found an encoder of the type T to Spark SQL internal representation when using Parameterized Bean
- 7.3.1.100 CHF1
- Fixing an upstream regression that caused an encoder exception for parameterized classes.
- Apache Jira: SPARK-46679
- CDPD-75622: Backport upstream fixes for handling nested beans and generic type beans while creating Spark encoders.
- 7.3.1.100 CHF1
- Backporting upstream fixes from Spark 3.4 to fix the following issues:
  - Starting from Spark 3.4.x, Encoders.bean raises an exception when the passed class contains a field whose type is a nested bean with type arguments.
  - Starting from Spark 3.4.x, Encoders.bean raises an exception when it is called with a bean that has read-only properties.
  - The bean encoder does not support beans whose superclass has generic type arguments.
- Apache Jira: SPARK-44634, SPARK-45081, SPARK-44910
- CDPD-75353: CHAR and VARCHAR handling in Spark 3 is incompatible with Spark 2
- 7.3.1.100 CHF1
- Adding a new configuration property, spark.cloudera.legacy.charVarcharLegacyPadding (default: false in Spark 3). When it is set to true together with spark.sql.legacy.charVarcharAsString=true, Spark 3 provides CHAR and VARCHAR behavior compatible with Spark 2. For more information, refer to Migrating Spark applications.
- CDPD-75286: Spark History UI - StreamConstraintsException: String length exceeds the maximum length
- 7.3.1.100 CHF1
- Fixing an issue with Jackson by allowing unlimited JSON string length in Spark event logs.
- CDPD-59617: Spark - Upgrade Okio to 1.17.6 due to CVE-2023-3635
- 7.3.1.100 CHF1
- Updating okio from version 1.15.0 to 1.17.6 to address the security vulnerability CVE-2023-3635.
- CDPD-74730: Backport SPARK-46239: Hide the Jetty server's version
- 7.3.1.100 CHF1
- The Jetty server's version is now hidden.
- Apache Jira: SPARK-46239
- CDPD-73233: Encoder not found of the type T to Spark SQL internal representation
- 7.3.1.100 CHF1
- Fixing an upstream regression that caused an encoder exception (org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]) for generic types.
- Apache Jira: SPARK-49789
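For CDPD-75353, the Spark 2-compatible CHAR and VARCHAR behavior needs two properties set to true together. The following is a minimal sketch, not an official Cloudera tool, that builds the matching spark-submit --conf arguments. Only the two property names come from this release note; the constant and the legacy_char_varchar_args helper are hypothetical names used for illustration.

```python
"""Sketch: spark-submit arguments for Spark 2-compatible CHAR/VARCHAR handling.

The two property names are taken from the CDPD-75353 release note.
LEGACY_CHAR_VARCHAR_CONF and legacy_char_varchar_args are hypothetical
names used only for illustration.
"""

# Both properties must be true together to get Spark 2-compatible behavior.
LEGACY_CHAR_VARCHAR_CONF = {
    # Cloudera-specific property added in 7.3.1.100 CHF1 (default: false).
    "spark.cloudera.legacy.charVarcharLegacyPadding": "true",
    # Upstream Spark property that treats CHAR/VARCHAR the same as STRING.
    "spark.sql.legacy.charVarcharAsString": "true",
}


def legacy_char_varchar_args(conf=None):
    """Return a flat list of --conf arguments for spark-submit or spark-shell."""
    conf = LEGACY_CHAR_VARCHAR_CONF if conf is None else conf
    args = []
    for key, value in conf.items():
        # spark-submit accepts repeated "--conf key=value" pairs.
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print a fragment that can be placed after "spark-submit".
    print(" ".join(legacy_char_varchar_args()))
```

The printed fragment goes between spark-submit and the application, for example spark-submit --conf spark.cloudera.legacy.charVarcharLegacyPadding=true --conf spark.sql.legacy.charVarcharAsString=true app.py. The same two key-value pairs can also be placed in spark-defaults.conf.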
Cloudera Runtime 7.3.1
- CDPD-74697: Spark Iceberg vectorized Parquet read of decimal column is incorrect
- 7.3.1
- CDPD-72774: Use common versions of commons-dbcp2 and commons-pool2
- 7.3.1
- CDPD-70114: Redirect spark-submit, spark-shell, etc. scripts to their Spark 3 counterparts
- 7.3.1
- CDPD-58844: Spark - Upgrade Janino to 3.1.10 due to CVE-2023-33546
- 7.3.1
- CDPD-48171: Spark3 - Upgrade snakeyaml due to CVE-2022-1471
- 7.3.1