Known Issues in Apache Spark

Learn about the known issues in Apache Spark, the impact or changes to the functionality, and the workaround.

Known Issues identified in Cloudera Runtime 7.3.1.400 SP2

There are no new known issues identified in this release.

Known Issues identified in Cloudera Runtime 7.3.1.300 SP1 CHF1

There are no new known issues identified in this release.

Known Issues identified in Cloudera Runtime 7.3.1.200 SP1

There are no new known issues identified in this release.

Known Issues identified in Cloudera Runtime 7.3.1.100 CHF1

The following section lists the known issues identified in this release:

CDPD-80239: Non-deterministic SQL expressions should set indeterminate map stage output level
7.3.1, 7.3.1.100 CHF1, 7.3.1.200 SP1, 7.3.1.300 SP1 CHF1, 7.3.1.400 SP2
Spark is expected to handle non-deterministic keys correctly, provided they are marked with deterministic=false in their data type attributes. For randomly generated data, Spark does not honor this contract when a task fails. As a result, a job can produce duplicate or missing data when its Spark executors are relaunched on new NodeManagers.
Use the client configuration spark.global.deterministic to override any input-level deterministic configuration. If set to true, all inputs are treated as deterministic; if set to false, all inputs are treated as non-deterministic.
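The failure mode described for CDPD-80239 can be illustrated with a toy model. The sketch below is not Spark code: map_stage, run_job, and treat_as_indeterminate are hypothetical names, and the attempt number merely stands in for a non-deterministic expression such as a random key. It assumes that one reducer has already read the first map attempt when the map output is lost and recomputed.

```python
# Toy model of a two-partition shuffle. NOT Spark code; all names are hypothetical.

NUM_PARTITIONS = 2


def map_stage(rows, attempt):
    """Assign each row to a reduce partition.

    The attempt number stands in for a non-deterministic expression:
    a rerun of the map task assigns the same rows to different partitions.
    """
    partitions = {p: [] for p in range(NUM_PARTITIONS)}
    for row in rows:
        partitions[(row + attempt) % NUM_PARTITIONS].append(row)
    return partitions


def run_job(rows, treat_as_indeterminate):
    """Simulate losing the map output after reducer 0 has already finished."""
    first = map_stage(rows, attempt=1)
    reducer0 = first[0]  # reducer 0 reads the first attempt

    # The executor holding the map output is lost, so the map task is rerun.
    second = map_stage(rows, attempt=2)
    if treat_as_indeterminate:
        # The output is known to be unstable: redo reducer 0 from the rerun too.
        reducer0 = second[0]
    reducer1 = second[1]  # reducer 1 reads the second attempt

    return sorted(reducer0 + reducer1)


rows = list(range(6))
print(run_job(rows, treat_as_indeterminate=False))  # [1, 1, 3, 3, 5, 5]
print(run_job(rows, treat_as_indeterminate=True))   # [0, 1, 2, 3, 4, 5]
```

When the rerun is mixed with results from the earlier attempt, rows 1, 3, and 5 appear twice and rows 0, 2, and 4 are lost. When every reducer reads from the same attempt, the output matches the input. Whether setting spark.global.deterministic to false triggers exactly this kind of full recomputation is an assumption of the sketch, not something this note states.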

Known Issues identified in Cloudera Runtime 7.3.1

The following section lists the known issues identified in this release:

Spark 3: RAPIDS Accelerator is not available
7.3.1, 7.3.1.100 CHF1, 7.3.1.200 SP1, 7.3.1.300 SP1 CHF1, 7.3.1.400 SP2
The RAPIDS Accelerator for Apache Spark is currently not available in Cloudera Runtime 7.3.1.
None.
The CHAR(n) type is handled inconsistently, depending on whether the table is partitioned.
7.3.1, 7.3.1.100 CHF1
Upstream Spark 3 introduced the spark.sql.legacy.charVarcharAsString configuration, but it does not resolve all incompatibilities with Spark 2.

None. A new configuration, spark.cloudera.legacy.charVarcharLegacyPadding, will be introduced in a future version to maintain compatibility with Spark 2, but it is not available in 7.3.1.

Apache Jira: SPARK-33480
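
To show why inconsistent CHAR(n) handling matters, the following toy sketch contrasts fixed-length padding with plain string handling. It is not Spark code: char_n is a hypothetical helper, and the sketch does not reproduce the specific partitioned versus unpartitioned behavior, which this note does not detail.

```python
def char_n(value, n):
    """Fixed-length CHAR(n) semantics: right-pad the value with spaces to n characters."""
    return value.ljust(n)


padded = char_n("ab", 5)  # value as stored by a path that applies CHAR(5) padding
as_string = "ab"          # value as stored by a path that treats CHAR as a plain string

# The two representations differ in length and do not compare equal.
print(repr(padded), repr(as_string), padded == as_string)
```

If one code path pads values and another does not, equality filters and joins on the same logical value can disagree between the two paths.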