This section lists the known limitations in Apache Spark that persist in the Cloudera Runtime 7.3.1 release and its SPs and CHFs, along with their impact on functionality and the available workarounds.
Known Issues identified in Cloudera Runtime 7.3.1.300 SP1 CHF1
There are no new known issues identified for Spark in this release.
Known Issues identified in Cloudera Runtime 7.3.1.200 SP1
There are no new known issues identified for Spark in this release.
Known Issues identified in Cloudera Runtime 7.3.1.100 CHF1
The following section lists the known issues identified in this release:
CDPD-80239: Non-deterministic SQL expressions should set indeterminate map stage output level
Spark is expected to handle non-deterministic keys correctly as long as they are marked with deterministic=false in their data type attributes. For Spark's random data, this contract is not honored when a task fails. As a result, duplicate or missing data can be produced when Spark executors are relaunched on new NodeManagers.
Use the client configuration spark.global.deterministic to override any input-level deterministic setting. If it is set to true, all inputs are treated as deterministic; if it is set to false, all inputs are treated as non-deterministic.
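The failure mode behind this issue can be illustrated without Spark. The following pure-Python sketch is a simplified model, not Spark code, and the function names (`map_task`, `run_with_partial_retry`, `diff_report`) are hypothetical. It assumes a map task that routes each record to a reducer by a random key, similar to a shuffle on a random expression, and a failure after reducer 0 has already fetched its input: the map task is re-run with a different random state, and the remaining reducers fetch from the new attempt. Because the two attempts route records differently, some records can reach no reducer while others reach two.

```python
import random
from collections import Counter


def map_task(records, num_reducers, seed):
    """One map-task attempt that routes each record by a random key.

    The seed stands in for the random state of a task attempt: a retry on a
    new executor draws different keys, as a non-deterministic expression would.
    """
    rng = random.Random(seed)
    outputs = {r: [] for r in range(num_reducers)}
    for record in records:
        outputs[rng.randrange(num_reducers)].append(record)
    return outputs


def run_with_partial_retry(records, num_reducers=2, first_seed=1, retry_seed=2):
    """Reducer 0 reads the first attempt's output; that output is then lost,
    the map task is re-run, and reducers 1..n-1 read the retry's output."""
    first = map_task(records, num_reducers, first_seed)
    retry = map_task(records, num_reducers, retry_seed)
    received = list(first[0])
    for reducer in range(1, num_reducers):
        received.extend(retry[reducer])
    return received


def diff_report(records, received):
    """Return (duplicated, missing) records compared with the input."""
    counts = Counter(received)
    duplicated = sorted(r for r, c in counts.items() if c > 1)
    missing = sorted(r for r in records if counts[r] == 0)
    return duplicated, missing


if __name__ == "__main__":
    data = list(range(20))
    # Same random state in both attempts: every record arrives exactly once.
    print(diff_report(data, run_with_partial_retry(data, retry_seed=1)))
    # Different random state in the retry: duplicates and gaps are likely.
    print(diff_report(data, run_with_partial_retry(data, retry_seed=2)))
```

In this model, a correct recovery would also have to discard reducer 0's result and rerun it against the retried map output, which is roughly what marking the map stage output as indeterminate is meant to trigger. The spark.global.deterministic property from the workaround is passed like any other Spark configuration, for example with `--conf spark.global.deterministic=<true|false>` on the spark-submit command line.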
Known Issues identified in Cloudera Runtime 7.3.1
The following section lists the known issues identified in this release:
The RAPIDS Accelerator for Apache Spark is currently not available in Cloudera Runtime 7.3.1.
None.
The CHAR(n) type is handled inconsistently, depending on whether the table is partitioned or not.
7.3.1
7.3.1.100 CHF1
Upstream Spark 3 introduced the spark.sql.legacy.charVarcharAsString configuration, but it does not resolve all incompatibilities with Spark 2.
None. A new configuration, spark.cloudera.legacy.charVarcharLegacyPadding, will be introduced in a future version to keep compatibility with Spark 2, but it is not available in 7.3.1.
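The practical effect of inconsistent CHAR(n) handling is that the same stored value can be read back in two different forms. The following Python sketch is a simplified model, not Spark code, and the `read_char` helper is hypothetical. With padding enabled, it right-pads the value with spaces to the declared length, which matches the documented Spark 3 behavior that reading a CHAR(n) column returns strings of length n. With padding disabled, it returns the value unchanged, as when CHAR(n) is treated as a plain STRING (Spark 2, or spark.sql.legacy.charVarcharAsString set to true). This note does not state which table layout pads, so the sketch does not assume one.

```python
def read_char(value: str, n: int, pad: bool) -> str:
    """Model what a reader sees for `value` stored in a CHAR(n) column.

    pad=True: right-pad with spaces to length n (Spark 3 CHAR semantics).
    pad=False: return the value as-is (CHAR(n) treated as STRING).
    Values are assumed to fit in n characters; length checks are not modeled.
    """
    return value.ljust(n) if pad else value


if __name__ == "__main__":
    stored = "ab"
    for pad in (True, False):
        seen = read_char(stored, 5, pad)
        # Results that depend on trailing spaces diverge between the two forms.
        print(pad, repr(seen), len(seen), seen + "|")
```

String lengths, concatenations, and exports to other systems are examples of results that can differ between the two forms. Spark SQL pads the shorter operand when it compares CHAR values, so a simple equality filter may not expose the difference.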