Known Issues in Spark

Review the list of known issues in Spark in Cloudera Runtime 7.3.2, its service packs and cumulative hotfixes.

Known issues identified in Cloudera Runtime 7.3.2

CDPD-67806: Migration of an ORC table with a timestamp column to Iceberg fails with spark.sql.timestampType=TIMESTAMP_NTZ: 7.2.18, 7.3.1.x; The Hive External Catalog converts NTZ timestamp types back to TimestampType, which Hive does not support, leading to migration failures.; Set spark.sql.iceberg.use-timestamp-without-timezone-in-new-tables and spark.sql.iceberg.handle-timestamp-without-timezone to true to migrate ORC tables that have timestamp columns without time zone to Iceberg.
CDPD-90447: Table is stored in HDFS when the cluster is RAZ-enabled and S3 is set as the default storage system: 7.3.2; Tables created on RAZ-enabled clusters with S3 set as the default storage system are stored in HDFS instead of the specified S3 bucket.; None. S3 was removed as the default storage.
OPSAPS-75684: Spark fails because of a custom ZooKeeper Kerberos principal: 7.3.2; An incorrect ZooKeeper principal configuration and a missing JVM property lead to SASL authentication failures.; When a custom ZooKeeper principal is used, add the -Dzookeeper.sasl.client.username=[***USERNAME***] JVM argument to spark.*.defaultJavaOptions or spark.*.extraJavaOptions in spark-defaults.conf.
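Both workarounds above (CDPD-67806 and OPSAPS-75684) come down to plain Spark properties. The following standard-library Python sketch shows one way to add them to an existing set of spark-defaults.conf properties and to render the result as spark-submit --conf flags. It is a sketch under stated assumptions, not a Cloudera tool: the helper names, the choice of the driver and executor extraJavaOptions keys (the spark.*.defaultJavaOptions keys are the documented alternative), and the user name zk-custom are hypothetical; substitute the user name of your custom ZooKeeper principal.

```python
import shlex
from typing import Dict, List, Optional

# CDPD-67806: let Iceberg create and read timestamp columns without time zone.
ICEBERG_NTZ_SETTINGS = {
    "spark.sql.iceberg.use-timestamp-without-timezone-in-new-tables": "true",
    "spark.sql.iceberg.handle-timestamp-without-timezone": "true",
}

# OPSAPS-75684: JVM options that receive the ZooKeeper flag (hypothetical choice;
# the matching spark.*.defaultJavaOptions keys are an alternative).
JAVA_OPTION_KEYS = ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions")


def apply_workarounds(props: Dict[str, str], zk_username: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of props with the workaround settings added."""
    result = dict(props)
    result.update(ICEBERG_NTZ_SETTINGS)
    if zk_username:
        flag = f"-Dzookeeper.sasl.client.username={zk_username}"
        for key in JAVA_OPTION_KEYS:
            current = result.get(key, "").strip()
            # Append to existing JVM options instead of replacing them,
            # and skip the flag if it is already present.
            if flag not in current.split():
                result[key] = f"{current} {flag}".strip()
    return result


def to_defaults_conf(props: Dict[str, str]) -> str:
    """Render properties as spark-defaults.conf lines: key, whitespace, value."""
    return "\n".join(f"{key} {value}" for key, value in sorted(props.items()))


def to_submit_args(props: Dict[str, str]) -> List[str]:
    """Render properties as spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    props = apply_workarounds(
        {
            "spark.sql.timestampType": "TIMESTAMP_NTZ",
            "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
        },
        zk_username="zk-custom",  # hypothetical user name of the custom principal
    )
    print(to_defaults_conf(props))
    # shlex.join quotes values that contain spaces, such as multi-flag JVM options.
    print(shlex.join(["spark-submit", *to_submit_args(props)]))
```

Appending to an existing extraJavaOptions value, rather than setting the property again, matters because a later setting of the same property replaces the earlier value, which would silently drop JVM flags that are already configured.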

CDPD-95322: Missing Atlas lineage for Spark Iceberg tables from MERGE INTO: 7.3.2, 7.3.1.0 and higher; Spark SQL MERGE INTO statements on Iceberg tables do not send lineage data to Atlas.; None.
CDPD-94393: "RuntimeWarning: Failed to add file" message appears even when Spark successfully loads files: 7.3.2, 7.3.1.0 and higher; In both Spark 2 and Spark 3, an exception raised while adding files to the Python path causes the "RuntimeWarning: Failed to add file" message to appear even when the Python JAR file is loaded successfully.; None. You can safely ignore the message: the file is loaded successfully, and the message does not affect job completion.
Spark 3: RAPIDS Accelerator is not available: 7.3.2, 7.3.1.0 and higher; The RAPIDS Accelerator for Apache Spark is currently not available in Cloudera Runtime 7.3.1.; None.
The CHAR(n) type is handled inconsistently depending on whether the table is partitioned: 7.3.1; 7.3.2, 7.3.1.100 CHF1 and higher; Upstream Spark 3 introduced the spark.sql.legacy.charVarcharAsString configuration, but it does not resolve all incompatibilities with Spark 2.; None. A new configuration, spark.cloudera.legacy.charVarcharLegacyPadding, will be introduced in a future version to keep compatibility with Spark 2, but it is not available in 7.3.1.

Note: The CHAR type is legacy in SQL, and its use is discouraged. Cloudera recommends using VARCHAR or STRING instead.
Apache Jira: SPARK-33480
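To see why inconsistent CHAR(n) handling matters, the following toy model in plain Python contrasts standard SQL CHAR(n) padding (values right-padded with spaces to n characters) with STRING-like storage, which is what spark.sql.legacy.charVarcharAsString=true selects in upstream Spark. It is an illustration only: store_char is a hypothetical helper, not Spark code, and it does not model which code path (partitioned or non-partitioned tables) applies padding.

```python
def store_char(value: str, n: int, as_string: bool = False) -> str:
    """Toy model: return value as it would be stored in a CHAR(n) column."""
    if as_string:
        # Like charVarcharAsString=true: treated as STRING, no padding, no length check.
        return value
    if len(value) > n:
        raise ValueError(f"value {value!r} exceeds CHAR({n})")
    return value.ljust(n)  # right-pad with spaces to exactly n characters


padded = store_char("ab", 5)
unpadded = store_char("ab", 5, as_string=True)

print(repr(padded), len(padded))      # 'ab   ' 5
print(repr(unpadded), len(unpadded))  # 'ab' 2
# In this model the two stored forms are different strings.
print(padded == unpadded)             # False
```

In this model, the same logical value can fail to match in comparisons, joins, or length checks when one path pads and another does not; VARCHAR and STRING columns do not pad, which is one reason they are the recommended types.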