Known Issues in Apache Spark

Learn about the known issues in Spark, the impact or changes to the functionality, and the workaround.

CDPD-217: The Apache Spark connector is not supported
The old Apache Spark - Apache HBase Connector (shc) is not supported in CDP releases.
Use the new HBase-Spark connector shipped in CDP release.
CDPD-3038: Launching pyspark displays several HiveConf warning messages
When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
These errors can be safely ignored.
TSB 2025-797: Using Spark vectorized reads on Iceberg tables with Parquet files could result in malformed data
Due to an upstream Apache Iceberg issue, in specific situations, customers executing Spark vectorized reads on Iceberg tables with Parquet files may encounter malformed output and data incorrectness.
For the latest update on this issue see the corresponding Knowledge article: TSB 2025-797: Using Spark vectorized reads on Iceberg tables with Parquet files could result in malformed data