Known Issues in Apache Spark

This topic describes known issues and workarounds for using Spark in this release of Cloudera Runtime.

CDPD-217: HBase/Spark connectors are not supported
The Spark HBase Connector (SHC) from HDP and the hbase-spark module from CDH are not supported.
Workaround: Migrate to the Apache HBase Connectors integration for Apache Spark (hbase-connectors/spark) available in CDP. For details on working with HBase data from Spark in CDP, see the Cloudera Community article "HBase and Spark in CDP".
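As a rough illustration of what a read through the hbase-connectors/spark integration might look like from PySpark, the following minimal sketch builds the data source options and passes them to a SparkSession. The table name, the column mapping, and the helper names are hypothetical. The format name and option keys reflect the connector's commonly documented usage but are assumptions here and should be checked against the connector version shipped in your CDP release.

```python
# Minimal sketch, not a definitive implementation.
# Assumption: the connector is on the Spark classpath and registers
# the data source name below.
HBASE_SPARK_FORMAT = "org.apache.hadoop.hbase.spark"


def hbase_options(table, columns_mapping):
    """Build the option map for the HBase data source.

    The option keys are assumptions based on the connector's documented
    usage; verify them for your release.
    """
    return {
        "hbase.table": table,
        "hbase.columns.mapping": columns_mapping,
        # Assumed to be needed from Python, where no HBaseContext
        # object is created by the caller.
        "hbase.spark.use.hbasecontext": "false",
    }


def read_hbase_table(spark, table, columns_mapping):
    """Read an HBase table as a DataFrame using an existing SparkSession."""
    reader = spark.read.format(HBASE_SPARK_FORMAT)
    for key, value in hbase_options(table, columns_mapping).items():
        reader = reader.option(key, value)
    return reader.load()
```

A call such as read_hbase_table(spark, "person", "name STRING :key, email STRING c:email") would then return a DataFrame, where "person" and the mapping string are placeholder values.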
CDPD-3038: Launching pyspark displays several HiveConf warning messages
When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
Workaround: These warnings can be safely ignored.
CDPD-2650: Spark cannot write ZSTD and LZ4 compressed Parquet to dynamically partitioned tables
Workaround: Use a different compression algorithm.
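One way to apply this workaround is to guard the Parquet codec setting so that an affected codec is never used for these writes. The sketch below assumes the standard Spark SQL property spark.sql.parquet.compression.codec and picks snappy as the fallback. The choice of snappy and the helper names are assumptions for illustration; this issue description does not name a replacement codec.

```python
# Minimal sketch, not a definitive implementation.
# Codecs named in CDPD-2650 as failing for dynamically partitioned tables.
AFFECTED_CODECS = {"zstd", "lz4"}

PARQUET_CODEC_KEY = "spark.sql.parquet.compression.codec"


def safe_parquet_codec(requested, fallback="snappy"):
    """Return the requested codec, or the fallback if it is affected.

    The default fallback is an assumption; any unaffected codec
    supported by your Spark build could be used instead.
    """
    codec = requested.lower()
    return fallback if codec in AFFECTED_CODECS else codec


def parquet_codec_conf(requested):
    """Build the Spark SQL setting for the Parquet compression codec."""
    return {PARQUET_CODEC_KEY: safe_parquet_codec(requested)}


def apply_to_session(spark, requested):
    """Apply the setting to an existing SparkSession passed in by the caller."""
    for key, value in parquet_codec_conf(requested).items():
        spark.conf.set(key, value)
```

Calling apply_to_session(spark, "zstd") before the write sets the session codec to the fallback, while an unaffected value such as "gzip" is passed through unchanged.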
CDPD-3293: Cannot create views (CREATE VIEW statement) from Spark
Apache Ranger in CDP disallows Spark users from running CREATE VIEW statements.
Workaround: Create the view using Hive or Impala.