Known Issues in Apache Spark

Learn about the known issues in Spark, their impact on functionality, and the available workarounds.

CDPD-30637: Spark Error: ClassNotFoundException: org.apache.hadoop.hive.llap.io.api.LlapProxy
Workaround: Execute the following command: sudo ln -s /opt/cloudera/parcels/CDH/jars/hive-llap-client-3.1.3000.7.2.12.0-291.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/. The symbolic link must be created on all gateway and NodeManager nodes.
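Creating the same symlink on every node can be scripted. The sketch below only prints the per-node commands rather than executing them, and the host names are placeholders; substitute the actual gateway and NodeManager hosts in your cluster:

```shell
# Sketch: print the symlink command for each node.
# HOSTS is an assumed placeholder list, not real host names.
HOSTS="gateway1 nodemanager1 nodemanager2"
JAR=/opt/cloudera/parcels/CDH/jars/hive-llap-client-3.1.3000.7.2.12.0-291.jar
DEST=/opt/cloudera/parcels/CDH/lib/spark3/jars/
for host in $HOSTS; do
  # Print the remote command (run it via ssh, or a tool such as pssh/Ansible):
  echo ssh "$host" "sudo ln -s $JAR $DEST"
done
```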
CDPD-217: HBase/Spark connectors are not supported
The Apache HBase Spark Connector (hbase-connectors/spark) and the Apache Spark - Apache HBase Connector (shc) are not supported in the initial CDP release.
Workaround: None.
CDPD-3038: Launching pyspark displays several HiveConf warning messages
When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
Workaround: These warnings can be safely ignored.
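If the startup noise is unwanted, the warning can also be suppressed by raising the log level for the HiveConf logger in Spark's log4j configuration. A sketch, assuming the log4j 1.x properties format and a default configuration file location, which varies by installation:

```
# Append to Spark's log4j.properties (assumed path):
# silence "HiveConf of name ... does not exist" warnings
log4j.logger.org.apache.hadoop.hive.conf.HiveConf=ERROR
```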
CDPD-2650: Spark cannot write ZSTD and LZ4 compressed Parquet to dynamically partitioned tables
Workaround: Use a different compression algorithm.
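For example, the default Parquet codec can be switched cluster-wide through the spark.sql.parquet.compression.codec setting. A sketch, assuming Snappy as the substitute codec and spark-defaults.conf as the configuration file:

```
# spark-defaults.conf: write Parquet with Snappy instead of ZSTD/LZ4
spark.sql.parquet.compression.codec snappy
```

The codec can also be chosen per write, for example with df.write.option("compression", "snappy") in a Spark application.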
CDPD-3783: Cannot create databases from Spark
Attempting to create a database using Spark results in an error similar to the following:
org.apache.spark.sql.AnalysisException:
            org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Permission denied: user [sparkuser] does not have [ALL] privilege on [hdfs://ip-10-1-2-3.cloudera.site:8020/tmp/spark/warehouse/spark_database.db]);
Workaround: Create the database using Hive or Impala, or specify the external data warehouse location in the create command. For example:
sql("create database spark_database location '/warehouse/tablespace/external/hive/spark_database.db'")