Known Issues in Apache Spark
This topic describes known issues and workarounds for using Spark in this release of Cloudera Runtime.
- CDPD-22670 and CDPD-23103: Conflicting "Atlas dependency" and "spark_lineage_enabled" configurations
- Spark has two related configurations, "Atlas dependency" and "spark_lineage_enabled", which can conflict. The issue occurs when the Atlas dependency is turned off but spark_lineage_enabled is turned on. In that state, Spark applications log error messages and cannot continue. To recover, correct the configurations, redeploy the client configurations, and restart the Spark service.
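Until both settings are aligned, one way to avoid the conflict is to keep lineage collection disabled whenever the Atlas dependency is off. A minimal sketch, assuming the underlying Spark property is spark.lineage.enabled (the exact property name may differ by release; verify it against your version's documentation), placed in spark-defaults.conf:

```properties
# Sketch: keep Spark lineage collection off while the Atlas dependency is disabled.
# Assumes the property name spark.lineage.enabled; confirm it for your release.
spark.lineage.enabled false
```

Once the Atlas dependency is re-enabled, this property can be set back to true so that lineage is collected again.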
- CDPD-217: HBase/Spark connectors are not supported
- The Spark HBase Connector (SHC) from HDP and the hbase-spark module from CDH are not supported.
- CDPD-3038: Launching pyspark displays several HiveConf warning messages
- When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
- CDPD-2650: Spark cannot write ZSTD and LZ4 compressed Parquet to dynamically partitioned tables
- Workaround: Use a different compression algorithm.
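For example, a sketch of switching Parquet writes to Snappy (a codec not affected by this issue) via spark-defaults.conf:

```properties
# Sketch: write Parquet with Snappy instead of ZSTD or LZ4.
spark.sql.parquet.compression.codec snappy
```

The same setting can also be passed per job with spark-submit --conf spark.sql.parquet.compression.codec=snappy, or set at runtime with spark.conf.set("spark.sql.parquet.compression.codec", "snappy"), since it is a Spark SQL configuration.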
- CDPD-3293: Cannot create views (CREATE VIEW statement) from Spark
- Apache Ranger in CDP disallows Spark users from running CREATE VIEW statements.
- CDPD-3783: Cannot create databases from Spark
- Attempting to create a database using Spark results in an error similar to the following:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Permission denied: user [sparkuser] does not have [ALL] privilege on [hdfs://ip-10-1-2-3.cloudera.site:8020/tmp/spark/warehouse/spark_database.db]);
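As a hedged workaround sketch, the database can be created outside Spark by a user who holds the required Ranger privileges, for example through Beeline against HiveServer2, after which Spark can reference it normally. The database name below is an illustrative placeholder:

```sql
-- Sketch: run in Beeline (connected to HiveServer2) as a user authorized by Ranger,
-- rather than from Spark. The database name is a placeholder.
CREATE DATABASE IF NOT EXISTS spark_database;
```

After the database exists, Spark jobs running as sparkuser can create tables in it, subject to the Ranger policies granted on that database.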