Known Issues in Apache Spark
Learn about the known issues in Spark, the impact or changes to the functionality, and the workaround.
- CDPD-23817: In the upgraded Cluster, the permission of /tmp/spark is restricted due to the HDP configuration hive.exec.scratchdir=/tmp/spark.
- If you are using the /tmp/spark directory in the CDP cluster, you must provide the required additional Policies/ACL permissions.
- CDPD-22670 and CDPD-23103: There are two configurations in Spark, "Atlas dependency" and "spark_lineage_enabled", which are conflicted. The issue is when Atlas dependency is turned off but spark_lineage_enabled is turned on.
- Run Spark application, Spark will log some error message and cannot continue. That can be restored by correcting the configurations and restarting Spark component with distributing client configurations.
- CDPD-23007: Mismatch in the Spark Default DB Location. In HDP 3.1.5, hive_db entities have one attribute - 'location' which is configured to the '/managed' path. In fresh install of CDP 7.1.7, hive_db entities now have 2 attributes 'location' configured to '/external' path and 'managedLocation' configured to '/managed' path. In. the AM2CM migration (HDP 3.1.5 -> CDP 7.1.7), the 'location' attribute from hive_db entities in HDP 3.1.5 comes unaltered to CDP 7.1.7 and hence maps to '/managed' path.
- This issue arises only if you are upgrading from HDP 3.1.5 to CDP 7.1.7. If you are performing a fresh install of CDP 7.1.7, you can ignore this issue.
- CDPD-217: HBase/Spark connectors are not supported
- The Apache HBase Spark Connector
hbase-connectors/spark) and the Apache Spark - Apache HBase Connector (
shc) are not supported in the initial CDP release.
- CDPD-3038: Launching
pysparkdisplays several HiveConf warning messages
pysparkstarts, several Hive configuration warning messages are displayed, similar to the following:
19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist 19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
- These errors can be safely ignored.
- CDPD-2650: Spark cannot write ZSTD and LZ4 compressed Parquet to dynamically partitioned tables
- Use a different compression algorithm.
- CDPD-3783: Cannot create databases from Spark
- Attempting to create a database using Spark results in an error
similar to the
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Permission denied: user [sparkuser] does not have [ALL] privilege on [hdfs://ip-10-1-2-3.cloudera.site:8020/tmp/spark/warehouse/spark_database.db]);
- Create the database using Hive or Impala, or
specify the external data warehouse location in the
createcommand. For example:
sql("create database spark_database location '/warehouse/tablespace/external/hive/spark_database.db'")