Fixed Issues in Apache Spark

Review the list of Spark issues that are resolved in Cloudera Runtime 7.1.7.

CDPD-2650: Spark cannot write ZSTD- and LZ4-compressed Parquet files to a dynamically partitioned table.
This issue is resolved.
CDPD-3783: Unable to create a database in Spark.
This issue is resolved.
CDPD-21614: In order to retain the legacy Hive1/Hive2 behavior around managed non-acid tables, the migration process instructed to convert those tables to external with "external
However, in 7.1.4, the TRUNCATE TABLE operation could not be performed through Spark SQL on those tables. From 7.1.6 onwards, Spark allows you to TRUNCATE an external table if "external.table.purge" is set to "true" in its table properties.
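As a rough illustration of the 7.1.6+ behavior, the sketch below issues the two Spark SQL statements in order: it first marks the external table as purgeable, then truncates it. The helper name `truncate_external_table` and its `run_sql` parameter are hypothetical. `run_sql` is any callable that executes one SQL statement. On a cluster this would typically be a SparkSession's `spark.sql` method, and injecting it keeps the sketch runnable without Spark. Be aware that in Hive, "external.table.purge"="true" also causes DROP TABLE to delete the table's underlying data, not just its metadata.

```python
from typing import Callable, List


def truncate_external_table(run_sql: Callable[[str], object], table: str) -> List[str]:
    """Hypothetical helper: mark an external table purgeable, then truncate it.

    Assumes Cloudera Runtime 7.1.6 or later, where Spark SQL permits
    TRUNCATE TABLE on an external table only when its
    "external.table.purge" property is "true".
    `table` is interpolated as-is, so pass only a trusted, qualified name.
    Returns the statements that were issued, in order.
    """
    statements = [
        # Enable purge semantics so Spark will accept TRUNCATE on this external table.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('external.table.purge'='true')",
        # Remove all rows (and, with purge enabled, the underlying data files).
        f"TRUNCATE TABLE {table}",
    ]
    for stmt in statements:
        run_sql(stmt)
    return statements
```

On a cluster you might call it as `truncate_external_table(spark.sql, "sales_db.legacy_orders")`, where `spark` is a Hive-enabled SparkSession and the table name is purely illustrative. If the property is already set on the table, only the TRUNCATE TABLE statement is strictly needed.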
YARN aggregation job is missing YARN metric folders because of timezone issues
In this release, the Spark Atlas Connector produces a spark_application entity for each Spark job. Each data flow produced by the job creates a spark_process entity in Atlas, which tracks the actual input and output data sets for that process.
CDPD-18458: [SPARK-32635] When pyspark.sql.functions.lit() is used with DataFrame cache, it returns a wrong result
When the pyspark.sql.functions.lit() function is used with a cached DataFrame, it returns an incorrect result. This issue is resolved.

Apache patch information

  • SPARK-32635