Fixed Issues in Spark
Review the list of Spark issues that are resolved in Cloudera Runtime 7.2.18.
- CDPD-3038: Launching pyspark displays several HiveConf warning messages
- When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:
  19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
  19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
- CDPD-65717: SPARK-46793 Revert S3A endpoint fixup logic of SPARK-35878
- SPARK-46793. Revert S3A endpoint fixup logic of SPARK-35878
- CDPD-64638: Slowness / broadcast timeout issues due to SPARK-33290: REFRESH TABLE should invalidate cache even though the table itself may not be cached (Spark 2.4.8)
- Slowness and broadcast timeout issues could occur in Spark 2.4.8 due to SPARK-33290. A new legacy feature flag, spark.sql.legacy.refreshOnlyCachedTables, has been introduced to restore the behavior prior to Spark 2.4.8. When the flag is set to false (default), REFRESH TABLE invalidates the cache even if the table itself is not cached, which is the behavior introduced by SPARK-33290 in Spark 2.4.8. When the flag is set to true, the behavior prior to Spark 2.4.8 is restored. The fix was tested manually with data that previously caused the timeout and slowness issues.
- CDPD-64546: Performance: Spark TPCDS Queries are slower in 7.2.18 compared to 7.2.17
- Fixed by disabling the checksum on the client side while reading data. Read performance is now comparable to the earlier release, with no regression.
- CDPD-61564: Spark - Caused by: java.lang.NoClassDefFoundError: org/datanucleus/store/query/cache/QueryCompilationCache
- Upgraded datanucleus-core dependency to 5.2.10
- CDPD-57535: Revert: CDPD-48171: Temporary workaround pinning snakeyaml to 2.0 not vulnerable to CVE-2022-1471
- Reverted the temporary pin to snakeyaml 2.0. The snakeyaml Representer constructor has been added back. The other reverted constructors can be found here: https://bitbucket.org/snakeyaml/snakeyaml/commits/3e755d254aeaa902675053047fd53368a175565a/raw
- CDPD-58558: Simple DML insert into table via spark3-shell spark.sql is creating orphan spark_process in Atlas
- A spark_process entity is no longer created for INSERT INTO ... VALUES ... statements. Only the INSERT INTO ... SELECT ... action may create a spark_process entity in Atlas, as described in the official documentation: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/atlas-reference/topics/atlas-spark-actions.html https://docs.cloudera.com/runtime/7.2.17/atlas-reference/topics/atlas-spark-actions.html
- CDPD-58191: Spark - Upgrade kubernetes library to 5.7.4/5.8.1/5.10.2/5.11.2+ due to CVE-2021-4178
- Upgraded kubernetes-client dependency to 5.7.4
- CDPD-58080: Backport SPARK-32951 to Spark 2
- SPARK-32951 Foldable propagation from Aggregate
- CDPD-56594: Lineage (spark_process) is not created for views created on iceberg tables
- Added CREATE VIEW lineage support to the Spark Atlas Connector for Spark 3, which is required for Iceberg tables
- CDPD-56342: Upgrade Parquet to 1.12.3 in Spark
- Upgraded Parquet dependency to 1.12.3
- CDPD-55243: Fix case sensitivity of Iceberg's CachingCatalog
- Previously, using inconsistent casing for the database and table names of Iceberg tables in queries could lead to Spark reading a stale cached snapshot after a write to the table (append, update, delete) in the same Spark session. Now the cache is insensitive to the case of database and table names and is always refreshed on a write in the session.
- CDPD-55116: Fix Spark vulnerability CVE-2023-22946
- This fix blocks the spark.submit.deployMode and spark.submit.proxyUser.allowCustomClasspathInClusterMode Spark configurations in the Livy create session REST API. A new Livy configuration, livy.server.session.allow-custom-classpath, has been added to allow a custom classpath. To disable or roll back this fix, set livy.server.session.allow-custom-classpath to true in the Livy configuration through the CM safety valve.
- CDPD-44454: MAPREDUCE-7432. Make manifest committer default on abfs and gcs stores
- MAPREDUCE-7432. Make manifest committer default on abfs and gcs stores
- CDPD-44227: Ranger improvement - Roles Import/export API for ranger admin
- Added a Roles import/export API for Ranger Admin
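
For CDPD-64638, the legacy flag can be passed like any other Spark configuration. The following is a minimal sketch that only assembles a spark-submit command line. The helper name build_spark_submit_args and the application path my_job.py are hypothetical, and whether true is the right value depends on whether your workload relies on the behavior prior to Spark 2.4.8.

```python
from typing import Dict, List

# Flag name taken from the CDPD-64638 note; its default is false.
REFRESH_FLAG = "spark.sql.legacy.refreshOnlyCachedTables"


def build_spark_submit_args(app: str, conf: Dict[str, str]) -> List[str]:
    """Assemble a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical application path; "true" asks for the pre-2.4.8 behavior.
args = build_spark_submit_args("my_job.py", {REFRESH_FLAG: "true"})
print(" ".join(args))
```

The same key and value could instead be placed in the cluster's Spark configuration if the setting should apply to every job rather than a single submission.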
Apache patch information
- SPARK-46793
- SPARK-39441
- SPARK-32951
- LIVY-975
- MAPREDUCE-7432
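
For CDPD-55116, a client may want to check a Livy session request before submitting it. The sketch below is a client-side illustration only, not Livy's server-side implementation: it flags the two Spark settings named in the note when they appear in the conf map of a session request. The function name blocked_settings and the sample request body are hypothetical.

```python
import json
from typing import Dict, List

# Spark settings that the CDPD-55116 note says are blocked at session creation.
BLOCKED_KEYS = (
    "spark.submit.deployMode",
    "spark.submit.proxyUser.allowCustomClasspathInClusterMode",
)


def blocked_settings(session_request: Dict) -> List[str]:
    """Return the blocked Spark settings present in a session request body."""
    conf = session_request.get("conf", {})
    return [key for key in BLOCKED_KEYS if key in conf]


# Hypothetical request body for Livy's POST /sessions endpoint.
request_body = {
    "kind": "pyspark",
    "conf": {
        "spark.submit.deployMode": "cluster",
        "spark.executor.memory": "2g",
    },
}

offending = blocked_settings(request_body)
print(json.dumps({"blocked": offending}))
```

If a request is flagged, either remove the setting from the request or, where a custom classpath is genuinely required, have an administrator set livy.server.session.allow-custom-classpath to true as described in the note.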