Spark
This release provides Spark 2.4.4 with no additional Apache patches. For a list of fixed issues, see the HDInsight Fixed Issues section.
HDP 3.1.3 for HDInsight provided Spark 2.4.0 and the following Apache patches.
- SPARK-21783: Turn on `native` ORC impl and PPD by default
- SPARK-23456: Turn on `native` ORC impl and PPD by default
- SPARK-23228: Add Python-created jsparkSession to JVM's defaultSession
- SPARK-23510: Support Hive 2.2 and Hive 2.3 metastore
- SPARK-23518: Avoid metastore access when users only want to read and write DataFrames
- SPARK-23635: Spark executor env variable is overwritten by same-name AM env variable
- SPARK-23787: Fix file download test in SparkSubmitSuite for Hadoop 2.9
- SPARK-23355: convertMetastore should not ignore table properties
- SPARK-24110: Avoid UGI.loginUserFromKeytab in STS
- SPARK-24149: Retrieve tokens for all federated namespaces
- SPARK-24209: Automatically retrieve proxyBase from Knox headers
- SPARK-24312: Upgrade to 2.3.3 for Hive Metastore Client 2.3
- SPARK-24377: Make --py-files work in non-PySpark applications
- SPARK-24479: Add config for registering streamingQueryListeners
- SPARK-24518: Use Hadoop credential provider API to store passwords
- SPARK-24660: Show correct error pages when downloading logs in SHS
- SPARK-23654: Remove jets3t as a dependency of Spark
- SPARK-25126: Avoid creating Reader for all ORC files
- SPARK-23679: Set RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode
- SPARK-25306: Avoid skewed filter trees to speed up `createFilter` in ORC
HDP 3.0.0 and HDP 3.0.1 provided Spark 2.3.1 and the following Apache patches.
- SPARK-24495: SortMergeJoin with duplicate keys produces wrong results.
- SPARK-207: Remove hardcode FS scheme from Spark archive.
(Backport from 2.3.2)
- SPARK-24455: Fix typo in TaskSchedulerImpl comment.
- SPARK-24369: Correct handling for multiple distinct aggregations having the same argument set.
- SPARK-24468: Handle negative scale when adjusting precision for decimal operations.
- SPARK-23732: Fix source links in generated scaladoc.
- SPARK-24502: Flaky test: UnsafeRowSerializerSuite.
- SPARK-24531: Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite.
- SPARK-24506: Add UI filters to tabs added after binding.
- SPARK-23754: Move UDF stop iteration wrapping from driver to executor.
- Remove unnecessary sort in UnsafeInMemorySorterSuite.
- Fix typo in serializer exception.
- Revert SPARK-21743: top-most limit should not cause memory leak.
- SPARK-24531: Replace 2.3.0 version with 2.3.1.
(Backport from 2.4)
- SPARK-21783: Turn on `native` ORC impl and PPD by default.
- SPARK-23456: Turn on `native` ORC impl and PPD by default.
- SPARK-23228: Add Python-created jsparkSession to JVM's defaultSession.
- SPARK-23510: Support Hive 2.2 and Hive 2.3 metastore.
- SPARK-23518: Avoid metastore access when users only want to read and write DataFrames.
- SPARK-23635: Spark executor env variable is overwritten by same-name AM env variable.
- SPARK-23787: Fix file download test in SparkSubmitSuite for Hadoop 2.9.
- SPARK-23355: convertMetastore should not ignore table properties.
- SPARK-24110: Avoid UGI.loginUserFromKeytab in STS.
- SPARK-24149: Retrieve tokens for all federated namespaces.
- SPARK-24209: Automatically retrieve proxyBase from Knox headers.
- SPARK-24312: Upgrade to 2.3.3 for Hive Metastore Client 2.3.
- SPARK-24377: Make --py-files work in non-PySpark applications.
- SPARK-24479: Add config for registering streamingQueryListeners.
- SPARK-24518: Use Hadoop credential provider API to store passwords.
- SPARK-24660: Show correct error pages when downloading logs in SHS.