Hortonworks Data Platform for HDInsight

Spark

This release provides Spark 2.4.4 with no additional Apache patches. For a list of fixed issues, see the HDInsight Fixed Issues section.

HDP 3.1.3 for HDInsight provided Spark 2.4.0 and the following Apache patches.

  • SPARK-21783: Turn on `native` ORC impl and PPD by default

  • SPARK-23456: Turn on `native` ORC impl and PPD by default

  • SPARK-23228: Add Python Created jsparkSession to JVM's defaultSession

  • SPARK-23510: Support Hive 2.2 and Hive 2.3 metastore

  • SPARK-23518: Avoid metastore access when the users only want to read and write data frames

  • SPARK-23635: Spark executor env variable is overwritten by same name AM env variable

  • SPARK-23787: Fix file download test in SparkSubmitSuite for Hadoop 2.9

  • SPARK-23355: convertMetastore should not ignore table properties

  • SPARK-24110: Avoid UGI.loginUserFromKeytab in STS

  • SPARK-24149: Retrieve all federated namespaces tokens

  • SPARK-24209: Automatic retrieve proxyBase from Knox headers

  • SPARK-24312: Upgrade to 2.3.3 for Hive Metastore Client 2.3

  • SPARK-24377: make --py-files work in non pyspark application

  • SPARK-24479: Added config for registering streamingQueryListeners

  • SPARK-24518: Using Hadoop credential provider API to store password

  • SPARK-24660: Show correct error pages when downloading logs in SHS

  • SPARK-23654: remove jets3t as a dependency of spark

  • SPARK-25126: Avoid creating Reader for all orc files

  • SPARK-23679: Setting RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode

  • SPARK-25306: Avoid skewed filter trees to speed up `createFilter` in ORC
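Two of the patches above (SPARK-21783 and SPARK-23456) change ORC defaults, so a job that depends on the older behavior may need to set the relevant properties explicitly. The sketch below is only an illustration: `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` are standard Spark SQL properties, while the `ORC_DEFAULTS` table, the `conf_args` helper, and the `app.py` file name are hypothetical names introduced here, not part of HDP or Spark.

```python
# Hypothetical helper (not part of Spark or HDP): renders Spark properties
# as spark-submit "--conf key=value" arguments.

# ORC settings as described by SPARK-21783 / SPARK-23456:
# native ORC implementation and predicate push-down (PPD) enabled.
ORC_DEFAULTS = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.filterPushdown": "true",
}


def conf_args(overrides=None):
    """Return a flat list of --conf arguments, with overrides applied."""
    settings = dict(ORC_DEFAULTS)
    settings.update(overrides or {})
    args = []
    # Sort keys so the generated command line is deterministic.
    for key in sorted(settings):
        args += ["--conf", f"{key}={settings[key]}"]
    return args


if __name__ == "__main__":
    # Example: fall back to the Hive ORC reader for a single job.
    # "app.py" is a placeholder application name.
    command = ["spark-submit"] + conf_args({"spark.sql.orc.impl": "hive"}) + ["app.py"]
    print(" ".join(command))
```

Passing the properties per job on the command line leaves the cluster-wide defaults untouched, which may be preferable when only a few workloads need the earlier behavior.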

HDP 3.0.0 and HDP 3.0.1 provided Spark 2.3.1 and the following Apache patches.

  • SPARK-24495: SortMergeJoin with duplicate keys wrong results.

  • SPARK-207: Remove hardcode FS scheme from Spark archive.

(Backport from 2.3.2)

  • SPARK-24455: fix typo in TaskSchedulerImpl comment.

  • SPARK-24369: Correct handling for multiple distinct aggregations having the same argument set.

  • SPARK-24468: Handle negative scale when adjusting precision for decimal operations.

  • SPARK-23732: Fix source links in generated scaladoc.

  • SPARK-24502: flaky test: UnsafeRowSerializerSuite.

  • SPARK-24531: Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite.

  • SPARK-24506: Add UI filters to tabs added after binding.

  • SPARK-23754: Move UDF stop iteration wrapping from driver to executor.

  • Remove unnecessary sort in UnsafeInMemorySorterSuite.

  • Fix typo in serializer exception.

  • Revert "SPARK-21743: top-most limit should not cause memory leak".

  • SPARK-24531: Replace 2.3.0 version with 2.3.1.

(Backport from 2.4)

  • SPARK-21783: Turn on `native` ORC impl and PPD by default.

  • SPARK-23456: Turn on `native` ORC impl and PPD by default.

  • SPARK-23228: Add Python Created jsparkSession to JVM's defaultSession.

  • SPARK-23510: Support Hive 2.2 and Hive 2.3 metastore.

  • SPARK-23518: Avoid metastore access when the users only want to read and write data frames.

  • SPARK-23635: Spark executor env variable is overwritten by same name AM env variable.

  • SPARK-23787: Fix file download test in SparkSubmitSuite for Hadoop 2.9.

  • SPARK-23355: convertMetastore should not ignore table properties.

  • SPARK-24110: Avoid UGI.loginUserFromKeytab in STS.

  • SPARK-24149: Retrieve all federated namespaces tokens.

  • SPARK-24209: Automatic retrieve proxyBase from Knox headers.

  • SPARK-24312: Upgrade to 2.3.3 for Hive Metastore Client 2.3.

  • SPARK-24377: make --py-files work in non pyspark application.

  • SPARK-24479: Added config for registering streamingQueryListeners.

  • SPARK-24518: Using Hadoop credential provider API to store password.

  • SPARK-24660: Show correct error pages when downloading logs in SHS.