Spark

HDP 2.3.4.7 provides Spark 1.5.2 with the patches specified below. No additional Apache patches have been included in this release.

HDP 2.3.4 provided Spark 1.5.2 and the following Apache patches:

SPARK-10058: CORE, TESTS, Fix the flaky tests in HeartbeatReceiverSuite.
SPARK-10389: SQL, Support order by non-attribute grouping expression on Aggregate.
SPARK-10515: When killing executor, the pending replacement executors should not be lost.
SPARK-10534: SQL, ORDER BY clause allows only columns that are present in the select projection list.
SPARK-10577: PYSPARK, DataFrame hint for broadcast join.
SPARK-10581: DOCS, Groups are not resolved in scaladoc in SQL classes.
SPARK-10619: Can't sort columns on Executor Page.
SPARK-10741: SQL, Hive Query Having/OrderBy against Parquet table is not working.
SPARK-10790: YARN, Fix initial executor number not set issue and consolidate the codes.
SPARK-10812: YARN, Fix shutdown of token renewer.
SPARK-10812: YARN, Spark hadoop util support switching to yarn.
SPARK-10825: CORE, TESTS, Fix race conditions in StandaloneDynamicAllocationSuite.
SPARK-10829: SQL, Fix 2 bugs for filter on partitioned columns.
SPARK-10833: BUILD, Inline, organize BSD/MIT licenses in LICENSE.
SPARK-10845: SQL, Makes spark.sql.hive.version a SQLConfEntry.
SPARK-10858: YARN, archives/jar/files rename with # doesn't work unl.
SPARK-10859: SQL, fix stats of StringType in columnar cache.
SPARK-10871: include number of executor failures in error msg.
SPARK-10885: STREAMING, Display the failed output op in Streaming UI.
SPARK-10889: STREAMING, Bump KCL to add MillisBehindLatest metric.
SPARK-10901: YARN, spark.yarn.user.classpath.first doesn't work.
SPARK-10904: SPARKR, Fix to support `select(df, c("col1", "col2"))`.
SPARK-10914: UnsafeRow serialization breaks when two machines have different Oops size.
SPARK-10932: PROJECT INFRA, Port two minor changes to release-build.sh from scripts' old repo.
SPARK-10934: SQL, handle hashCode of unsafe array correctly.
SPARK-10952: Only add hive to classpath if HIVE_HOME is set.
SPARK-10955: STREAMING, Add a warning if dynamic allocation for Streaming applications.
SPARK-10959: PYSPARK, StreamingLogisticRegressionWithSGD does not train with given regParam and convergenceTol parameters.
SPARK-10960: SQL, SQL with windowing function should be able to refer column in inner select.
SPARK-10971: SPARKR, RRunner should allow setting path to Rscript.
SPARK-10973: ML, PYTHON, Fix IndexError exception on SparseVector when asked for index after the last non-zero entry.
SPARK-10980: SQL, fix bug in create Decimal.
SPARK-10981: SPARKR, SparkR Join improvements.
SPARK-11009: SQL, fix wrong result of Window function in cluster mode.
SPARK-11023: YARN, Avoid creating URIs from local paths directly.
SPARK-11026: YARN, spark.yarn.user.classpath.first does work for 'spark-submit --jars hdfs://user/foo.jar'.
SPARK-11032: SQL, correctly handle having.
SPARK-11039: DOCS, WEBUI, Document additional UI configurations.
SPARK-11047: Internal accumulators miss the internal flag when replaying events in the history server.
SPARK-11051: CORE, Do not allow local checkpointing after the RDD is materialized and checkpointed.
SPARK-11056: Improve documentation of SBT build.
SPARK-11063: STREAMING, Change preferredLocations of Receiver's RDD to hosts rather than hostports.
SPARK-11066: Update DAGScheduler's "misbehaved ResultHandler".
SPARK-11094: Strip extra strings from Java version in test runner.
SPARK-11103: SQL, Filter applied on Merged Parquet schema with new column fail.
SPARK-11104: STREAMING, Fix a deadlock in StreamingContext.stop.
SPARK-11126: SQL, Fix a memory leak in SQLListener._stageIdToStageMetrics.
SPARK-11126: SQL, Fix the potential flaky test.
SPARK-11135: SQL, Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering.
SPARK-11153: SQL, Disables Parquet filter push-down for string and binary columns.
SPARK-11188: SQL, Elide stacktraces in bin/spark-sql for AnalysisExceptions.
SPARK-11233: SQL, register cosh in function registry.
SPARK-11244: SPARKR, sparkR.stop() should remove SQLContext.
SPARK-11246: SQL, Table cache for Parquet broken in 1.5.
SPARK-11251: Fix page size calculation in local mode.
SPARK-11264: bin/spark-class can't find assembly jars with certain GREP_OPTIONS set.
SPARK-11270: STREAMING, Add improved equality testing for TopicAndPartition from the Kafka Streaming API.
SPARK-11287: Fixed class name to properly start TestExecutor from deploy.client.TestClient.
SPARK-11294: SPARKR, Improve R doc for read.df, write.df, saveAsTable.
SPARK-11299: DOC, Fix link to Scala DataFrame Functions reference.
SPARK-11302: MLLIB, Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases.
SPARK-11303: SQL, filter should not be pushed down into sample.
SPARK-11417: SQL, no @Override in codegen.
SPARK-11424: Guard against double-close() of RecordReaders.
SPARK-11434: SQL, Fix test "Filter applied on merged Parquet schema with new column fails".
SPARK-5966: WIP, spark-submit deploy-mode cluster is not compatible with master local>.
SPARK-8386: SQL, add write.mode for insertIntoJDBC when the parameter overwrite is false.

HDP 2.3.2 provided Spark 1.4.1 and the following Apache patches:

NEW FEATURES

SPARK-1537 Add integration with Yarn's Application Timeline Server.
SPARK-6112 Provide external block store support through HDFS RAM_DISK.

BUG FIXES

SPARK-10623 NoSuchElementException thrown when ORC predicate push-down is turned on.

HDP 2.3.0 provided Spark 1.3.1 and the following Apache patches:

IMPROVEMENTS

SPARK-7326 (Backport) Performing window() on a WindowedDStream doesn't work all the time.
JDK 1.7 repackaging.
