Spark

	Important
	Hortonworks strongly recommends that all users running HDP 2.3.4 upgrade to HDP 2.3.4.7.

HDP 2.3.4 provides Spark 1.5.2 and the following Apache patches:

SPARK-10058: CORE, TESTS, Fix the flaky tests in HeartbeatReceiverSuite.
SPARK-10389: SQL, [1.5] Support order by non-attribute grouping expression on Aggregate.
SPARK-10515: When killing executor, the pending replacement executors should not be lost.
SPARK-10534: SQL, ORDER BY clause allows only columns that are present in the select projection list.
SPARK-10577: PYSPARK, DataFrame hint for broadcast join.
SPARK-10581: DOCS, Groups are not resolved in scaladoc in SQL classes.
SPARK-10619: Can't sort columns on Executor Page.
SPARK-10741: SQL, Hive Query Having/OrderBy against Parquet table is not working.
SPARK-10790: YARN, Fix initial executor number not set issue and consolidate the codes.
SPARK-10812: YARN, Fix shutdown of token renewer.
SPARK-10812: YARN, Spark hadoop util support switching to yarn.
SPARK-10825: CORE, TESTS, Fix race conditions in StandaloneDynamicAllocationSuite.
SPARK-10829: SQL, Fix 2 bugs for filter on partitioned columns.
SPARK-10833: BUILD, Inline, organize BSD/MIT licenses in LICENSE.
SPARK-10845: SQL, Makes spark.sql.hive.version a SQLConfEntry.
SPARK-10858: YARN, archives/jar/files rename with # doesn't work unl.
SPARK-10859: SQL, fix stats of StringType in columnar cache.
SPARK-10871: include number of executor failures in error msg.
SPARK-10885: STREAMING, Display the failed output op in Streaming UI.
SPARK-10889: STREAMING, Bump KCL to add MillisBehindLatest metric.
SPARK-10901: YARN, spark.yarn.user.classpath.first doesn't work.
SPARK-10904: SPARKR, Fix to support `select(df, c("col1", "col2"))`.
SPARK-10914: UnsafeRow serialization breaks when two machines have different Oops size.
SPARK-10932: PROJECT INFRA, Port two minor changes to release-build.sh from scripts' old repo.
SPARK-10934: SQL, handle hashCode of unsafe array correctly.
SPARK-10952: Only add hive to classpath if HIVE_HOME is set.
SPARK-10955: STREAMING, Add a warning if dynamic allocation for Streaming applications.
SPARK-10959: PYSPARK, StreamingLogisticRegressionWithSGD does not train with given regParam and convergenceTol parameters.
SPARK-10960: SQL, SQL with windowing function should be able to refer column in inner select.
SPARK-10971: SPARKR, RRunner should allow setting path to Rscript.
SPARK-10973: ML, PYTHON, Fix IndexError exception on SparseVector when asked for index after the last non-zero entry.
SPARK-10980: SQL, fix bug in create Decimal.
SPARK-10981: SPARKR, SparkR Join improvements.
SPARK-11009: SQL, fix wrong result of Window function in cluster mode.
SPARK-11023: YARN, Avoid creating URIs from local paths directly.
SPARK-11026: YARN, spark.yarn.user.classpath.first does work for 'spark-submit --jars hdfs://user/foo.jar'.
SPARK-11032: SQL, correctly handle having.
SPARK-11039: DOCS, WEBUI, Document additional UI configurations.
SPARK-11047: Internal accumulators miss the internal flag when replaying events in the history server.
SPARK-11051: CORE, Do not allow local checkpointing after the RDD is materialized and checkpointed.
SPARK-11056: Improve documentation of SBT build.
SPARK-11063: STREAMING, Change preferredLocations of Receiver's RDD to hosts rather than hostports.
SPARK-11066: Update DAGScheduler's "misbehaved ResultHandler".
SPARK-11094: Strip extra strings from Java version in test runner.
SPARK-11103: SQL, Filter applied on Merged Parquet schema with new column fail.
SPARK-11104: STREAMING, Fix a deadlock in StreamingContext.stop.
SPARK-11126: SQL, Fix a memory leak in SQLListener._stageIdToStageMetrics.
SPARK-11126: SQL, Fix the potential flaky test.
SPARK-11135: SQL, Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering.
SPARK-11153: SQL, Disables Parquet filter push-down for string and binary columns.
SPARK-11188: SQL, Elide stacktraces in bin/spark-sql for AnalysisExceptions.
SPARK-11233: SQL, register cosh in function registry.
SPARK-11244: SPARKR, sparkR.stop() should remove SQLContext.
SPARK-11246: SQL, Table cache for Parquet broken in 1.5.
SPARK-11251: Fix page size calculation in local mode.
SPARK-11264: bin/spark-class can't find assembly jars with certain GREP_OPTIONS set.
SPARK-11270: STREAMING, Add improved equality testing for TopicAndPartition from the Kafka Streaming API.
SPARK-11287: Fixed class name to properly start TestExecutor from deploy.client.TestClient.
SPARK-11294: SPARKR, Improve R doc for read.df, write.df, saveAsTable.
SPARK-11299: DOC, Fix link to Scala DataFrame Functions reference.
SPARK-11302: MLLIB, Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases.
SPARK-11303: SQL, filter should not be pushed down into sample.
SPARK-11417: SQL, no @Override in codegen.
SPARK-11424: Guard against double-close() of RecordReaders.
SPARK-11434: SQL, Fix test "Filter applied on merged Parquet schema with new column fails".
SPARK-5966: WIP, spark-submit deploy-mode cluster is not compatible with master local.
SPARK-8386: SQL, add write.mode for insertIntoJDBC when the parameter overwrite is false.

HDP 2.3.2 provided Spark 1.4.1 and the following Apache patches:

NEW FEATURES

SPARK-1537 Add integration with Yarn's Application Timeline Server.
SPARK-6112 Provide external block store support through HDFS RAM_DISK.

BUG FIXES

SPARK-10623 NoSuchElementException thrown when ORC predicate push-down is turned on.

HDP 2.3.0 provided Spark 1.3.1 and the following Apache patches:

IMPROVEMENTS

SPARK-7326 (Backport) Performing window() on a WindowedDStream doesn't work all the time.
JDK 1.7 repackaging.