Release Notes

Spark

This release provides Spark 1.6.3 with no additional Apache patches.

HDP 2.6.0 provides Spark 1.6.3 and the following Apache patches:

  • SPARK-6717: Clear shuffle files after checkpointing in ALS.

  • SPARK-6735: Add a window-based executor failure tracking mechanism for long-running services.

  • SPARK-6847: Stack overflow in updateStateByKey when it is followed by a stream that has a checkpoint set.

  • SPARK-7481: Add a spark-cloud module to pull in AWS and Azure object store filesystem accessors, and test the integration.

  • SPARK-7889: Job progress of applications on the HistoryServer's completed-applications page is shown as incomplete.

  • SPARK-10582: With dynamic executor allocation, if the AM fails and a new AM is started, the new AM does not allocate executors to the driver.

  • SPARK-11137: Make StreamingContext.stop() exception-safe.

  • SPARK-11314: Add a service API and test service for YARN cluster schedulers.

  • SPARK-11315: Add YARN extension service to publish Spark events to YARN timeline service (part of SPARK-1537).

  • SPARK-11323: Add a History Service Provider to serve application histories from the YARN timeline server (part of SPARK-1537).

  • SPARK-11627: The Spark Streaming backpressure mechanism has no initial rate limit, so receivers ingest data at maximum speed, which might cause an OOM exception.

  • SPARK-12001: StreamingContext cannot be completely stopped if the stop() is interrupted.

  • SPARK-12009: Avoid reallocating YARN containers while the driver wants to stop all executors.

  • SPARK-12142: Cannot request executors when the container allocator is not ready.

  • SPARK-12241: Improve failure reporting in the YARN client obtainTokenForHBase().

  • SPARK-12353: Wrong output for countByValue and countByValueAndWindow.

  • SPARK-12513: SocketReceiver hangs in the Netcat example.

  • SPARK-12523: Support long-running Spark applications on HBase and the Hive metastore.

  • SPARK-12920: Fix high CPU usage in the Spark Thrift server with concurrent users.

  • SPARK-12948: OrcRelation uses HadoopRDD, which can broadcast conf objects frequently.

  • SPARK-12967: NettyRPC races with SparkContext.stop() and throws exception.

  • SPARK-12998: Enable OrcRelation even when connecting via the Spark Thrift server.

  • SPARK-13021: Fail fast when custom RDDs violate the RDD.partitions API contract.

  • SPARK-13117: WebUI should use the local IP, not 0.0.0.0.

  • SPARK-13278: Launcher fails to start with JDK 9 EA.

  • SPARK-13308: ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases.

  • SPARK-13360: A PySpark-related environment variable is not propagated to the driver in yarn-cluster mode.

  • SPARK-13468: Fix a corner case where the UI page should show the DAG but does not.

  • SPARK-13478: Use real user when fetching delegation tokens.

  • SPARK-13885: Fix attempt ID regression for Spark running on YARN.

  • SPARK-13902: Prevent DAGScheduler from creating duplicate stages.

  • SPARK-14062: Fix log4j and automatically upload metrics.properties with the distributed cache.

  • SPARK-14091: Consider improving performance of SparkContext.getCallSite().

  • SPARK-15067: YARN executors are launched with fixed perm gen size.

  • SPARK-1537: Add integration with YARN's Application Timeline Server.

  • SPARK-15705: Change the default value of spark.sql.hive.convertMetastoreOrc to false.

  • SPARK-15844: HistoryServer doesn't come up if spark.authenticate = true.

  • SPARK-15990: Add rolling log aggregation support for Spark on YARN.

  • SPARK-16110: Can't set Python via spark-submit for YARN cluster mode when PYSPARK_PYTHON & PYSPARK_DRIVER_PYTHON are set.

  • SPARK-19033: HistoryServer still uses old ACLs even if ACLs are updated.

  • SPARK-19306: Fix inconsistent state in DiskBlockObjectWriter when an exception occurs.

  • SPARK-19970: Table owner should be USER instead of PRINCIPAL in kerberized clusters.
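SPARK-6735 (window-based executor failure tracking) changes how failures count toward the limit. Failures older than a validity interval are ignored, so a long-running service is not killed by failures that accumulated slowly over days. The Python sketch below shows the general sliding-window idea. It is not Spark's Scala implementation. The class name WindowedFailureTracker, its methods, and the injectable clock are hypothetical. In Spark 2.x documentation, the interval is exposed as spark.yarn.executor.failuresValidityInterval; whether the HDP 1.6.3 backport uses the same name should be checked against the HDP configuration docs.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class WindowedFailureTracker:
    """Hypothetical sketch: count executor failures inside a sliding time window.

    Illustrative only; this is not Spark's implementation or API.
    """

    def __init__(self, validity_interval_s: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # None means failures never expire (plain cumulative counting).
        self.validity_interval_s = validity_interval_s
        self.clock = clock
        self._failure_times: Deque[float] = deque()

    def record_failure(self) -> None:
        # Timestamps are appended in clock order, so the deque stays sorted.
        self._failure_times.append(self.clock())

    def _expire_old(self) -> None:
        if self.validity_interval_s is None:
            return
        cutoff = self.clock() - self.validity_interval_s
        # Drop failures that fall outside the validity window (oldest first).
        while self._failure_times and self._failure_times[0] < cutoff:
            self._failure_times.popleft()

    def num_failures(self) -> int:
        self._expire_old()
        return len(self._failure_times)

    def too_many_failures(self, max_failures: int) -> bool:
        # True once the in-window failure count reaches the limit.
        return self.num_failures() >= max_failures


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    tracker = WindowedFailureTracker(validity_interval_s=60.0, clock=lambda: now[0])
    tracker.record_failure()        # failure at t = 0
    now[0] = 30.0
    tracker.record_failure()        # failure at t = 30
    print(tracker.num_failures())   # 2: both failures are within 60 s
    now[0] = 70.0
    print(tracker.num_failures())   # 1: the t = 0 failure has expired
```

Passing None for the interval reproduces the older behavior, where every failure since startup counts toward the limit.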
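SPARK-13021 makes Spark fail fast when a custom RDD returns partitions that break the API contract. As we understand that contract, the partition at position i in the array returned by an RDD's partitions must report index i. The Python sketch below mirrors that check so the failure mode is easy to see. The Partition dataclass and validate_partitions function are hypothetical stand-ins; Spark performs its own check in Scala when it first computes an RDD's partitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for Spark's Partition; only the index matters here."""
    index: int


def validate_partitions(partitions: Sequence[Partition]) -> Sequence[Partition]:
    """Fail fast if any partition's index differs from its position in the sequence."""
    for position, partition in enumerate(partitions):
        if partition.index != position:
            raise ValueError(
                f"partitions({position}).index == {partition.index}, "
                f"but it should equal {position}"
            )
    return partitions
```

A custom RDD that skipped an index, for example returning partitions with indices 0 and 2, would be rejected at the second element instead of producing confusing errors later during scheduling.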