Spark
HDP 2.5.3 provides Spark 1.6.2 and the following Apache patches:
SPARK-6005: Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery.
SPARK-11301: Fix case sensitivity for filter on partitioned columns.
SPARK-15606: Driver hang in o.a.s.DistributedSuite on 2 core machine.
SPARK-16077: Python UDF may fail because of six.
SPARK-16664: Fix persist call on DataFrames with more than 200 columns.
SPARK-17512: Avoid formatting to python path for yarn and mesos cluster mode.
SPARK-11182: HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode.
HDP 2.5.0 provided Spark 1.6.2 and the following Apache patches:
SPARK-16214: Fix the denominator of SparkPi.
SPARK-6005: Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery.
SPARK-6717: Clear shuffle files after checkpointing in ALS.
SPARK-6735: Add window based executor failure tracking mechanism for long running service.
SPARK-6847: Stack overflow on updateStateByKey which followed by a stream with checkpoint set.
SPARK-7481: Add spark-cloud module to pull in AWS+azure object store FS accessors; test integration.
SPARK-7889: Jobs progress of apps on complete page of HistoryServer shows incomplete.
SPARK-10582: Using dynamic-executor-allocation, if AM failed, the new AM will be started. But the new AM does not allocate executors to driver.
SPARK-11137: Make StreamingContext.stop() exception-safe.
SPARK-11182: HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode.
SPARK-11314: Add service API and test service for Yarn Cluster scheduler.
SPARK-11315: Add YARN extension service to publish Spark events to YARN timeline service (part of SPARK-1537).
SPARK-11323: Add History Service Provider to service application histories from YARN timeline server (part of SPARK-1537).
SPARK-11627: Spark Streaming backpressure mechanism has no initial rate limit, receivers receive data at the maximum speed, it might cause OOM exception.
SPARK-12001: StreamingContext cannot be completely stopped if the stop() is interrupted.
SPARK-12009: Avoid re-allocate yarn container while driver want to stop all executor.
SPARK-12142: Can't request executor when container allocator is not ready.
SPARK-12241: Improve failure reporting in Yarn client obtainTokenForHBase.
SPARK-12353: Wrong output for countByValue and countByValueAndWindow.
SPARK-12513: SocketReceiver hang in Netcat example.
SPARK-12523: Support long-running of the Spark on HBase and hive metastore.
SPARK-12920: Fix high CPU usage in Spark thrift server with concurrent users.
SPARK-12948: OrcRelation uses HadoopRDD which can broadcast conf objects frequently.
SPARK-12967: NettyRPC races with SparkContext.stop() and throws exception.
SPARK-12998: Enable OrcRelation even when connecting via spark thrift server.
SPARK-13021: Fail fast when custom RDDs violate RDD.partition's API contract.
SPARK-13117: WebUI should use the local ip not 0.0.0.0.
SPARK-13278: Launcher fails to start with JDK 9 EA.
SPARK-13308: ManagedBuffers passed to OneToOneStreamManager need to be freed in non error case.
SPARK-13360: pyspark related environment variable is not propagated to driver in yarn-cluster mode.
SPARK-13468: Fix a corner case where the page UI should show DAG but it doesn't show.
SPARK-13478: Use real user when fetching delegation token.
SPARK-13885: Fix attempt id regression for Spark running on Yarn.
SPARK-13902: Make DAGScheduler not to create duplicate stage.
SPARK-14062: Fix log4j and upload metrics.properties automatically with distributed cache.
SPARK-14091: Consider improving performance of SparkContext.getCallSite().
SPARK-15067: YARN executors are launched with fixed perm gen size.
SPARK-1537: Add integration with Yarn's Application Timeline Server.
SPARK-15606: Driver hang in o.a.s.DistributedSuite on 2 core machine.
SPARK-15844: HistoryServer doesn't come up if spark.authenticate = true.
SPARK-15990: Add rolling log aggregation support for Spark on yarn.
SPARK-16077: Python UDF may fail because of six.
SPARK-16110: Can't set Python via spark-submit for YARN cluster mode when PYSPARK_PYTHON & PYSPARK_DRIVER_PYTHON are set (JIRA in Apache Spark is IN PROGRESS).
SPARK-16193: Address flaky ExternalAppendOnlyMapSuite spilling test.