CDS Powered by Apache Spark Fixed Issues
The following sections describe the issues fixed in each CDS 2 Powered by Apache Spark release.
Issues Fixed in CDS 2.1 Release 4
The following list includes issues fixed in CDS 2.1 Release 4. Test-only changes are omitted.
- [SPARK-23243][SPARK-20715][CORE][2.2] Fix RDD.repartition() data correctness issue
- [SPARK-17769][CORE][SCHEDULER] Some FetchFailure refactoring
- [SPARK-26201] Fix python broadcast with encryption
- [SPARK-24918][CORE] Executor Plugin API
- [PYSPARK][SQL] Updates to RowQueue
- [PYSPARK] Updates to pyspark broadcast
- [SPARK-25253][PYSPARK] Refactor local connection & auth code
- CDH-74338. Upgrade jackson-databind to Cloudera version
- [spark] CDH-57150. Exit spark-shell/spark-submit/pyspark with correct error message if no client configuration found
Issues Fixed in CDS 2.1 Release 3
The following list includes issues fixed in CDS 2.1 Release 3. Test-only changes are omitted.
- [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.1] Shuffle+Repartition on a DataFrame could lead to incorrect answers
- [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
- [PYSPARK] Updates to Accumulators
- [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error
- [SPARK-20223][SQL] Fix typo in tpcds q77.sql
- [SPARK-24589][CORE] Correctly identify tasks in output commit coordinator [branch-2.1].
- [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
- [SPARK-23732][DOCS] Fix source links in generated scaladoc.
- [WEBUI] Avoid possibility of script in query param keys
- Fix compilation caused by SPARK-24257
- [SPARK-24257][SQL] LongToUnsafeRowMap calculate the new size may be wrong
- [R][BACKPORT-2.2] backport lint fix
- [SPARKR] Match pyspark features in SparkR communication protocol.
- [PYSPARK] Update py4j to version 0.10.7.
- [SPARK-21278][PYSPARK] Upgrade to Py4J 0.10.6
- [SPARK-23697][CORE] LegacyAccumulatorWrapper should define isZero correctly
- [SPARK-23053][CORE][BRANCH-2.1] taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status
- [SPARK-22862] Docs on lazy elimination of columns missing from an encoder
- [SPARK-22688][SQL] Upgrade Janino version to 3.0.8
- [SPARK-22373][BUILD][FOLLOWUP][BRANCH-2.1] Updates other dependency lists too for Janino
- [SPARK-22373] Bump Janino dependency version to fix thread safety issue…
- [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source
- [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exist in release-build.sh
- [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for version warning
- [SPARK-22429][STREAMING] Streaming checkpointing code does not retry after failure
- [MINOR][DOC] automatic type inference supports also Date and Timestamp
- [SPARK-21991][LAUNCHER][FOLLOWUP] Fix java lint
- [SPARK-21991][LAUNCHER] Fix race condition in LauncherServer#acceptConnections
- [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.
- [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns
- [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration throws NPE
- [SPARK-22167][R][BUILD] sparkr packaging issue allow zinc
- [SPARK-22129][SPARK-22138] Release script improvements
- [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows
- [SPARK-22072][SPARK-22071][BUILD] Improve release build scripts
- [SPARK-19318][SPARK-22041][SPARK-16625][BACKPORT-2.1][SQL] Docker test case failure: `SPARK-16625: General data types to be mapped to Oracle`
- [SPARK-22052] Incorrect Metric assigned in MetricsReporter.scala
- [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles
- [SPARK-21953] Show both memory and disk bytes spilled if either is present
- [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs
- [SPARK-21976][DOC] Fix wrong documentation for Mean Absolute Error.
- [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop SparkContext.
- [SPARK-21826][SQL][2.1][2.0] outer broadcast hash join should not throw NPE
- [MINOR] Correct validateAndTransformSchema in GaussianMixture and AFTSurvivalRegression
- [SPARK-18752][SQL] Follow-up: add scaladoc explaining isSrcLocal arg.
- [SPARK-18752][HIVE] "isSrcLocal" value should be set from user query.
- [CDH-70445] Executor Plugin Api.
- [SPARK-23852][SQL] Add test that fails if PARQUET-1217 is not fixed.
- [CDH-68516] Check for null when writing decimal.
- [CDH-69165] Handle file names with spaces in classpath.
- [SPARK-23991][DSTREAMS] Fix data loss when WAL write fails in allocateBlocksToBatch
- [SPARK-24309][CORE] AsyncEventQueue should stop on interrupt.
- [SPARK-22850][CORE] Ensure queued events are delivered to all event queues.
- [CDH-68051] Try to fetch tokens for all KMS servers.
- [SPARK-23433][CORE] Late zombie task completions update all tasksets
- [SPARK-23660] Fix exception in yarn cluster mode when application ended fast
- [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes
- [SPARK-18971][CORE] Upgrade Netty to 4.0.43.Final
- [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator
Issues Fixed in CDS 2.1 Release 2
The following list includes issues fixed in CDS 2.1 Release 2. Test-only changes are omitted.
- [SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM.
- [SPARK-22083][CORE] Release locks in MemoryStore.evictBlocksToFreeSpace
- [SPARK-18838][HOTFIX][YARN] Check internal context state before stopping it.
- [SPARK-18838][CORE] Add separate listener queues to LiveListenerBus.
- [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo
- [SPARK-21254][WEBUI] History UI performance fixes
- [SPARK-21135][WEB UI] On history server page, duration of incomplete applications should be hidden instead of showing up as 0
- [SPARK-20942][WEB-UI] The title style about field is error in the history server web ui.
- [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load
- [SPARK-18682][SS] Batch Source for Kafka
- [SPARK-19155][ML] MLlib GeneralizedLinearRegression family and link should be case insensitive
- [SPARK-19542][HOTFIX][SS] Fix the missing import in DataStreamReaderWriterSuite
- [SPARK-20280][CORE] FileStatusCache Weigher integer overflow
- [SPARK-19646][BUILD][HOTFIX] Fix compile error from cherry-pick of SPARK-19646 into branch 2.1
- [SPARK-21834] Incorrect executor request in case of dynamic allocation
- [SPARK-21721][SQL][BACKPORT-2.1] Clear FileSystem deleteOnExit cache when paths are successfully removed
- [SPARK-21588][SQL] SQLContext.getConf(key, null) should return null
- [SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC table with extreme values on the partition column
- [SPARK-12717][PYTHON][BRANCH-2.1] Adding thread-safe broadcast pickle registry
- [SPARK-21555][SQL] RuntimeReplaceable should be compared semantically by its canonicalized child
- [SPARK-21306][ML] For branch 2.1, OneVsRest should support setWeightCol
- [SPARK-21446][SQL] Fix setAutoCommit never executed
- [SPARK-21441][SQL] Incorrect Codegen in SortMergeJoinExec results in failures in some cases
- [SPARK-21332][SQL] Incorrect result type inferred for some decimal expressions
- [SPARK-21344][SQL] BinaryType comparison does signed byte array comparison
- [SPARK-21083][SQL][BRANCH-2.1] Store zero size and row count when analyzing empty table
- [SPARK-21345][SQL][TEST][TEST-MAVEN][BRANCH-2.1] SparkSessionBuilderSuite should clean up stopped sessions.
- [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream
- [SPARK-20256][SQL][BRANCH-2.1] SessionState should be created more lazily
- [SPARK-19104][BACKPORT-2.1][SQL] Lambda variables in ExternalMapToCatalyst should be global
- [SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct
- [SPARK-20555][SQL] Fix mapping of Oracle DECIMAL types to Spark types in read path
- [SPARK-21181] Release byteBuffers to suppress netty error messages
- [SPARK-21167][SS] Decode the path generated by File sink to handle special characters
- [SPARK-21138][YARN] Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different
- [SPARK-19688][STREAMING] Not to read `spark.yarn.credentials.file` from checkpoint.
- [SPARK-21114][TEST][2.1] Fix test failure in Spark 2.1/2.0 due to name mismatch
- [SPARK-21072][SQL] TreeNode.mapChildren should only apply to the children node.
- [SPARK-16251][SPARK-20200][CORE][TEST] Flaky test: org.apache.spark.rdd.LocalCheckpointSuite.missing checkpoint block fails with informative message
- [SPARK-20211][SQL][BACKPORT-2.2] Fix the Precision and Scale of Decimal Values when the Input is BigDecimal between -1.0 and 1.0
- [SPARK-21064][CORE][TEST] Fix the default value bug in NettyBlockTransferServiceSuite
- [SPARK-20920][SQL] ForkJoinPool pools are leaked when writing hive tables with many partitions
- [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException
- [SPARK-20275][UI] Do not display "Completed" column for in-progress applications
- [SPARK-20868][CORE] UnsafeShuffleWriter should verify the position after FileChannel.transferTo
- [SPARK-20250][CORE] Improper OOM error when a task has been killed while spilling data
- [SPARK-20848][SQL] Shutdown the pool after reading parquet files
- [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel
- [SPARK-18406][CORE][BACKPORT-2.1] Race between end-of-task and completion iterator read lock release
- [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when converting from Breeze sparse matrix
- [SPARK-20798] GenerateUnsafeProjection should check if a value is null before calling the getter
- [SPARK-17424] Fix unsound substitution bug in ScalaReflection.
- [SPARK-20665][SQL] "Bround" and "Round" function return NULL
- [SPARK-20685] Fix BatchPythonEvaluation bug in case of single UDF w/ repeated arg.
- [SPARK-20688][SQL] correctly check analysis for scalar sub-queries
- [SPARK-19933][SQL] Do not change output of a subquery
- [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params
- [SPARK-20686][SQL] PropagateEmptyRelation incorrectly handles aggregate without grouping
- [SPARK-17685][SQL] Make SortMergeJoinExec's currentVars is null when calling createJoinKey
- [SPARK-20615][ML][TEST] SparseVector.argmax throws IndexOutOfBoundsException
- [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
- [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
- [SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkContext when stopping it
- [SPARK-20540][CORE] Fix unstable executor requests.
- [SPARK-20517][UI] Fix broken history UI download link
- [SPARK-20404][CORE] Using Option(name) instead of Some(name)
- [SPARK-20451] Filter out nested mapType datatypes from sort order in randomSplit
- [SPARK-20450][SQL] Unexpected first-query schema inference cost with 2.1.1
- [SPARK-20439][SQL][BACKPORT-2.1] Fix Catalog API listTables and getTable when failed to fetch table metadata
- [SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky test
- [SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams thread race
- [SPARK-20409][SQL] fail early if aggregate function in GROUP BY
- [SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE
- [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.
- [SPARK-20335][SQL][BACKPORT-2.1] Children expressions of Hive UDF impacts the determinism of Hive UDF
- [SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBackend.stop
- [SPARK-20304][SQL] AssertNotNull should not include path in string representation
- [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)
- [SPARK-17564][TESTS] Fix flaky RequestTimeoutIntegrationSuite.furtherRequestsDelay
- [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double
- [SPARK-20264][SQL] asm should be non-test dependency in sql/core
- [SPARK-20260][MLLIB] String interpolation required for error message
- [SPARK-20262][SQL] AssertNotNull should throw NullPointerException
- [SPARK-20246][SQL] should not push predicate down through aggregate with non-deterministic expressions
- [SPARK-20214][ML] Make sure converted csc matrix has sorted indices
- [SPARK-20191][YARN] Create wrapper for RackResolver so tests can override it.
- [SPARK-20190] '/applications/[app-id]/jobs' in rest api, status should be [running|s…
- [SPARK-20164][SQL] AnalysisException not tolerant of null query plan.
- [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider
- [SPARK-20134][SQL] SQLMetrics.postDriverMetricUpdates to simplify driver side metric updates
- [SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails for uppercase impurity type Gini
- [SPARK-20125][SQL] Dataset of type option of map does not work
- [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
- [SPARK-20086][SQL] CollapseWindow should not collapse dependent adjacent windows
- [SPARK-19959][SQL] Fix to throw NullPointerException in df[java.lang.Long].collect
- [SPARK-20017][SQL] change the nullability of function 'StringToMap' from 'false' to 'true'
- [SPARK-19912][SQL] String literals should be escaped for Hive metastore partition pruning
- [SPARK-17204][CORE] Fix replicated off heap storage
- [SPARK-19980][SQL][BACKPORT-2.1] Add NULL checks in Bean serializer
- [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USER instead of PRINCIPAL in kerberized clusters
- [SPARK-19994][SQL] Wrong outputOrdering for right/full outer smj
- [SPARK-19946][TESTS][BACKPORT-2.1] DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging
- [SPARK-19872] [PYTHON] Use the correct deserializer for RDD construction for coalesce/repartition
- [SPARK-19887][SQL] dynamic partition keys can be null or empty string
- [SPARK-19944][SQL] Move SQLConf from sql/core to sql/catalyst (branch-2.1)
- [SPARK-19924][SQL][BACKPORT-2.1] Handle InvocationTargetException for all Hive Shim
- [SPARK-19893][SQL] should not run DataFrame set operations with map type
- [SPARK-19891][SS] Await Batch Lock notified on stream execution exit
- [SPARK-19861][SS] watermark should not be a negative time.
- [SPARK-19813] maxFilesPerTrigger combo latestFirst may miss old files in combination with maxFileAge in FileStreamSource
- [SPARK-18055][SQL] Use correct mirror in ExpressionEncoder
- [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe
- [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one.
- [SPARK-19859][SS] The new watermark should override the old one
- [SPARK-19561][SQL] add int case handling for TimestampType
- [SPARK-19774] StreamExecution should call stop() on sources when a stream fails
- [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job
- [SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache
- [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more than one listener exists
- [SPARK-19707][CORE] Improve the invalid path check for sc.addJar
- [SPARK-19674][SQL] Ignore driver accumulator updates don't belong to …
- [SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column
- [SPARK-19646][CORE][STREAMING] binaryRecords replicates records in scala API
- [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap
- [SPARK-19622][WEBUI] Fix a http error in a paged table when using a `Go` button to search.
- [SPARK-19603][SS] Fix StreamingQuery explain command
- [SPARK-19329][SQL][BRANCH-2.1] Reading from or writing to a datasource table with a non pre-existing location should succeed
- [SPARK-19501][YARN] Reduce the number of HDFS RPCs during YARN deployment
- [SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorClassLoader to load Netty generated classes
- [SPARK-19542][SS] Delete the temp checkpoint if a query is stopped without errors
- [SPARK-19543] from_json fails when the input row is empty
- [SPARK-19509][SQL] Grouping Sets do not respect nullable grouping columns
- [SPARK-18609][SPARK-18841][SQL][BACKPORT-2.1] Fix redundant Alias removal in the optimizer
- [SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme
- [SPARK-19472][SQL] Parser should not mistake CASE WHEN(...) for a function call
- [SPARK-19432][CORE] Fix an unexpected failure when connecting timeout
- [SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED
- [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger
- [SPARK-19406][SQL] Fix function to_json to respect user-provided options
- [SPARK-19338][SQL] Add UDF names in explain
- [SPARK-18863][SQL] Output non-aggregate expressions without GROUP BY in a subquery does not yield an error
- [SPARK-19330][DSTREAMS] Also show tooltip for successful batches
- [SPARK-19017][SQL] NOT IN subquery with more than one column may return incorrect results
- [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm failing in edge case
- [SPARK-9435][SQL] Reuse function in Java UDF to correctly support expressions that require equality comparison between ScalaUDF
- [SPARK-19306][CORE] Fix inconsistent state in DiskBlockObject when exception occurred
- [SPARK-19155][ML] Make family case insensitive in GLM
- [SPARK-14536][SQL][BACKPORT-2.1] fix to handle null value in array type column for postgres.
- [SPARK-19267][SS] Fix a race condition when stopping StateStore
- [SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon error
- [SPARK-19065][SQL] Don't inherit expression id in dropDuplicates
- [SPARK-18905][STREAMING] Fix the issue of removing a failed jobset from JobScheduler.jobSets
- [SPARK-17237][SQL] Remove backticks in a pivot result schema
- [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries
- [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly
- [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQL] Backport Three Cache-related PRs to Spark 2.1
- [SPARK-18877][SQL][BACKPORT-2.1] `CSVInferSchema.inferField` on DecimalType should find a common type with `typeSoFar`
- [SPARK-18837][WEBUI] Very long stage descriptions do not wrap in the UI
- [SPARK-18972][CORE] Fix the netty thread names for RPC
- [SPARK-18985][SS] Add missing @InterfaceStability.Evolving for Structured Streaming APIs
- [SPARK-18973][SQL] Remove SortPartitions and RedistributeData
- [SPARK-18947][SQL] SQLContext.tableNames should not call Catalog.listTables
- [SPARK-18927][SS] MemorySink for StructuredStreaming can't recover from checkpoint if location is provided in SessionConf
- [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error checking when append data to an existing table
- [SPARK-18921][SQL] check database existence with Hive.databaseExists instead of getDatabase
- [SPARK-18108][SQL] Fix a schema inconsistent bug that makes a parquet reader fail to read data
- [SPARK-18892][SQL] Alias percentile_approx approx_percentile
- [SPARK-18555][MINOR][SQL] Fix the @since tag when backporting from 2.2 branch into 2.1 branch
- [SPARK-18555][SQL] DataFrameNaFunctions.fill messes up original values in long integers
- [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information
- [SPARK-19506][ML][PYTHON] Import warnings in pyspark.ml.util
- [SPARK-13669][SPARK-20898][CORE] Improve the blacklist mechanism to handle external shuffle service unavailable situation
- [SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready
- [SPARK-13747][CORE] Fix potential ThreadLocal leaks in RPC when using ForkJoinPool
- [SPARK-21522][CORE] Fix flakiness in LauncherServerSuite.
- [SPARK-20904][CORE] Don't report task failures to driver during shutdown.
- [SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities
- [SPARK-19146][CORE] Drop more elements when stageData.taskData.size > retainedTasks
- [SPARK-20084][CORE] Remove internal.metrics.updatedBlockStatuses from history files.
- [SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use ConcurrentHashMap to make it faster
- [SPARK-19185][DSTREAM] Make Kafka consumer cache configurable
- [SPARK-20756][YARN] yarn-shuffle jar references unshaded guava
- [SPARK-20922][CORE][HOTFIX] Don't use Java 8 lambdas in older branches.
- [SPARK-20922][CORE] Add whitelist of classes that can be deserialized by the launcher.
- Add health aggregation to CSD.
- Update Spark2 CSD to not provide "cdh-plugin" since no one in CDH (yarn, in particular) uses Spark2 bits.
- Don't localize topology.py location.
- Relax compatibility to all C5 versions.
Issues Fixed in CDS 2.1 Release 1
The following list includes issues fixed in CDS 2.1 Release 1. Test-only changes are omitted.
- Preview of: [SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM.
- [SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted
- [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting
- [SPARK-8425][CORE] Application Level Blacklisting
- [SPARK-18117][CORE] Add test for TaskSetBlacklist
- [SPARK-18949][SQL][BACKPORT-2.1] Add recoverPartitions API to Catalog
- [SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC
- [SPARK-19611][SQL] Introduce configurable table schema inference
- [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parquet
- [SPARK-19766][SQL] Constant alias columns in INNER JOIN should not be folded by FoldablePropagation rule
- [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS
- [SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-client
- [SPARK-19617][SS] Fix the race condition when starting and stopping a query quickly (branch-2.1)
- [SPARK-19599][SS] Clean up HDFSMetadataLog
- [SPARK-19529] TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()
- [SPARK-18717][SQL] Make code generation for Scala Map work with immutable.Map also
- [SPARK-19512][BACKPORT-2.1][SQL] codegen for compare structs fails #16852
- [SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt
- [SPARK-18750][YARN] Follow up: move test to correct directory in 2.1 branch.
- [SPARK-18750][YARN] Avoid using "mapValues" when allocating containers.
- [SPARK-19268][SS] Disallow adaptive query execution for streaming queries
- [SPARK-18850][SS] Make StreamExecution and progress classes serializable
- [SPARK-18589][SQL] Fix Python UDF accessing attributes from both side of join
- [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan
- [SPARK-19129][SQL] SessionCatalog: Disallow empty part col values in partition spec
- [SPARK-19048][SQL] Delete Partition Location when Dropping Managed Partitioned Tables in InMemoryCatalog
- [SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0
- [SPARK-19092][SQL][BACKPORT-2.1] Save() API of DataFrameWriter should not scan all the saved files #16481
- [SPARK-19120] Refresh Metadata Cache After Loading Hive Tables
- [SPARK-19180] [SQL] the offset of short should be 2 in OffHeapColumn
- [SPARK-19178][SQL] convert string of large numbers to int should return null
- [SPARK-18687][PYSPARK][SQL] Backward compatibility - creating a Dataframe on a new SQLContext object fails with a Derby error
- [SPARK-19055][SQL][PYSPARK] Fix SparkSession initialization when SparkContext is stopped
- [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB
- [SPARK-18952][BACKPORT] Regex strings not properly escaped in codegen for aggregations
- [SPARK-17807][CORE] split test-tags into test-JAR
- [SPARK-18908][SS] Creating StreamingQueryException should check if logicalPlan is created
- [SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation buffer
- [SPARK-18234][SS] Made update mode public
- [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test
- [SPARK-18894][SS] Fix event time watermark delay threshold specified in months or years
- [SPARK-18281] [SQL] [PYSPARK] Remove timeout for reading data through socket for local iterator
- [SPARK-18761][CORE] Introduce "task reaper" to oversee task killing in executors
- [SPARK-18928] Check TaskContext.isInterrupted() in FileScanRDD, JDBCRDD & UnsafeSorter
- [SPARK-18700][SQL] Add StripedLock for each table's relation in cache
- [SPARK-18703][SPARK-18675][SQL][BACKPORT-2.1] CTAS for hive serde table should work for all hive versions AND Drop Staging Directories and Data Files
- [SPARK-18827][CORE] Fix cannot read broadcast on disk
- [SPARK-19520][STREAMING] Do not encrypt data written to the WAL.
- [SPARK-19857][YARN] Correctly calculate next credential update time.
- [SPARK-19626][YARN] Using the correct config to set credentials update time
- Preview of: [SPARK-4105] retry the fetch or stage if shuffle block is corrupt
- [SPARK-19307][PYSPARK] Make sure user conf is propagated to SparkContext.
Issues Fixed in CDS 2.0 Release 2
- [SPARK-4563][CORE] Allow driver to advertise a different network address.
- [SPARK-18993] Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
- [SPARK-19314] Do not allow sort before aggregation in Structured Streaming plan
- [SPARK-18762] Web UI should be http:4040 instead of https:4040
- [SPARK-18745] java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)
- [SPARK-18703] Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM
- [SPARK-18091] Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
Issues Fixed in CDS 2.0 Release 1
- [SPARK-4563][CORE] Allow driver to advertise a different network address.
- [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite
- [SPARK-18677] Fix parsing ['key'] in JSON path expressions.
- [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
- [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
- [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWrapper
- [SPARK-18674][SQL] improve the error message of using join
- [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spark Streaming
- [SPARK-17843][WEB UI] Indicate event logs pending for processing on h…
- [SPARK-17783][SQL][BACKPORT-2.0] Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC
- [SPARK-18640] Add synchronization to TaskScheduler.runningTasksByExecutors
- [SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss
- [SPARK-18597][SQL] Do not push-down join conditions to the left side of a Left Anti join [BRANCH-2.0]
- [SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
- [SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`
- [SPARK-18436][SQL] isin causing SQL syntax error with JDBC
- [SPARK-18519][SQL][BRANCH-2.0] map type can not be used in EqualTo
- [SPARK-18053][SQL] compare unsafe and safe complex-type values correctly
- [SPARK-18504][SQL] Scalar subquery with extra group by columns returning incorrect result
- [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog
- [SPARK-18546][CORE] Fix merging shuffle spills when using encryption.
- [SPARK-18547][CORE] Propagate I/O encryption key when executors register.
- [SPARK-16625][SQL] General data types to be mapped to Oracle
- [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
- [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0)
- [SPARK-18430][SQL][BACKPORT-2.0] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
- [SPARK-18400][STREAMING] NPE when resharding Kinesis Stream
- [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0]
- [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints
- [SPARK-16808][CORE] History Server main page does not honor APPLICATION_WEB_PROXY_BASE
- [SPARK-17348][SQL] Incorrect results from subquery transformation
- [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store
- [SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
- [SPARK-18010][CORE] Reduce work performed for building up the application list for the History Server app list UI page
- [SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in call site info
- [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
- [SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
- [SPARK-18387][SQL] Add serialization to checkEvaluation.
- [SPARK-18368][SQL] Fix regexp replace when serialized
- [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
- [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`
- [SPARK-17703][SQL][BACKPORT-2.0] Add unnamed version of addReferenceObj for minor objects.
- [SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException when a UDAF has a foldable TypeCheck
- [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset is latest
- [SPARK-18125][SQL][BRANCH-2.0] Fix a compilation error in codegen due to splitExpression
- [SPARK-17849][SQL] Fix NPE problem when using grouping sets
- [SPARK-17693][SQL][BACKPORT-2.0] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field
- [SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec
- [SPARK-18189][SQL][FOLLOWUP] Move test from ReplSuite to prevent java.lang.ClassCircularityError
- [SPARK-17337][SPARK-16804][SQL][BRANCH-2.0] Backport subquery related PRs
- [SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity in OpenHashSet
- [SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet
- [SPARK-18111][SQL] Wrong approximate quantile answer when multiple records have the minimum value (for branch 2.0)
- [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode
- [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ…
- [SPARK-18133][BRANCH-2.0][EXAMPLES][ML] Python ML Pipeline Exampl…
- [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent
- [SPARK-18114][HOTFIX] Fix line-too-long style error from backport of SPARK-18114
- [SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy
- [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset
- [SPARK-18114][MESOS] Fix mesos cluster scheduler generate command option error
- [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files
- [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server (branch 2.0)
- [SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet for Kafka 0.10 integration doc
- [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception
- [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"
- [SPARK-17813][SQL][KAFKA] Maximum data per trigger
- [SPARK-18132] Fix checkstyle
- [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand
- [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes
- [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0)
- [SPARK-18063][SQL] Failed to infer constraints over multiple aliases
- [SPARK-16304] LinkageError should not crash Spark executor
- [SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query
- [SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL
- [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled
- [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types
- [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance
- [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch
- [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing
- [SPARK-18093][SQL] Fix default value test in SQLConfSuite to work rega…
- [SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
- [SPARK-18058][SQL][BRANCH-2.0] Comparing column types ignoring Nullability in Union and SetOperation
- [SPARK-17123][SQL][BRANCH-2.0] Use type-widened encoder for DataFrame for set operations
- [SPARK-17698][SQL] Join predicates should not contain filter clauses
- [SPARK-17986][ML] SQLTransformer should remove temporary tables
- [SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of the same log message typo
- [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad
- [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration
- [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream
- [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset
- [SPARK-17926][SQL][STREAMING] Added json for statuses
- [SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns
- [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness
- [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD
- [SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUniqueId index value overflowing
- [SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException
- [SPARK-17675][CORE] Expand Blacklist for TaskSets
- [SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task.
- [SPARK-17304] Fix perf. issue caused by TaskSetManager.abortIfCompletelyBlacklisted
- [SPARK-15865][CORE] Blacklist should not result in job hanging with less than 4 executors
- [SPARK-15783][CORE] Fix Flakiness in BlacklistIntegrationSuite
- [SPARK-15783][CORE] still some flakiness in these blacklist tests so ignore for now
- [SPARK-15714][CORE] Fix flaky o.a.s.scheduler.BlacklistIntegrationSuite
- [SPARK-10372] [CORE] basic test framework for entire spark scheduler
- [SPARK-16106][CORE] TaskSchedulerImpl should properly track executors added to existing hosts
- [SPARK-18001][DOCUMENT] Fix broken link to SparkDataFrame
- [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error
- [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0
- [SPARK-17841][STREAMING][KAFKA] drain commitQueue
- [SPARK-17711] Compress rolled executor log
- [SPARK-17751][SQL][BACKPORT-2.0] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException
- [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0
- [SPARK-17892][SQL][2.0] Do Not Optimize Query in CTAS More Than Once #15048
- [SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server
- [SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc
- [SPARK-17863][SQL] should not add column into Distinct
- [SPARK-17387][PYSPARK] Creating SparkContext() from python without spark-submit ignores user conf
- [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
- [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once
- [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics
- [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice
- [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB
- [SPARK-17884][SQL] Resolve NullPointerException when casting from empty string to interval type
- [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13
- [SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.
- [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator
- [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite
- [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite
- [SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing
- [SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick
- [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin
- [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules
- [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)
- [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished
- [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths
- [SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax
- [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types
- [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic
- [SPARK-17803][TESTS] Upgrade docker-client dependency
- [SPARK-17780][SQL] Report Throwable to user in StreamExecution
- [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming
- [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0)
- [SPARK-17758][SQL] Last returns wrong result in case of empty partition
- [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite
- [SPARK-17773][BRANCH-2.0][Input/Output] Add VoidObjectInspector
- [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.
- [SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer
- [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver
- [SPARK-17753][SQL] Allow a complex expression as the input to a value-based case statement
- [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract
- [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
- [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector
- [SPARK-17672] Spark 2.0 history server web Ui takes too long for a single application
- [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
- [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
- [SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.
- [SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan (backport)
- [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure
- [SPARK-17666] Ensure that RecordReaders are closed by data source file scans (backport)
- [SPARK-17056][CORE] Fix a wrong assert regarding unroll memory in MemoryStore
- [SPARK-17618] Guard against invalid comparisons between UnsafeRow and other formats
- [SPARK-17652] Fix confusing exception message while reserving capacity
- [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus
- [SPARK-17650] Malformed URLs throw exceptions before bricking Executors
- [SPARK-10835][ML] Word2Vec should accept non-null string array, in addition to existing null string array
- [SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue size configurable (branch 2.0)
- [SPARK-4563][CORE] Allow driver to advertise a different network address.
- [SPARK-17577][CORE][2.0 BACKPORT] Update SparkContext.addFile to make it work well on Windows
- [SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio
- [SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStreamSource.FileEntry
- [SPARK-16240][ML] ML persistence backward compatibility for LDA - 2.0 backport
- [SPARK-17502][SPARK-17609][SQL][BACKPORT][2.0] Fix Multiple Bugs in DDL Statements on Temporary Views
- [SPARK-17599][SPARK-17569] Backport and to Spark 2.0 branch
- [SPARK-17616][SQL] Support a single distinct aggregate combined with a non-partial aggregate
- [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process is dead
- [SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames
- [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.
- [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
- [SPARK-17627] Mark Streaming Providers Experimental
- [SPARK-17512][CORE] Avoid formatting to python path for yarn and mesos cluster mode
- [SPARK-17418] Prevent kinesis-asl-assembly artifacts from being published
- [SPARK-17617][SQL] Remainder(%) expression.eval returns incorrect result on double value
- [SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource (branch-2.0)
- [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable
- [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
- [SPARK-17160] Properly escape field names in code-generated error messages
- [SPARK-17100] [SQL] fix Python udf in filter on top of outer join
- [SPARK-16439] [SQL] bring back the separator in SQL UI
- [SPARK-17611][yarn][test] Make shuffle service test really test auth.
- [SPARK-17433] YarnShuffleService doesn't handle moving credentials levelDb
- [SPARK-17438][WEBUI] Show Application.executorLimit in the application page
- [SPARK-17473][SQL] fixing docker integration tests error due to different versions of jars.
- [SPARK-17589][TEST][2.0] Fix test case `create external table` in MetastoreDataSourcesSuite
- [SPARK-17297][DOCS] Clarify window/slide duration as absolute time, not relative to a calendar
- [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
- [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly
- [SPARK-17586][BUILD] Do not call static member via instance reference
- [SPARK-17546][DEPLOY] start-* scripts should use hostname -f
- [SPARK-17541][SQL] fix some DDL bugs about table management when same-name temp view exists
- [SPARK-17480][SQL][FOLLOWUP] Fix more instances that call List.length/size, which is O(n)
- [SPARK-17491] Close serialization stream to fix wrong answer bug in putIteratorAsBytes()
- [SPARK-17575][DOCS] Remove extra table tags in configuration document
- [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector
- [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems
- [SPARK-17567][DOCS] Use valid url to Spark RDD paper
- [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
- [SPARK-17484] Prevent invalid block locations from being reported after put() exceptions
- [SPARK-17364][SQL] Antlr lexer wrongly treats fully qualified identifier as a decimal number token when parsing SQL string
- [SPARK-17483] Refactoring in BlockManager status reporting and block removal
- [SPARK-17114][SQL] Fix aggregates grouped by literals with empty input
- [SPARK-17547] Ensure temp shuffle data file is cleaned up after error
- [SPARK-17521] Error when I use sparkContext.makeRDD(Seq())
- [SPARK-17465][SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
- [SPARK-17463][CORE] Make the values of CollectionAccumulator and SetAccumulator readable in a thread-safe way
- [SPARK-17511] Yarn Dynamic Allocation: Avoid marking released container as Failed
- [SPARK-17514] df.take(1) and df.limit(1).collect() should perform the same in Python
- [SPARK-17445][DOCS] Reference an ASF page as the main place to find third-party packages
- [SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rolling upgrade
- [SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch
- [SPARK-17480][SQL] Improve performance by removing or caching List.length which is O(n)
- [SPARK-17525][PYTHON] Remove SparkContext.clearFiles() from the PySpark API as it was removed from the Scala API prior to Spark 2.0.0
- [SPARK-17531] Don't initialize Hive Listeners for the Execution Client
- [SPARK-17515] CollectLimit.execute() should perform per-partition limits
- [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec
- [SPARK-17485] Prevent failed remote reads of cached blocks from failing entire job
- [SPARK-14818] Post-2.0 MiMa exclusion and build changes
- [SPARK-17503][CORE] Fix memory leak in Memory store when unable to cache the whole RDD in memory
- [SPARK-17486] Remove unused TaskMetricsUIData.updatedBlockStatuses field
- [SPARK-17336][PYSPARK] Fix appending multiple times to PYTHONPATH from spark-config.sh
- [SPARK-17439][SQL] Fixing compression issues with approximate quantiles and adding more tests
- [SPARK-17396][CORE] Share the task support between UnionRDD instances.
- [SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader
- [SPARK-17456][CORE] Utility for parsing Spark versions
- [SPARK-17339][CORE][BRANCH-2.0] Do not use path to get a filesystem in hadoopFile and newHadoopFile APIs
- [SPARK-16533][CORE] Backport driver deadlock fix to 2.0
- [SPARK-17370] Shuffle service files not invalidated when a slave is lost
- [SPARK-17296][SQL] Simplify parser join processing [BACKPORT 2.0]
- [SPARK-17372][SQL][STREAMING] Avoid serialization issues by using Arrays to save file names in FileStreamSource
- [SPARK-17279][SQL] better error message for exceptions during ScalaUDF execution
- [SPARK-17316][CORE] Fix the 'ask' type parameter in 'removeExecutor'
- [SPARK-17110] Fix StreamCorruptionException in BlockManager.getRemoteValues()
- [SPARK-17299] TRIM/LTRIM/RTRIM should not strip characters other than spaces
- [SPARK-16334] [BACKPORT] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error
- [SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap
- [SPARK-17356][SQL] Fix out of memory issue when generating JSON for TreeNode
- [SPARK-17369][SQL][2.0] MetastoreRelation toJSON throws AssertException due to missing otherCopyArgs
- [SPARK-17358][SQL] Cached table (parquet/orc) should be shared between beelines
- [SPARK-17353][SPARK-16943][SPARK-16942][BACKPORT-2.0][SQL] Fix multiple bugs in CREATE TABLE LIKE command
- [SPARK-17391][TEST][2.0] Fix Two Test Failures After Backport
- [SPARK-17335][SQL] Fix ArrayType and MapType CatalogString.
- [SPARK-16663][SQL] desc table should be consistent between data source and hive serde tables
- [SPARK-16959][SQL] Rebuild Table Comment when Retrieving Metadata from Hive Metastore
- [SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect type
- [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter
- [SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
- [SPARK-16935][SQL] Verification of Function-related ExternalCatalog APIs
- [SPARK-17352][WEBUI] Executor computing time can be negative-number because of calculation error
- [SPARK-17342][WEBUI] Style of event timeline is broken
- [SPARK-17355] Workaround for HIVE-14684 / HiveResultSetMetaData.isSigned exception
- [SPARK-16926] [SQL] Remove partition columns from partition metadata.
- [SPARK-17271][SQL] Planner adds un-necessary Sort even if child orde…
- [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again
- [SPARK-17180][SPARK-17309][SPARK-17323][SQL][2.0] create AlterViewAsCommand to handle ALTER VIEW AS
- [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite
- [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
- [SPARK-17243][WEB UI] Spark 2.0 History Server won't load with very large application history
- [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl
- [SPARK-17264][SQL] DataStreamWriter should document that it only supports Parquet for now
- [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
- [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore
- [SPARK-16216][SQL][FOLLOWUP][BRANCH-2.0] Backport enabling timestamp type tests for JSON and verify all unsupported types in CSV
- [SPARK-17216][UI] fix event timeline bars length
- [ML][MLLIB] The require condition and message don't match in SparseMatrix.
- [SPARK-15382][SQL] Fix a bug in sampling with replacement
- [SPARK-17274][SQL] Move join optimizer rules into a separate file
- [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0)
- [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
- [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions
- [SPARK-17235][SQL] Support purging of old logs in MetadataLog
- [SPARK-17246][SQL] Add BigDecimal literal
- [SPARK-17165][SQL] FileStreamSource should not track the list of seen files indefinitely
- [SPARK-17242][DOCUMENT] Update links of external dstream projects
- [SPARK-17231][CORE] Avoid building debug or trace log messages unless the respective log level is enabled
- [SPARK-17205] Literal.sql should handle Infinity and NaN
- [SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData
- [SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema
- [SPARK-17167][2.0][SQL] Issue Exceptions when Analyze Table on In-Memory Cataloged Tables
- [SPARK-16991][SPARK-17099][SPARK-17120][SQL] Fix Outer Join Elimination when Filter's isNotNull Constraints Unable to Filter Out All Null-supplying Rows
- [SPARK-17061][SPARK-17093][SQL][BACKPORT] MapObjects should make copies of unsafe-backed data
- [SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocationInfo == null
- [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
- [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON
- [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
- [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer when some quantiles are duplicated
- [SPARK-17186][SQL] remove catalog table type INDEX
- [SPARK-17194] Use single quotes when generating SQL for string literals
- [SPARK-13286] [SQL] add the next expression of SQLException as cause
- [SPARK-17182][SQL] Mark Collect as non-deterministic
- [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication
- [SPARK-17162] Range does not support SQL generation
- [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
- [SPARK-17085][STREAMING] Documentation and actual code differ: unsupported operations
- [SPARK-17115][SQL] decrease the threshold when split expressions
- [SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly
- [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't exist in dependent module
- [SPARK-17124][SQL] RelationalGroupedDataset.agg should preserve order and allow multiple aggregates per column
- [SPARK-17104][SQL] LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation
- [SPARK-17150][SQL] Support SQL generation for inline tables
- [SPARK-17158][SQL] Change error message for out of range numeric literals
- [SPARK-17149][SQL] array.sql for testing array related functions
- [SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode
- [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning
- [SPARK-11227][CORE] UnknownHostException can be thrown when NameNode HA is enabled.
- [SPARK-16994][SQL] Whitelist operators for predicate pushdown
- [SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace
- [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables
- [SPARK-17069] Expose spark.range() as table-valued function in SQL
- [SPARK-17117][SQL] 1 / NULL should not fail analysis
- [SPARK-16391][SQL] Support partial aggregation for reduceGroups
- [SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr
- [SPARK-17096][SQL][STREAMING] Improve exception string reported through the StreamingQueryListener
- [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check
- [SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grows beyond 64 KB
- [SPARK-17084][SQL] Rename ParserUtils.assert to validate
- [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
- [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
- [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister
- [SPARK-16508][SPARKR] Split docs for arrange and orderBy methods
- [SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPolySize
- [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
- [SPARK-17013][SQL] Parse negative numeric literals
- [SPARK-16975][SQL] Column-partition path starting '_' should be handled correctly
- [SPARK-17022][YARN] Handle potential deadlock in driver handling messages
- [SPARK-17018][SQL] literals.sql for testing literal parsing
- [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
- [SPARK-15899][SQL] Fix the construction of the file path with hadoop Path for Spark 2.0
- [SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite
- [SPARK-17007][SQL] Move test data files into a test-data folder
- [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite.
- [SPARK-16866][SQL] Infrastructure for file-based SQL end-to-end tests
- [SPARK-17010][MINOR][DOC] Wrong description in memory management document
- [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader
- [SPARK-16324][SQL] regexp_extract should doc that it returns empty string when match fails
- [SPARK-16522][MESOS] Spark application throws exception on exit.
- [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
- [SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable
- [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3
- [SPARK-16610][SQL] Add `orc.compress` as an alias for `compression` option.
- [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
- [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent with requestExecutors/killExecutors
- [SPARK-16586][CORE] Handle JVM errors printed to stdout.
- [SPARK-16936][SQL] Case Sensitivity Support for Refresh Temp Table
- [SPARK-16457][SQL] Fix Wrong Messages when CTAS with a Partition By Clause
- [SPARK-16939][SQL] Fix build error by using `Tuple1` explicitly in StringFunctionsSuite
- [SPARK-16409][SQL] regexp_extract with optional groups causes NPE
- [SPARK-16911] Fix the links in the programming guide
- [SPARK-16870][DOCS] Summary:add "spark.sql.broadcastTimeout" into docs/sql-programming-gu…
- [SPARK-16932][DOCS] Changed programming guide to not reference old accumulator API in Scala
- [SPARK-16925] Master should call schedule() after all executor exit events, not only failures
- [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes"
- [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests.
- [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used
- [SPARK-16863][ML] ProbabilisticClassifier.fit check thresholds' length
- [SPARK-16877][BUILD] Add rules to prevent the use of Java annotations (Deprecated and Override)
- [SPARK-16880][ML][MLLIB] make ann training data persisted if needed
- [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample
- [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap
- [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
- [SPARK-14204][SQL] register driverClass rather than user-specified class
- [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type
- [SPARK-16796][WEB UI] Visible passwords on Spark environment page
- [SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics
- [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file
- [SPARK-16850][SQL] Improve type checking error message for greatest/least
- [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
- [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
- [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors
- [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
- [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
- [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
- [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
- [SPARK-15869][STREAMING] Fix a potential NPE in StreamingJobProgressListener.getBatchUIData
- [SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling
- [SPARK-16791][SQL] cast struct with timestamp field fails
- [SPARK-16778][SQL][TRIVIAL] Fix deprecation warning with SQLContext
- [SPARK-16805][SQL] Log timezone when query result does not match
- [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package
- [SPARK-16812] Open up SparkILoop.getAddedJars
- [SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to exception
- [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException
- [SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
- [SPARK-16750][ML] Fix GaussianMixture training failed due to feature column type mistake
- [SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
- [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes
- [SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError
- [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
- [SPARK-16639][SQL] A query with a HAVING condition that contains a grouping column should work
- [SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQLSuite
- [SPARK-16730][SQL] Implement function aliases for type casts
- [SPARK-16729][SQL] Throw analysis exception for invalid date casts
- [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
- [SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues related to lead and lag functions
- [SPARK-16724] Expose DefinedByConstructorParams
- [SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS queries
- [SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextSuite when eventually fails
- [SPARK-14131][STREAMING][SQL] Improved fix for avoiding potential deadlocks in HDFSMetadataLog
- [SPARK-16715][TESTS] Fix a potential ExprId conflict for SubexpressionEliminationSuite."Semantic equals and hash"
- [SPARK-16485][DOC][ML] Fixed several inline formatting in ml features doc
- [SPARK-16703][SQL] Remove extra whitespace in SQL generation for window functions
- [SPARK-16698][SQL] Field names having dots should be allowed for datasources based on FileFormat
- [SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First and Last
- [SPARK-16699][SQL] Fix performance bug in hash aggregate on long string keys
- [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
- [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
- [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python language binding
- [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more consistent with Scala API
- [SPARK-16650] Improve documentation of spark.task.maxFailures
- [SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be a constant
- [SPARK-16287][SQL] Implement str_to_map SQL function
- [SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader
- [SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable
- [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions
- [SPARK-16440][MLLIB] Destroy broadcasted variables even on driver
- [SPARK-5682][CORE] Add encrypted shuffle in spark
- [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values
- [SPARK-16272][CORE] Allow config values to reference conf, env, system props.
- [SPARK-16632][SQL] Use Spark requested schema to guide vectorized Parquet reader initialization
- [SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor.
- [SPARK-16505][YARN] Optionally propagate error during shuffle service startup.
- [SPARK-14963][MINOR][YARN] Fix typo in YarnShuffleService recovery file name
- [SPARK-14963][YARN] Using recoveryPath if NM recovery is enabled
- [SPARK-16349][SQL] Fall back to isolated class loader when classes not found.
- [SPARK-16119][sql] Support PURGE option to drop table / partition.