CDS Powered by Apache Spark Fixed Issues

The following sections describe the issues fixed in each CDS Powered by Apache Spark release.

Issues Fixed in CDS 2.0 - Release 2

  • [SPARK-4563][CORE] Allow driver to advertise a different network address; see the configuration sketch following this list.
  • [SPARK-18993] Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
  • [SPARK-19314] Do not allow sort before aggregation in Structured Streaming plan
  • [SPARK-18762] Web UI should be http:4040 instead of https:4040
  • [SPARK-18745] java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)
  • [SPARK-18703] Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM
  • [SPARK-18091] Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
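
SPARK-4563, backported in this release, lets the driver bind to one network address while advertising a different, routable one to executors, which matters when the driver runs behind NAT or inside a container. A minimal sketch, assuming the backported spark.driver.bindAddress property together with the existing spark.driver.host; the host name is a placeholder:

      // Bind the driver locally but advertise a reachable address to the cluster.
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("advertise-address-sketch")
        .config("spark.driver.bindAddress", "0.0.0.0")         // address the driver socket binds to
        .config("spark.driver.host", "driver.example.com")     // address advertised to executors
        .getOrCreate()

The same two properties can also be passed with --conf on spark-submit instead of being set in code.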

Issues Fixed in CDS 2.0 - Release 1

  • [SPARK-4563][CORE] Allow driver to advertise a different network address.
  • [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite
  • [SPARK-18677] Fix parsing ['key'] in JSON path expressions.
  • [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
  • [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
  • [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWrapper
  • [SPARK-18674][SQL] improve the error message of using join
  • [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spark Streaming
  • [SPARK-17843][WEB UI] Indicate event logs pending for processing on h…
  • [SPARK-17783][SQL][BACKPORT-2.0] Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC
  • [SPARK-18640] Add synchronization to TaskScheduler.runningTasksByExecutors
  • [SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss
  • [SPARK-18597][SQL] Do not push-down join conditions to the left side of a Left Anti join [BRANCH-2.0]
  • [SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
  • [SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`
  • [SPARK-18436][SQL] isin causing SQL syntax error with JDBC
  • [SPARK-18519][SQL][BRANCH-2.0] map type can not be used in EqualTo
  • [SPARK-18053][SQL] compare unsafe and safe complex-type values correctly
  • [SPARK-18504][SQL] Scalar subquery with extra group by columns returning incorrect result
  • [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog
  • [SPARK-18546][CORE] Fix merging shuffle spills when using encryption.
  • [SPARK-18547][CORE] Propagate I/O encryption key when executors register.
  • [SPARK-16625][SQL] General data types to be mapped to Oracle
  • [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
  • [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0)
  • [SPARK-18430][SQL][BACKPORT-2.0] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
  • [SPARK-18400][STREAMING] NPE when resharding Kinesis Stream
  • [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0]
  • [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints
  • [SPARK-16808][CORE] History Server main page does not honor APPLICATION_WEB_PROXY_BASE
  • [SPARK-17348][SQL] Incorrect results from subquery transformation
  • [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store
  • [SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
  • [SPARK-18010][CORE] Reduce work performed for building up the application list for the History Server app list UI page
  • [SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in call site info
  • [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
  • [SPARK-17982][SQL][BACKPORT-2.0] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
  • [SPARK-18387][SQL] Add serialization to checkEvaluation.
  • [SPARK-18368][SQL] Fix regexp replace when serialized
  • [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
  • [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`
  • [SPARK-17703][SQL][BACKPORT-2.0] Add unnamed version of addReferenceObj for minor objects.
  • [SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException when a UDAF has a foldable TypeCheck
  • [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest
  • [SPARK-18125][SQL][BRANCH-2.0] Fix a compilation error in codegen due to splitExpression
  • [SPARK-17849][SQL] Fix NPE problem when using grouping sets
  • [SPARK-17693][SQL][BACKPORT-2.0] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field
  • [SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec
  • [SPARK-18189][SQL][FOLLOWUP] Move test from ReplSuite to prevent java.lang.ClassCircularityError
  • [SPARK-17337][SPARK-16804][SQL][BRANCH-2.0] Backport subquery related PRs
  • [SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity in OpenHashSet
  • [SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet
  • [SPARK-18111][SQL] Wrong approximate quantile answer when multiple records have the minimum value (for branch 2.0)
  • [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode
  • [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ…
  • [SPARK-18133][BRANCH-2.0][EXAMPLES][ML] Python ML Pipeline Exampl…
  • [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent
  • [SPARK-18114][HOTFIX] Fix line-too-long style error from backport of SPARK-18114
  • [SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy
  • [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset
  • [SPARK-18114][MESOS] Fix mesos cluster scheduler generate command option error
  • [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files
  • [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server (branch 2.0)
  • [SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet for Kafka 0.10 integration doc
  • [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception
  • [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"
  • [SPARK-17813][SQL][KAFKA] Maximum data per trigger
  • [SPARK-18132] Fix checkstyle
  • [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand
  • [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes
  • [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0)
  • [SPARK-18063][SQL] Failed to infer constraints over multiple aliases
  • [SPARK-16304] LinkageError should not crash Spark executor
  • [SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query
  • [SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL
  • [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled
  • [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types
  • [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance
  • [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch
  • [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing
  • [SPARK-18093][SQL] Fix default value test in SQLConfSuite to work rega…
  • [SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
  • [SPARK-18058][SQL][BRANCH-2.0] Comparing column types ignoring Nullability in Union and SetOperation
  • [SPARK-17123][SQL][BRANCH-2.0] Use type-widened encoder for DataFrame for set operations
  • [SPARK-17698][SQL] Join predicates should not contain filter clauses
  • [SPARK-17986][ML] SQLTransformer should remove temporary tables
  • [SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of the same log message typo
  • [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad
  • [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration
  • [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream (see the Kafka source sketch following this list)
  • [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset
  • [SPARK-17926][SQL][STREAMING] Added json for statuses
  • [SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns
  • [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness
  • [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD
  • [SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUniqueId index value overflowing
  • [SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException
  • [SPARK-17675][CORE] Expand Blacklist for TaskSets
  • [SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task.
  • [SPARK-17304] Fix perf. issue caused by TaskSetManager.abortIfCompletelyBlacklisted
  • [SPARK-15865][CORE] Blacklist should not result in job hanging with less than 4 executors
  • [SPARK-15783][CORE] Fix Flakiness in BlacklistIntegrationSuite
  • [SPARK-15783][CORE] still some flakiness in these blacklist tests so ignore for now
  • [SPARK-15714][CORE] Fix flaky o.a.s.scheduler.BlacklistIntegrationSuite
  • [SPARK-10372][CORE] basic test framework for entire spark scheduler
  • [SPARK-16106][CORE] TaskSchedulerImpl should properly track executors added to existing hosts
  • [SPARK-18001][DOCUMENT] fix broken link to SparkDataFrame
  • [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error
  • [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0
  • [SPARK-17841][STREAMING][KAFKA] drain commitQueue
  • [SPARK-17711] Compress rolled executor log
  • [SPARK-17751][SQL][BACKPORT-2.0] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException
  • [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0
  • [SPARK-17892][SQL][2.0] Do Not Optimize Query in CTAS More Than Once #15048
  • [SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server
  • [SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc
  • [SPARK-17863][SQL] should not add column into Distinct
  • [SPARK-17387][PYSPARK] Creating SparkContext() from python without spark-submit ignores user conf
  • [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
  • [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once
  • [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics
  • [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice
  • [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB
  • [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type.
  • [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13
  • [SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.
  • [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator
  • [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite
  • [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite
  • [SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing
  • [SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick
  • [SPARK-17806][SQL] fix bug in join key rewritten in HashJoin
  • [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules
  • [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)
  • [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished
  • [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths
  • [SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax
  • [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types
  • [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic
  • [SPARK-17803][TESTS] Upgrade docker-client dependency
  • [SPARK-17780][SQL] Report Throwable to user in StreamExecution
  • [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming
  • [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0)
  • [SPARK-17758][SQL] Last returns wrong result in case of empty partition
  • [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite
  • [SPARK-17773][BRANCH-2.0][INPUT/OUTPUT] Add VoidObjectInspector
  • [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.
  • [SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer
  • [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver
  • [SPARK-17753][SQL] Allow a complex expression as the input to a value-based case statement
  • [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract
  • [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
  • [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector
  • [SPARK-17672] Spark 2.0 history server web UI takes too long for a single application
  • [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
  • [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
  • [SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.
  • [SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan (backport)
  • [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure
  • [SPARK-17666] Ensure that RecordReaders are closed by data source file scans (backport)
  • [SPARK-17056][CORE] Fix a wrong assert regarding unroll memory in MemoryStore
  • [SPARK-17618] Guard against invalid comparisons between UnsafeRow and other formats
  • [SPARK-17652] Fix confusing exception message while reserving capacity
  • [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus
  • [SPARK-17650] malformed url's throw exceptions before bricking Executors
  • [SPARK-10835][ML] Word2Vec should accept non-null string array, in addition to existing null string array
  • [SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue size configurable (branch 2.0)
  • [SPARK-4563][CORE] Allow driver to advertise a different network address.
  • [SPARK-17577][CORE][2.0 BACKPORT] Update SparkContext.addFile to make it work well on Windows
  • [SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio
  • [SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStreamSource.FileEntry
  • [SPARK-16240][ML] ML persistence backward compatibility for LDA - 2.0 backport
  • [SPARK-17502][17609][SQL][BACKPORT][2.0] Fix Multiple Bugs in DDL Statements on Temporary Views
  • [SPARK-17599][SPARK-17569] Backport and to Spark 2.0 branch
  • [SPARK-17616][SQL] Support a single distinct aggregate combined with a non-partial aggregate
  • [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process is dead
  • [SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames
  • [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.
  • [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
  • [SPARK-17627] Mark Streaming Providers Experimental
  • [SPARK-17512][CORE] Avoid formatting to python path for yarn and mesos cluster mode
  • [SPARK-17418] Prevent kinesis-asl-assembly artifacts from being published
  • [SPARK-17617][SQL] Remainder(%) expression.eval returns incorrect result on double value
  • [SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource (branch-2.0)
  • [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable
  • [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
  • [SPARK-17160] Properly escape field names in code-generated error messages
  • [SPARK-17100] [SQL] fix Python udf in filter on top of outer join
  • [SPARK-16439] [SQL] bring back the separator in SQL UI
  • [SPARK-17611][YARN][TEST] Make shuffle service test really test auth.
  • [SPARK-17433] YarnShuffleService doesn't handle moving credentials levelDb
  • [SPARK-17438][WEBUI] Show Application.executorLimit in the application page
  • [SPARK-17473][SQL] fixing docker integration tests error due to different versions of jars.
  • [SPARK-17589][TEST][2.0] Fix test case `create external table` in MetastoreDataSourcesSuite
  • [SPARK-17297][DOCS] Clarify window/slide duration as absolute time, not relative to a calendar
  • [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
  • [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly
  • [SPARK-17586][BUILD] Do not call static member via instance reference
  • [SPARK-17546][DEPLOY] start-* scripts should use hostname -f
  • [SPARK-17541][SQL] fix some DDL bugs about table management when same-name temp view exists
  • [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)
  • [SPARK-17491] Close serialization stream to fix wrong answer bug in putIteratorAsBytes()
  • [SPARK-17575][DOCS] Remove extra table tags in configuration document
  • [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector
  • [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems
  • [SPARK-17567][DOCS] Use valid url to Spark RDD paper
  • [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
  • [SPARK-17484] Prevent invalid block locations from being reported after put() exceptions
  • [SPARK-17364][SQL] Antlr lexer wrongly treats full qualified identifier as a decimal number token when parsing SQL string
  • [SPARK-17483] Refactoring in BlockManager status reporting and block removal
  • [SPARK-17114][SQL] Fix aggregates grouped by literals with empty input
  • [SPARK-17547] Ensure temp shuffle data file is cleaned up after error
  • [SPARK-17521] Error when I use sparkContext.makeRDD(Seq())
  • [SPARK-17465][SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • [SPARK-17463][CORE] Make CollectionAccumulator and SetAccumulator's value can be read thread-safely
  • [SPARK-17511] Yarn Dynamic Allocation: Avoid marking released container as Failed
  • [SPARK-17514] df.take(1) and df.limit(1).collect() should perform the same in Python
  • [SPARK-17445][DOCS] Reference an ASF page as the main place to find third-party packages
  • [SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • [SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch
  • [SPARK-17480][SQL] Improve performance by removing or caching List.length which is O(n)
  • [SPARK-17525][PYTHON] Remove SparkContext.clearFiles() from the PySpark API as it was removed from the Scala API prior to Spark 2.0.0
  • [SPARK-17531] Don't initialize Hive Listeners for the Execution Client
  • [SPARK-17515] CollectLimit.execute() should perform per-partition limits
  • [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec
  • [SPARK-17485] Prevent failed remote reads of cached blocks from failing entire job
  • [SPARK-14818] Post-2.0 MiMa exclusion and build changes
  • [SPARK-17503][CORE] Fix memory leak in Memory store when unable to cache the whole RDD in memory
  • [SPARK-17486] Remove unused TaskMetricsUIData.updatedBlockStatuses field
  • [SPARK-17336][PYSPARK] Fix appending multiple times to PYTHONPATH from spark-config.sh
  • [SPARK-17439][SQL] Fixing compression issues with approximate quantiles and adding more tests
  • [SPARK-17396][CORE] Share the task support between UnionRDD instances.
  • [SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader
  • [SPARK-17456][CORE] Utility for parsing Spark versions
  • [SPARK-17339][CORE][BRANCH-2.0] Do not use path to get a filesystem in hadoopFile and newHadoopFile APIs
  • [SPARK-16533][CORE] - backport driver deadlock fix to 2.0
  • [SPARK-17370] Shuffle service files not invalidated when a slave is lost
  • [SPARK-17296][SQL] Simplify parser join processing [BACKPORT 2.0]
  • [SPARK-17372][SQL][STREAMING] Avoid serialization issues by using Arrays to save file names in FileStreamSource
  • [SPARK-17279][SQL] better error message for exceptions during ScalaUDF execution
  • [SPARK-17316][CORE] Fix the 'ask' type parameter in 'removeExecutor'
  • [SPARK-17110] Fix StreamCorruptionException in BlockManager.getRemoteValues()
  • [SPARK-17299] TRIM/LTRIM/RTRIM should not strip characters other than spaces
  • [SPARK-16334] [BACKPORT] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error
  • [SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap
  • [SPARK-17356][SQL] Fix out of memory issue when generating JSON for TreeNode
  • [SPARK-17369][SQL][2.0] MetastoreRelation toJSON throws AssertException due to missing otherCopyArgs
  • [SPARK-17358][SQL] Cached table (parquet/orc) should be shared between beelines
  • [SPARK-17353][SPARK-16943][SPARK-16942][BACKPORT-2.0][SQL] Fix multiple bugs in CREATE TABLE LIKE command
  • [SPARK-17391][TEST][2.0] Fix Two Test Failures After Backport
  • [SPARK-17335][SQL] Fix ArrayType and MapType CatalogString.
  • [SPARK-16663][SQL] desc table should be consistent between data source and hive serde tables
  • [SPARK-16959][SQL] Rebuild Table Comment when Retrieving Metadata from Hive Metastore
  • [SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect type
  • [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter
  • [SPARK-17261][PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
  • [SPARK-16935][SQL] Verification of Function-related ExternalCatalog APIs
  • [SPARK-17352][WEBUI] Executor computing time can be negative-number because of calculation error
  • [SPARK-17342][WEBUI] Style of event timeline is broken
  • [SPARK-17355] Workaround for HIVE-14684 / HiveResultSetMetaData.isSigned exception
  • [SPARK-16926] [SQL] Remove partition columns from partition metadata.
  • [SPARK-17271][SQL] Planner adds un-necessary Sort even if child orde…
  • [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again
  • [SPARK-17180][SPARK-17309][SPARK-17323][SQL][2.0] create AlterViewAsCommand to handle ALTER VIEW AS
  • [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite
  • [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
  • [SPARK-17243][WEB UI] Spark 2.0 History Server won't load with very large application history
  • [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl
  • [SPARK-17264][SQL] DataStreamWriter should document that it only supports Parquet for now
  • [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
  • [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore
  • [SPARK-16216][SQL][FOLLOWUP][BRANCH-2.0] Backport enabling timestamp type tests for JSON and verify all unsupported types in CSV
  • [SPARK-17216][UI] fix event timeline bars length
  • [ML][MLLIB] The require condition and message do not match in SparseMatrix.
  • [SPARK-15382][SQL] Fix a bug in sampling with replacement
  • [SPARK-17274][SQL] Move join optimizer rules into a separate file
  • [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0)
  • [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
  • [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions
  • [SPARK-17235][SQL] Support purging of old logs in MetadataLog
  • [SPARK-17246][SQL] Add BigDecimal literal
  • [SPARK-17165][SQL] FileStreamSource should not track the list of seen files indefinitely
  • [SPARK-17242][DOCUMENT] Update links of external dstream projects
  • [SPARK-17231][CORE] Avoid building debug or trace log messages unless the respective log level is enabled
  • [SPARK-17205] Literal.sql should handle Infinity and NaN
  • [SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData
  • [SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema
  • [SPARK-17167][2.0][SQL] Issue Exceptions when Analyze Table on In-Memory Cataloged Tables
  • [SPARK-16991][SPARK-17099][SPARK-17120][SQL] Fix Outer Join Elimination when Filter's isNotNull Constraints Unable to Filter Out All Null-supplying Rows
  • [SPARK-17061][SPARK-17093][SQL][BACKPORT] MapObjects should make copies of unsafe-backed data
  • [SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocationInfo == null
  • [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
  • [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON
  • [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
  • [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer when some quantiles are duplicated
  • [SPARK-17186][SQL] remove catalog table type INDEX
  • [SPARK-17194] Use single quotes when generating SQL for string literals
  • [SPARK-13286] [SQL] add the next expression of SQLException as cause
  • [SPARK-17182][SQL] Mark Collect as non-deterministic
  • [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication
  • [SPARK-17162] Range does not support SQL generation
  • [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
  • [SPARK-17085][STREAMING] Documentation and actual code differ for unsupported operations
  • [SPARK-17115][SQL] decrease the threshold when splitting expressions
  • [SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly
  • [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't exist in dependent module
  • [SPARK-17124][SQL] RelationalGroupedDataset.agg should preserve order and allow multiple aggregates per column
  • [SPARK-17104][SQL] LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation
  • [SPARK-17150][SQL] Support SQL generation for inline tables
  • [SPARK-17158][SQL] Change error message for out of range numeric literals
  • [SPARK-17149][SQL] array.sql for testing array related functions
  • [SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode
  • [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning
  • [SPARK-11227][CORE] UnknownHostException can be thrown when NameNode HA is enabled.
  • [SPARK-16994][SQL] Whitelist operators for predicate pushdown
  • [SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace
  • [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables
  • [SPARK-17069] Expose spark.range() as table-valued function in SQL
  • [SPARK-17117][SQL] 1 / NULL should not fail analysis
  • [SPARK-16391][SQL] Support partial aggregation for reduceGroups
  • [SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr
  • [SPARK-17096][SQL][STREAMING] Improve exception string reported through the StreamingQueryListener
  • [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check
  • [SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grows beyond 64 KB
  • [SPARK-17084][SQL] Rename ParserUtils.assert to validate
  • [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
  • [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
  • [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister
  • [SPARK-16508][SPARKR] Split docs for arrange and orderBy methods
  • [SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPolySize
  • [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
  • [SPARK-17013][SQL] Parse negative numeric literals
  • [SPARK-16975][SQL] Column-partition path starting '_' should be handled correctly
  • [SPARK-17022][YARN] Handle potential deadlock in driver handling messages
  • [SPARK-17018][SQL] literals.sql for testing literal parsing
  • [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
  • [SPARK-15899][SQL] Fix the construction of the file path with hadoop Path for Spark 2.0
  • [SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite
  • [SPARK-17007][SQL] Move test data files into a test-data folder
  • [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite.
  • [SPARK-16866][SQL] Infrastructure for file-based SQL end-to-end tests
  • [SPARK-17010][MINOR][DOC] Wrong description in memory management document
  • [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader
  • [SPARK-16324][SQL] regexp_extract should document that it returns an empty string when the match fails
  • [SPARK-16522][MESOS] Spark application throws exception on exit.
  • [SPARK-16905] SQL DDL: MSCK REPAIR TABLE
  • [SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable
  • [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3
  • [SPARK-16610][SQL] Add `orc.compress` as an alias for `compression` option.
  • [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
  • [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent with requestExecutors/killExecutors
  • [SPARK-16586][CORE] Handle JVM errors printed to stdout.
  • [SPARK-16936][SQL] Case Sensitivity Support for Refresh Temp Table
  • [SPARK-16457][SQL] Fix Wrong Messages when CTAS with a Partition By Clause
  • [SPARK-16939][SQL] Fix build error by using `Tuple1` explicitly in StringFunctionsSuite
  • [SPARK-16409][SQL] regexp_extract with optional groups causes NPE
  • [SPARK-16911] Fix the links in the programming guide
  • [SPARK-16870][DOCS] Add "spark.sql.broadcastTimeout" into docs/sql-programming-gu…
  • [SPARK-16932][DOCS] Changed programming guide to not reference old accumulator API in Scala
  • [SPARK-16925] Master should call schedule() after all executor exit events, not only failures
  • [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes"
  • [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests.
  • [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used
  • [SPARK-16863][ML] ProbabilisticClassifier.fit check thresholds' length
  • [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override)
  • [SPARK-16880][ML][MLLIB] make ann training data persisted if needed
  • [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample
  • [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap
  • [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
  • [SPARK-14204][SQL] register driverClass rather than user-specified class
  • [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type
  • [SPARK-16796][WEB UI] Visible passwords on Spark environment page
  • [SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics
  • [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file
  • [SPARK-16850][SQL] Improve type checking error message for greatest/least
  • [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
  • [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
  • [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors
  • [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
  • [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
  • [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
  • [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
  • [SPARK-15869][STREAMING] Fix a potential NPE in StreamingJobProgressListener.getBatchUIData
  • [SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling
  • [SPARK-16791][SQL] cast struct with timestamp field fails
  • [SPARK-16778][SQL][TRIVIAL] Fix deprecation warning with SQLContext
  • [SPARK-16805][SQL] Log timezone when query result does not match
  • [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package
  • [SPARK-16812] Open up SparkILoop.getAddedJars
  • [SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to exception
  • [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException
  • [SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
  • [SPARK-16750][ML] Fix GaussianMixture training failed due to feature column type mistake
  • [SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
  • [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes
  • [SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError
  • [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
  • [SPARK-16639][SQL] The query with having condition that contains grouping by column should work
  • [SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQLSuite
  • [SPARK-16730][SQL] Implement function aliases for type casts
  • [SPARK-16729][SQL] Throw analysis exception for invalid date casts
  • [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
  • [SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues related to lead and lag functions
  • [SPARK-16724] Expose DefinedByConstructorParams
  • [SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS queries
  • [SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextSuite when eventually fails
  • [SPARK-14131][STREAMING] SQL Improved fix for avoiding potential deadlocks in HDFSMetadataLog
  • [SPARK-16715][TESTS] Fix a potential ExprId conflict for SubexpressionEliminationSuite."Semantic equals and hash"
  • [SPARK-16485][DOC][ML] Fixed several inline formatting in ml features doc
  • [SPARK-16703][SQL] Remove extra whitespace in SQL generation for window functions
  • [SPARK-16698][SQL] Field names having dots should be allowed for datasources based on FileFormat
  • [SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First and Last
  • [SPARK-16699][SQL] Fix performance bug in hash aggregate on long string keys
  • [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
  • [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
  • [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python language binding
  • [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more consistent with Scala API
  • [SPARK-16650] Improve documentation of spark.task.maxFailures
  • [SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be a constant
  • [SPARK-16287][SQL] Implement str_to_map SQL function
  • [SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader
  • [SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable
  • [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions
  • [SPARK-16440][MLLIB] Destroy broadcasted variables even on driver
  • [SPARK-5682][CORE] Add encrypted shuffle in spark
  • [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values
  • [SPARK-16272][CORE] Allow config values to reference conf, env, system props.
  • [SPARK-16632][SQL] Use Spark requested schema to guide vectorized Parquet reader initialization
  • [SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor.
  • [SPARK-16505][YARN] Optionally propagate error during shuffle service startup.
  • [SPARK-14963][MINOR][YARN] Fix typo in YarnShuffleService recovery file name
  • [SPARK-14963][YARN] Using recoveryPath if NM recovery is enabled
  • [SPARK-16349][SQL] Fall back to isolated class loader when classes not found.
  • [SPARK-16119][SQL] Support PURGE option to drop table / partition.
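
Several items in the list above touch the Kafka source for Structured Streaming; in particular, SPARK-17812 covers "assign" and specific startingOffsets and SPARK-17813 caps the amount of data read per trigger. A minimal sketch of how those options are set, with placeholder broker and topic names:

      // Read from Kafka with an explicit starting position and a per-trigger cap.
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().appName("kafka-source-sketch").getOrCreate()

      val kafkaStream = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")                  // or "assign" with explicit topic partitions
        .option("startingOffsets", "earliest")          // "earliest", "latest", or a JSON offset map
        .option("maxOffsetsPerTrigger", "10000")        // limit records consumed per micro-batch
        .load()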
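
The final bullet, SPARK-16119, adds the Hive-style PURGE option so that dropped tables and partitions bypass the trash directory. A minimal sketch of the syntax, run through an existing SparkSession (named spark, as in spark-shell) against placeholder table and partition names; PURGE takes effect only when the underlying Hive client supports it:

      // Drop a table and a partition without moving the data to the trash.
      spark.sql("DROP TABLE IF EXISTS web_logs PURGE")
      spark.sql("ALTER TABLE web_logs DROP IF EXISTS PARTITION (dt = '2016-01-01') PURGE")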