CDS Powered by Apache Spark New Features and Changes
The following sections describe what's new and changed in each CDS Powered by Apache Spark release.
Continue reading:
- What's New in CDS 2.4 Release 2
- What's New in CDS 2.4 Release 1
- What's New in CDS 2.3 Release 4
- What's New in CDS 2.3 Release 3
- What's New in CDS 2.3 Release 2
- What's New in CDS 2.3 Release 1
- What's New in CDS 2.2 Release 4
- What's New in CDS 2.2 Release 3
- What's New in CDS 2.2 Release 2
- What's New in CDS 2.2 Release 1
- What's New in CDS 2.1 Release 4
- What's New in CDS 2.1 Release 3
- What's New in CDS 2.1 Release 2
- What's New in CDS 2.1 Release 1
- What's New in CDS 2.0 Release 2
- What's New in CDS 2.0 Release 1
What's New in CDS 2.4 Release 2
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.4 Release 1
- Added support for Structured Streaming. Note that the following Structured Streaming features are not supported:
  - Continuous processing, which is still experimental.
  - Stream-static joins with HBase, which have not been tested.
  - Also note the associated known issue: Structured Streaming exactly-once fault tolerance constraints.
- Added support for the built-in Apache Avro data source. For details, see SPARK-24768 and the Apache Avro Data Source Guide.
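The built-in Avro support noted above is exposed through the `avro` data source format. The following is a minimal sketch (the paths and app name are hypothetical placeholders, and it assumes the Avro module is on the classpath):

```scala
// Write and read Avro files with the built-in data source (format name "avro").
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("AvroExample").getOrCreate()

// Write a small DataFrame out as Avro. The output path is a placeholder.
val df = spark.range(10).withColumnRenamed("id", "value")
df.write.format("avro").save("/tmp/example_avro")

// Read it back as a DataFrame.
val loaded = spark.read.format("avro").load("/tmp/example_avro")
loaded.show()
```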
Also see CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
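The Structured Streaming support added in this release can be exercised with a minimal word-count sketch. The host, port, and app name below are illustrative only (for example, feed input with `nc -lk 9999`); note that it uses the default micro-batch engine, since continuous processing is not supported:

```scala
// Structured Streaming sketch: count words arriving on a socket.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StructuredWordCount").getOrCreate()
import spark.implicits._

// Read a stream of lines from a local socket (illustrative source).
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Split lines into words and count occurrences of each word.
val counts = lines.as[String]
  .flatMap(_.split("\\s+"))
  .groupBy("value")
  .count()

// Print the running counts to the console after every micro-batch.
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

query.awaitTermination()
```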
What's New in CDS 2.3 Release 4
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.3 Release 3
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.3 Release 2
- More flexibility to interpret TIMESTAMP values written by Impala. Setting the spark.sql.parquet.int96TimestampConversion configuration setting to true makes Spark interpret TIMESTAMP values read from Parquet files written by Impala without adjusting them from UTC to the server's local time zone. This provides better interoperability with Impala, which applies no time zone adjustment to TIMESTAMP values when reading or writing them.
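The setting above can be supplied when the session is built, as in this sketch (the app name, table path, and column name are hypothetical placeholders); it can equally be passed on the command line with `spark-submit --conf`:

```scala
// Enable Impala-compatible interpretation of Parquet INT96 TIMESTAMP values.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ImpalaTimestamps")
  // Read INT96 timestamps written by Impala as-is, with no
  // UTC-to-local-time-zone adjustment.
  .config("spark.sql.parquet.int96TimestampConversion", "true")
  .getOrCreate()

// Hypothetical Parquet table written by Impala.
val events = spark.read.parquet("/user/impala/warehouse/events")
events.select("event_ts").show()
```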
Also see CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.3 Release 1
CDS 2.3 Release 1 was never officially released; if downloaded, do not use.
What's New in CDS 2.2 Release 4
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.2 Release 3
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.2 Release 2
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.2 Release 1
- Support for CDH 5.12 and associated features.
- Support for using Spark 2 jobs to read and write data on the Azure Data Lake Store (ADLS) cloud service.
- CDS 2.2 and higher require JDK 8 only. If you are using CDS 2.2 or higher, you must remove JDK 7 from all cluster and gateway hosts to ensure proper operation.
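The ADLS support noted above lets a Spark 2 job address `adl://` paths directly. The following is a minimal sketch; the account name, credential values, and paths are hypothetical placeholders, and it assumes OAuth2 client-credential properties are supplied through the Hadoop configuration:

```scala
// Read from and write to ADLS paths in a Spark 2 job.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("AdlsExample").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Hypothetical OAuth2 client credentials for the ADLS account.
hadoopConf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hadoopConf.set("dfs.adls.oauth2.client.id", "<client-id>")
hadoopConf.set("dfs.adls.oauth2.credential", "<client-secret>")
hadoopConf.set("dfs.adls.oauth2.refresh.url",
  "https://login.microsoftonline.com/<tenant>/oauth2/token")

// Read CSV input from ADLS and write the result back as Parquet.
val df = spark.read.csv("adl://myaccount.azuredatalakestore.net/input/data.csv")
df.write.parquet("adl://myaccount.azuredatalakestore.net/output/data_parquet")
```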
Also see CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.1 Release 4
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.1 Release 3
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.1 Release 2
This is purely a maintenance release. See CDS Powered by Apache Spark Fixed Issues for the list of fixed issues.
What's New in CDS 2.1 Release 1
- New direct connector to Kafka that uses the new Kafka consumer API. See Integrating CDS Powered by Apache Spark with Apache Kafka for details.
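A direct stream built on the new Kafka consumer API can be sketched as follows, assuming the spark-streaming-kafka-0-10 module is on the classpath; the broker addresses, group id, and topic name are hypothetical placeholders:

```scala
// Direct Kafka stream using the new consumer API (kafka010 integration).
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaDirectExample")
val ssc = new StreamingContext(conf, Seconds(5))

// Consumer configuration; broker list and group id are placeholders.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Subscribe to a topic and print the message values of each batch.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))
stream.map(record => record.value).print()

ssc.start()
ssc.awaitTermination()
```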
What's New in CDS 2.0 Release 2
-
A Hive compatibility issue in CDS 2.0 Release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the CDS 2.0 Release 2 or higher parcel, to avoid Spark 2 job failures when using Hive functionality.
What's New in CDS 2.0 Release 1
- New SparkSession object replaces HiveContext and SQLContext.
- Most of the Hive logic has been reimplemented in Spark.
- Some Hive dependencies still exist:
- SerDe support.
- UDF support.
- Added support for the unified Dataset API.
- Faster Spark SQL achieved with whole-stage code generation.
- More complete SQL syntax now supports subqueries.
- Added the spark-csv library.
- Backport of SPARK-5847: the root for metrics is now the app name (spark.app.name) instead of the app ID. The app ID requires extra investigation to match to the app name, and it changes whenever streaming jobs are stopped and restarted.
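Several of the items above come together in the new entry point: a single SparkSession replaces HiveContext and SQLContext, Hive SerDe/UDF support is an opt-in, and SQL with subqueries shares the session with the Dataset API. A minimal sketch (the app name and data are hypothetical):

```scala
// SparkSession as the single entry point for SQL, Hive, and Datasets.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSessionExample")
  .enableHiveSupport()   // opt in to the remaining Hive SerDe and UDF support
  .getOrCreate()
import spark.implicits._

// The unified Dataset API: build a typed Dataset from local data.
val ds = Seq(("alice", 1), ("bob", 2)).toDS()
ds.createOrReplaceTempView("people")

// SQL with a subquery, run against the same session.
val result = spark.sql(
  "SELECT * FROM people WHERE _2 > (SELECT MIN(_2) FROM people)")
result.show()
```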