CDS Powered by Apache Spark New Features and Changes
The following sections describe what's new and changed in each CDS Powered by Apache Spark release.
New Features
The following sections describe what's new in each CDS Powered by Apache Spark release.
What's New in CDS 2.0 Release 2
- The latest release (release 2) addresses a Hive compatibility issue that affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 parcel to avoid Spark 2 job failures when using Hive functionality.
What's New in CDS 2.0 Release 1
- New SparkSession object replaces HiveContext and SQLContext.
- Most of the Hive logic has been reimplemented in Spark.
- Some Hive dependencies still exist:
  - SerDe support.
  - UDF support.
- Added support for the unified Dataset API.
- Faster Spark SQL, achieved with whole-stage code generation.
- More complete SQL syntax, including support for subqueries.
- Adds the spark-csv library.
- Backport of SPARK-5847. The root for metrics is now the app name (spark.app.name) instead of the app ID. The app ID is a poor root for two reasons: it takes extra investigation to match it to an app name, and it changes each time a streaming job is stopped and restarted.
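
The effect of the metrics root change can be illustrated with a minimal sketch in plain Python (no Spark required). It assumes metric names are dotted paths that begin with the root; the helper metric_key, the app name, the app IDs, and the metric path are all hypothetical values chosen for illustration, not output from a real cluster.

```python
def metric_key(root: str, instance: str, source: str, metric: str) -> str:
    """Build a dotted metric name that starts with the given root (assumed layout)."""
    return ".".join([root, instance, source, metric])


# Hypothetical streaming job that is stopped once and then restarted.
app_name = "clickstream-etl"  # value of spark.app.name, chosen by the user
app_ids = [
    "application_1480000000000_0001",  # hypothetical app ID of the first run
    "application_1480000000000_0002",  # hypothetical app ID after the restart
]

# Rooted at the app ID: every restart starts a new, unrelated metric series.
keys_by_id = {metric_key(app_id, "driver", "jvm", "heap.used") for app_id in app_ids}

# Rooted at the app name: both runs report under the same series.
keys_by_name = {metric_key(app_name, "driver", "jvm", "heap.used") for _ in app_ids}

print(len(keys_by_id))    # 2
print(len(keys_by_name))  # 1
```

Under these assumptions, a dashboard or alert keyed on the app name keeps working across restarts, while one keyed on the app ID has to be pointed at a new series after every restart.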