What's New in Apache Spark
Learn about the new features of Spark in Cloudera Runtime 7.1.9.
CDS 3.3 powered by Apache Spark
The default Spark runtime in Cloudera Runtime 7.1.9 for CDP Private Cloud Base is Spark 2.4.8, which is deprecated. CDS 3.3 is an add-on parcel that you can install to provide support for Spark 3.3. CDS 3.3 is certified for Cloudera Runtime 7.1.9 for CDP Private Cloud Base, is based on Spark 3.3.2, and contains all the feature content of that release.
- Support for virtual clusters powered by Apache Spark 3 is now available.
- Spark 2 is deprecated in Cloudera Runtime 7.1.9, which is the last Cloudera Runtime release that supports Spark 2.
- Hive Warehouse Connector (HWC) is supported for Hive managed ACID tables, in both Direct Reader and JDBC modes.
- The following functionalities are not currently supported:
  - Deep analysis (visual profiler)
  - Phoenix Connector
  - SparkR
See Running Apache Spark 3 applications and Deprecation Notices for Spark 2.
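Because CDS 3.3 is installed alongside the default Spark 2 runtime, Spark 3 applications are launched with a separate command. The following is a minimal sketch, assuming the Spark 3 launcher is available on the host as `spark3-submit`. The helper name `build_spark3_submit`, the application file `wordcount.py`, and the input path are hypothetical and used only for illustration. The helper only assembles the command line and does not run it.

```python
import shlex


def build_spark3_submit(app_path, master="yarn", deploy_mode="cluster",
                        conf=None, app_args=()):
    """Return the argument list for a spark3-submit call (does not execute it).

    Assumption: the CDS 3 parcel exposes the Spark 3 launcher as
    `spark3-submit`, separate from the Spark 2 `spark-submit`.
    """
    argv = ["spark3-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Emit --conf pairs in sorted order so the command is reproducible.
    for key, value in sorted((conf or {}).items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_path)
    argv.extend(app_args)
    return argv


if __name__ == "__main__":
    # Hypothetical application and input path, for illustration only.
    cmd = build_spark3_submit(
        "wordcount.py",
        conf={"spark.executor.memory": "2g"},
        app_args=["/tmp/input.txt"],
    )
    print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit, and the same list can be passed to `subprocess.run` on a gateway host where the Spark 3 parcel is activated.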
Spark 3 support in Oozie
Oozie now provides Spark 3 actions, which run on Spark 3. For more information, see Spark 3 support in Oozie.
Spark History Server with High Availability
You can configure a load balancer for Spark History Server (SHS) to ensure high availability, so that users can access the Spark History Server UI without disruption. For more information, including the associated limitations, see Using Spark History Servers with high availability.