What's New in Apache Spark

Learn about the new features of Spark in Cloudera Runtime 7.1.9.

CDS 3.3 powered by Apache Spark

The default Spark runtime in Cloudera Runtime 7.1.9 for CDP Private Cloud Base is Spark 2.4.8, which is deprecated. CDS 3.3 is an add-on parcel that you can install to provide support for Spark 3.3. CDS 3.3 is certified for Cloudera Runtime 7.1.9 for CDP Private Cloud Base, is based on Spark 3.3.2, and contains all the feature content of that release.

  • Support for virtual clusters powered by Apache Spark 3 is now available.
  • Spark 2 is deprecated in Cloudera Runtime 7.1.9; 7.1.9 is therefore the last Cloudera Runtime release that supports Spark 2.
  • Hive Warehouse Connector (HWC) is supported for Hive managed ACID tables in both Direct Reader and JDBC modes.
  • The following features are not currently supported:
    • Deep analysis (visual profiler)
    • Phoenix Connector
    • SparkR

See Running Apache Spark 3 applications and Deprecation Notices for Spark 2.
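
Because Spark 2 remains the default runtime, Spark 3 applications are launched through a separate command once the CDS 3.3 parcel is installed. The following is a minimal sketch of how a launch command might be assembled, assuming the parcel exposes a spark3-submit launcher alongside the Spark 2 spark-submit. The helper name build_spark3_submit, the application path, and the configuration value are hypothetical and used for illustration only.

```python
import shlex


def build_spark3_submit(app_path, master="yarn", deploy_mode="cluster",
                        conf=None, app_args=()):
    """Return the argument list for launching an application on Spark 3.

    Assumption: the CDS 3 parcel provides a `spark3-submit` launcher,
    so Spark 3 jobs do not collide with the Spark 2 `spark-submit`.
    """
    cmd = ["spark3-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Emit --conf pairs in sorted order so the command is reproducible.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical application path and setting, for illustration only.
    cmd = build_spark3_submit(
        "/tmp/example_job.py",
        conf={"spark.executor.memory": "2g"},
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting concerns.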

Spark 3 support in Oozie

Oozie introduces new Spark 3 actions. For more information, see Spark 3 support in Oozie.

Spark History Server with High Availability

You can configure a load balancer for Spark History Server (SHS) to ensure high availability, so that users can access the Spark History Server UI without disruption. For information about configuring the load balancer for SHS and the associated limitations, see Using Spark History Servers with high availability.
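
After a load balancer is in place, one way to confirm that SHS is reachable through it is to request the Spark monitoring REST API. The following is a rough sketch, not a Cloudera-provided tool: it assumes the load balancer forwards the standard /api/v1/applications endpoint, and the host name and port in the usage section are hypothetical. It does not handle authentication, so on a Kerberos-secured cluster the probe would need to be adapted.

```python
import json
import urllib.request


def applications_url(base_url):
    """Build the REST URL that lists applications known to the history server."""
    return base_url.rstrip("/") + "/api/v1/applications?limit=1"


def is_healthy(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers HTTP 200 with a JSON body.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(applications_url(base_url), timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # Connection failures, HTTP errors, and malformed bodies all count as unhealthy.
        return False


if __name__ == "__main__":
    # Hypothetical load balancer address; replace with your own.
    lb = "http://shs-lb.example.com:18089"
    print("healthy" if is_healthy(lb) else "unreachable")
```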