Spark Connect Sessions

This topic describes what a Spark Connect Session is, its known limitations, and the supported Runtime component versions and connectors.

What a Spark Connect Session is

A session is an interactive, short-lived development environment for running Spark commands. A Spark Connect Session is a type of CDE Session that exposes the Spark Connect interface, which lets you connect to Spark from any remote Python environment.
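
As an illustration of what connecting from a remote Python environment can look like, the following minimal sketch uses the generic Apache Spark Connect client API in PySpark (SparkSession.builder.remote). It is an assumption that this generic API applies as-is. CDE may require its own client package, endpoint, and authentication, none of which are described here. The host name is hypothetical, and 15002 is the default port of the open-source Spark Connect server.

```python
def build_remote_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string of the form sc://host:port."""
    if not host:
        raise ValueError("host must be non-empty")
    return f"sc://{host}:{port}"


def connect(remote_url: str):
    """Open a Spark Connect session against remote_url.

    Requires PySpark 3.4 or later with the Spark Connect client installed.
    """
    # Imported lazily so the URL helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(remote_url).getOrCreate()


# Example usage (hypothetical endpoint, assumes a reachable Spark Connect server):
#   spark = connect(build_remote_url("spark-connect.example.com"))
#   spark.range(5).show()
```

The connection helper is kept separate from the URL helper so that the endpoint can be checked or logged before any network call is made.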

Supported versions of Cloudera Runtime components

Before you use Spark Connect Sessions, ensure that you are running the following Runtime component versions:
  • Spark 3.4.1
  • CDP Runtime 7.1.8

Supported Spark Connectors

The following Spark Connectors are supported with the previously listed Runtime component versions:
  • Hive
  • HDFS
  • Hive tables with Parquet storage
  • Hive tables with ORC storage
  • Ranger - table-level access controls

Limitations

Spark Connect Sessions have the following known limitations:
  • Profile support: Spark Connect Sessions do not support profiles in the configuration files, even though the CDE clients do.
  • Documentation links: Links within the Spark Connect UI point to incorrect documents.
  • Session names: Session creation accepts a mix of uppercase and lowercase letters in session names. However, uppercase letters cause Spark Connect Sessions to connect incorrectly. As a workaround, use only lowercase letters in session names.
  • Access control support: Spark Connect Sessions do not support access control. After a session is created, anyone with access to the virtual cluster can connect to it.
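
The lowercase-only workaround for session names can be enforced in client-side tooling before a session is created. The following is a minimal sketch in plain Python. The helper names are hypothetical and are not part of any CDE client. The sketch checks letter case only and does not validate any other naming rules that CDE may apply.

```python
def has_uppercase(name: str) -> bool:
    """Return True if the session name contains any uppercase letter."""
    return any(ch.isupper() for ch in name)


def normalize_session_name(name: str) -> str:
    """Return a lowercase version of the session name.

    Surrounding whitespace is stripped, and an empty name is rejected.
    """
    stripped = name.strip()
    if not stripped:
        raise ValueError("session name must be non-empty")
    return stripped.lower()


# Example usage with a hypothetical session name:
requested = "My-Connect-Session"
if has_uppercase(requested):
    requested = normalize_session_name(requested)
print(requested)
```

Normalizing the name once, before it is used to create the session, keeps the same lowercase name in use when you later connect to that session.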