Spark Connect Sessions
This topic describes what a Spark Connect Session is, the supported Cloudera Runtime component versions, and the known limitations.
What a Spark Connect Session is
A session is an interactive, short-lived development environment for running Spark commands. A Spark Connect Session is a type of CDE Session that exposes the Spark Connect interface, which allows you to connect to Spark from any remote Python environment.
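To illustrate what connecting from a remote Python environment can look like, here is a minimal sketch that uses the generic open-source PySpark Spark Connect client (PySpark 3.4 or later). It is not a CDE-specific client: how a CDE Spark Connect Session exposes its endpoint and handles authentication is not covered in this topic, so the host is a placeholder and the helper names `build_remote_url` and `connect` are hypothetical. Port 15002 is the default of the open-source Spark Connect server and is only an assumption here.

```python
def build_remote_url(host: str, port: int = 15002) -> str:
    """Return a Spark Connect connection string of the form sc://host:port."""
    return f"sc://{host}:{port}"


def connect(host: str, port: int = 15002):
    """Open a remote SparkSession with the generic PySpark Spark Connect client."""
    # Imported here so build_remote_url() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    # SparkSession.builder.remote() is the open-source Spark Connect entry point.
    return SparkSession.builder.remote(build_remote_url(host, port)).getOrCreate()
```

With a reachable endpoint, calling `connect("<host>")` would return a session object on which regular DataFrame commands, such as `spark.range(5).show()`, run remotely.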
Supported versions of Cloudera Runtime components
Before you use Spark Connect Sessions, ensure that you are using the following versions of the Runtime components:
- Spark 3.4.1
- CDP Runtime 7.1.8
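The version requirement above could be checked programmatically before you start working. The following is a minimal sketch, assuming the listed versions must match exactly and that you obtain the reported versions yourself (for example, from your environment's administration UI). The function name `find_version_mismatches` and the dictionary layout are hypothetical.

```python
# Versions listed in this topic; treated here as exact requirements (assumption).
SUPPORTED_VERSIONS = {"Spark": "3.4.1", "CDP Runtime": "7.1.8"}


def find_version_mismatches(reported: dict) -> dict:
    """Return {component: reported_version} for each component that does not match.

    A component missing from `reported` is returned with the value None.
    """
    return {
        component: reported.get(component)
        for component, expected in SUPPORTED_VERSIONS.items()
        if reported.get(component) != expected
    }
```

An empty result means that every listed component reports the expected version.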
Supported Spark Connectors
The following Spark Connectors are supported with the Runtime component versions listed above:
- Hive
- HDFS
- Hive tables with Parquet storage
- Hive tables with ORC storage
- Ranger - table-level access controls
Limitations
Spark Connect Sessions have the following known limitations:
- Profile support: Spark Connect does not support profiles in the configuration files, even though the CDE clients do.
- Documentation links within the Spark Connect UI point to incorrect documents.
- Session names: Session creation accepts names that mix uppercase and lowercase letters. However, uppercase letters in a session name cause Spark Connect Sessions to connect incorrectly. As a workaround, use only lowercase letters in session names.
- Access control support: Spark Connect Sessions do not support access control. After a session is created, anyone with access to the virtual cluster can connect to it.
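The lowercase workaround for session names can be applied before a session is created. The following is a minimal sketch; the helper names `has_uppercase` and `to_lowercase_session_name` are hypothetical and are not part of any CDE client.

```python
def has_uppercase(name: str) -> bool:
    """Return True if the session name contains at least one uppercase letter."""
    return any(ch.isupper() for ch in name)


def to_lowercase_session_name(name: str) -> str:
    """Return the session name with all letters converted to lowercase."""
    return name.lower()
```

For example, `to_lowercase_session_name("MySession")` yields a name without uppercase letters that you can then pass to whichever tool you use to create the session.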