Configuring external IDE Spark Connect sessions

Learn how to configure a Spark Connect Session with Cloudera Data Engineering (CDE).

Before you create a Spark Connect Session, complete the following prerequisites:
  1. Enable a Cloudera Data Engineering service.
  2. Create a CDE virtual cluster. You must select All Purpose (Tier 2) for the Virtual Cluster option and Spark 3.5.1 as the Spark version.
To create and connect to a Spark Connect Session, perform the following steps:
  1. Perform the following steps on each user's machine:
    1. Create the ~/.cde/config.yaml configuration file and add the vcluster-endpoint and cdp-endpoint parameters. These parameters allow the client machine to identify the virtual cluster. For more information, see vcluster-endpoint and cdp-endpoint.
      For example,
      cdp-endpoint: https://console-cdp.apps.example.com
      credentials-file: /Users/user1/.cde/credentials
      vcluster-endpoint: https://ffws6v27.cde-c9b822vr.apps.example.com/dex/api/v1
    2. Create an access key, save it in a credentials file, and set the credentials-file parameter in the ~/.cde/config.yaml configuration file to the path of that file. This allows the client machine to acquire short-lived access tokens.
      For example, the credentials file contains:
      [default]
      cdp_access_key_id=571ff....
      cdp_private_key=dvbYd....
      
  2. Create a Spark Connect Session using one of the following methods:
    • Using the UI: Create a new session as described in Creating Sessions in Cloudera Data Engineering, and select Spark Connect (Tech Preview) from the Type drop-down list.
    • Using the CLI: Create a Spark Connect Session by running the following command:
      cde session create --name [***SPARK-SESSION-NAME***] --type spark-connect
      
  3. On the CDE Home page, click Sessions and then select the Spark Connect Session that you have created.
  4. Go to the Connect tab and download the CDE TAR file and the PySpark TAR file listed there.
  5. Create a new Python virtual environment or use an existing one, activate it, and install both TAR files:
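    # Create and activate a Python virtual environment
    python3 -m venv cdeconnect
    . cdeconnect/bin/activate
    
    # Install the CDE and PySpark TAR files downloaded from the Connect tab
    pip install [***CDECONNECT TARBALL***]
    pip install [***PYSPARK TARBALL***]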
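If you set up many user machines, the client configuration steps (creating ~/.cde/config.yaml and the credentials file) can be scripted. The following Python sketch writes both files using only the standard library. The function name write_cde_client_files is hypothetical, the endpoint and key values are placeholders you must replace, and restricting the credentials file to owner-only access (0600) is a precaution of this sketch, not a documented CDE requirement.

```python
import configparser
import os
from pathlib import Path


def write_cde_client_files(
    cdp_endpoint: str,
    vcluster_endpoint: str,
    access_key_id: str,
    private_key: str,
    cde_dir: Path = Path.home() / ".cde",
) -> tuple[Path, Path]:
    """Write config.yaml and the credentials file; return (config_path, credentials_path)."""
    cde_dir.mkdir(parents=True, exist_ok=True)
    config_path = cde_dir / "config.yaml"
    credentials_path = cde_dir / "credentials"

    # Credentials file: INI format with a [default] section, as in the example above.
    credentials = configparser.ConfigParser()
    credentials["default"] = {
        "cdp_access_key_id": access_key_id,
        "cdp_private_key": private_key,
    }
    # Assumption: create the file readable only by its owner (POSIX permissions).
    fd = os.open(credentials_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        # space_around_delimiters=False writes "key=value" rather than "key = value".
        credentials.write(fh, space_around_delimiters=False)
    os.chmod(credentials_path, 0o600)  # also tighten permissions if the file already existed

    # config.yaml: plain "key: value" lines, so no third-party YAML library is needed.
    config_lines = [
        f"cdp-endpoint: {cdp_endpoint}",
        f"credentials-file: {credentials_path}",
        f"vcluster-endpoint: {vcluster_endpoint}",
    ]
    config_path.write_text("\n".join(config_lines) + "\n")
    return config_path, credentials_path
```

Call it once per user with that user's endpoints and access key, for example write_cde_client_files("https://console-cdp.apps.example.com", "https://ffws6v27.cde-c9b822vr.apps.example.com/dex/api/v1", access_key_id, private_key). Because the function writes the credentials-file parameter itself, the path in config.yaml always points at the file it just created.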
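When sessions are created from scripts or CI jobs, the documented `cde session create` command can be wrapped in Python. This sketch only assembles and runs that exact command; it assumes the cde CLI is installed, on PATH, and configured through ~/.cde/config.yaml. The helper names are hypothetical.

```python
import shutil
import subprocess


def build_create_session_command(session_name: str) -> list[str]:
    """Return the argv for the documented `cde session create ... --type spark-connect` command."""
    if not session_name:
        raise ValueError("session_name must not be empty")
    return ["cde", "session", "create", "--name", session_name, "--type", "spark-connect"]


def create_spark_connect_session(session_name: str) -> subprocess.CompletedProcess:
    """Run the CLI command; raise if the CLI is missing or exits with a non-zero status."""
    command = build_create_session_command(session_name)
    if shutil.which("cde") is None:
        raise FileNotFoundError("The cde CLI was not found on PATH")
    # check=True raises CalledProcessError on failure; output is captured as text.
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing the arguments as a list instead of a shell string means a session name is never reinterpreted by the shell, so no extra quoting is needed.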
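After installing the TAR files, a quick check can confirm that the virtual environment is active and that PySpark is installed. This is a minimal sketch with two assumptions: that the PySpark TAR file installs a distribution named pyspark, and that the client PySpark version should share the 3.5 minor version with the virtual cluster's Spark 3.5.1. It does not check the CDE package because its distribution name is not given here. The function names are hypothetical.

```python
import sys
from importlib import metadata


def matches_spark_minor(version: str, minor: str) -> bool:
    """True if version is exactly `minor` or starts with `minor.` (so 3.5.1 matches 3.5, 3.50.0 does not)."""
    return version == minor or version.startswith(minor + ".")


def check_client_environment(expected_spark_minor: str = "3.5") -> list[str]:
    """Return a list of problems with the current Python environment; empty means all checks passed."""
    problems = []
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
    if sys.prefix == sys.base_prefix:
        problems.append("Not running inside a virtual environment; activate it first.")
    try:
        pyspark_version = metadata.version("pyspark")  # assumption: distribution name is "pyspark"
    except metadata.PackageNotFoundError:
        problems.append("PySpark is not installed; install the PySpark TAR file from the Connect tab.")
    else:
        if not matches_spark_minor(pyspark_version, expected_spark_minor):
            problems.append(
                f"PySpark {pyspark_version} does not match Spark {expected_spark_minor}.x on the virtual cluster."
            )
    return problems
```

Run check_client_environment() with the cdeconnect environment activated and print the result; each string in the returned list describes one problem to fix.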