Setting up a Spark data connection

Spark data connections within the same environment as Cloudera Machine Learning are automatically discovered, but you can also set up a connection manually. Follow this procedure to set up a Spark data connection.

  1. In the Workspaces UI, select the environment link for the workspace you are using. This takes you to the Environments UI.
  2. In Environments, select the Data Lake tab, and then the Cloud Storage tab.
  3. Select the directory path shown for Hive Metastore External Warehouse, and copy it.
  4. In Project Settings > Data Connections, click New Connection.
  5. Enter a name for the connection.
  6. Select the type: Spark Data Lake.
  7. Paste the value you copied in step 3 into Datalake Hive Metastore External Warehouse Directory.
  8. Click Create.
The data connection is available to users by default. To change availability, click the Available toggle.
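
A value copied from the UI in step 3 can pick up stray whitespace or be truncated before it is pasted in step 7. The following is a minimal sketch of a sanity check, not part of the product. It assumes the warehouse directory is a cloud storage URI whose scheme is one of s3a, abfs, abfss, or gs. The function name check_warehouse_dir, the scheme list, and the example bucket path are all hypothetical and should be adjusted for your environment.

```python
from urllib.parse import urlparse

# Assumption: the warehouse directory is a cloud storage URI with one of
# these schemes. This list is illustrative, not taken from the product.
ASSUMED_SCHEMES = {"s3a", "abfs", "abfss", "gs"}


def check_warehouse_dir(value: str) -> str:
    """Return the trimmed path, or raise ValueError if it does not
    look like a cloud storage URI."""
    cleaned = value.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ASSUMED_SCHEMES:
        raise ValueError(f"unexpected scheme {parsed.scheme!r} in {cleaned!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket or container in {cleaned!r}")
    return cleaned


if __name__ == "__main__":
    # Hypothetical value standing in for the path copied in step 3.
    copied = "  s3a://example-bucket/warehouse/tablespace/external/hive\n"
    print(check_warehouse_dir(copied))
```

The check only inspects the shape of the string. It does not confirm that the directory exists or that the connection can reach it.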