Setting up a Spark data connection
Spark data connections within the same environment as Cloudera Machine Learning are automatically discovered, but you can also set up a connection manually. Follow this procedure to set up a Spark data connection.
1. In the Workspaces UI, select the environment link for the workspace you are using. This takes you to the Environments UI.
2. In Environments, select the tab that shows the Hive Metastore External Warehouse.
3. Select the directory path shown for Hive Metastore External Warehouse, and copy it.
4. In Project Settings > Data Connections, click New Connection.
5. Enter a name for the connection.
6. Select the type: Spark Data Lake.
7. Paste the value you copied in step 3 into Datalake Hive Metastore External Warehouse Directory.
8. Click Create.
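After the connection is created, a project session can usually open it by name. The following is a minimal sketch, not a definitive implementation. It assumes the code runs inside a Cloudera Machine Learning session where the `cml.data_v1` library is available. The helper `open_spark_session` and the connection name `my-spark-connection` are hypothetical; substitute the name you entered when creating the connection.

```python
def open_spark_session(connection_name):
    """Return a SparkSession for the named data connection.

    Sketch only: assumes a Cloudera Machine Learning session, where
    the cml.data_v1 library is provided by the runtime.
    """
    # Imported inside the function because cml.data_v1 is only
    # available inside a CML session, not in a plain Python install.
    import cml.data_v1 as cmldata

    # Look up the data connection by the name given at creation time.
    conn = cmldata.get_connection(connection_name)

    # Ask the connection for a Spark session configured for it.
    return conn.get_spark_session()


# Example use inside a CML session ("my-spark-connection" is a
# hypothetical connection name):
#
#   spark = open_spark_session("my-spark-connection")
#   spark.sql("SHOW DATABASES").show()
```

Listing the databases is a quick way to check that the warehouse directory you pasted resolves correctly before you run larger queries.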