Set up a data connection to CDP DataHub
You can set up a data connection to the DataHub cluster either with the New Connection dialog or with raw code inside your project. Both approaches are shown below.
- Log into the console web UI.
- Depending on which connection you want to use, click either the Connect to Hive or the Connect to Impala tile.
- Ensure your Data Hub cluster name is correct in the popup.
- Copy the JDBC URL string.
- Click the Build a Data Science Project tile to log into the ML workspace.
- Return to the Machine Learning service and open the New Connection dialog.
- Enter the connection name. You cannot have duplicate names for data connections within a workspace or within a given project.
- Select the connection type:
  - Hive Virtual Warehouse
  - Impala Virtual Warehouse
- Paste the JDBC URL for the data connection.
- (Optional) Enter the Virtual Warehouse Name. This is the name of the warehouse in Cloudera Data Warehouse.
Set up a DataHub data connection using raw code
The New Connection dialog is the recommended way to create a new data connection. If needed, you can instead set up a data connection in your project code by adapting the following code snippet.
    from impala.dbapi import connect

    # Example connection string:
    # jdbc:hive2://my-test-master0.eng-ml-i.svbr-nqvp.int.cldr.work/;ssl=true;transportMode=http;httpPath=my-test/cdp-proxy-api/hive
    conn = connect(
        host="my-test-master0.eng-ml-i.svbr-nqvp.int.cldr.work",
        port=443,
        auth_mechanism="LDAP",
        use_ssl=True,
        use_http_transport=True,
        http_path="my-test/cdp-proxy-api/hive",
        user="csso_me",
        password="Test@123",
    )

    cursor = conn.cursor()
    cursor.execute("select * from 3yearpop")
    for row in cursor:
        print(row)

    cursor.close()
    conn.close()
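
The snippet above copies the host and httpPath out of the JDBC URL by hand. As a rough sketch of how that translation might be automated, the helper below splits a hive2-style JDBC URL like the example into the matching connect() arguments. The function name jdbc_url_to_connect_kwargs and the default port of 443 are assumptions made for illustration (443 mirrors the snippet above, whose URL carries no port); they are not part of the product or of the impala package, and URLs in other formats may need different handling.

```python
from urllib.parse import urlsplit


def jdbc_url_to_connect_kwargs(jdbc_url, default_port=443):
    """Map a hive2-style JDBC URL to keyword arguments for impala.dbapi.connect.

    Hypothetical helper: default_port=443 is an assumption taken from the
    example snippet, used only when the URL does not name a port.
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")

    # Everything after the first ';' is a list of key=value session settings.
    location, *raw_params = jdbc_url[len(prefix):].split(";")
    params = dict(p.split("=", 1) for p in raw_params if "=" in p)

    parts = urlsplit(location)  # e.g. hive2://host[:port]/
    return {
        "host": parts.hostname,
        "port": parts.port or default_port,
        "use_ssl": params.get("ssl", "false").lower() == "true",
        "use_http_transport": params.get("transportMode", "").lower() == "http",
        "http_path": params.get("httpPath", ""),
    }
```

The returned dictionary could then be unpacked into the call, for example connect(auth_mechanism="LDAP", user=..., password=..., **kwargs), with the user name and password supplied separately (reading them from environment variables rather than hard-coding them is one option).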