Creating a CML data connection to an Impala data warehouse
Learn how to connect natively to data stored in Impala when using Data Visualization in
Cloudera Machine Learning (CML).
You must connect to your data before you can use the data modeling and visualization functions.
The following steps show you how to create a new CML data connection to an Impala data
warehouse.
When you create a connection, you automatically gain privileges to create and manage
datasets associated with this connection, and to build dashboards and visuals within these
datasets.
For more information on the Manage data connections privilege, see RBAC permissions.
For instructions on how to define privileges for a specific role, see Setting role privileges.
For instructions on how to assign the administrator role to a user, see Promoting a user to administrator.
On the main navigation bar, click DATA.
The DATA interface appears, open on the Datasets tab.
In the Data side menu bar, click NEW CONNECTION.
The Create New Data Connection modal window appears.
Select the Impala connection type from the drop-down list, and provide a name for your
connection.
Enter the hostname or IP address of the running coordinator.
You can get the coordinator hostname from the JDBC URL of the Impala DW.
Enter 443 in the Port # field.
Enter your workload username and password as credentials.
Click the Advanced tab and configure the additional details.
Locate the Impala Endpoint for the data hub, copy it, and paste it into the HTTP Path field.
Click TEST to test the connection.
Click CONNECT to create the connection.
You have set up a connection to a running Impala DW.
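The hostname step above relies on reading the coordinator hostname out of the JDBC URL of the Impala DW. As a rough illustration only, the following Python sketch splits a JDBC-style URL into the hostname and port values that the connection dialog asks for. The function name parse_impala_jdbc_url and the example URL are hypothetical and are not taken from the product; the sketch assumes the URL has the general shape jdbc:impala://host:port/database;key=value;... and uses only the Python standard library.

```python
from urllib.parse import urlsplit


def parse_impala_jdbc_url(jdbc_url: str) -> dict:
    """Split a JDBC-style URL into hostname, port, and its ';key=value' properties.

    Hypothetical helper for illustration; assumes the URL looks like
    jdbc:impala://host:port/database;key=value;...
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")

    # Everything before the first ';' is the address; the rest are properties.
    address, *raw_properties = jdbc_url[len(prefix):].split(";")
    parts = urlsplit(address)  # for example: impala://host:443/default

    properties = {}
    for item in raw_properties:
        if "=" in item:
            key, value = item.split("=", 1)
            properties[key.strip()] = value.strip()

    return {
        "hostname": parts.hostname,  # value for the hostname field
        "port": parts.port,          # value for the Port # field
        "properties": properties,    # remaining settings, kept for reference
    }


# Made-up URL used only to exercise the helper (not a real endpoint).
example_url = (
    "jdbc:impala://coordinator-example.dw.example.com:443/default;"
    "AuthMech=3;ssl=1"
)
fields = parse_impala_jdbc_url(example_url)
print(fields["hostname"], fields["port"])
```

If the port in your own JDBC URL differs from 443, treat the value named in the steps above as authoritative; the sketch only shows where the hostname sits inside the URL.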