Creating a CML data connection to Impala

Learn how to connect natively to data stored in Impala when using Data Visualization in Cloudera Machine Learning (CML).

Before you start using data modeling and visualization functions, you must connect to your data. The following steps show you how to create a new CML data connection in Cloudera Data Visualization (CDV) to an Impala data warehouse.

When you create a connection, you automatically get privileges to create and manage the associated datasets. You can also build dashboards and visuals based on these datasets.

  • For more information on the Manage data connections privilege, see RBAC permissions.
  • For instructions on how to define privileges for a specific role, see Setting role privileges.
  • For instructions on how to assign the administrator role to a user, see Promoting a user to administrator.

If you are using a CDP Base cluster running Impala with Kerberos for authentication, make sure that Kerberos credentials are configured in CML before you create a CML data connection to the Impala data warehouse. CDV needs these credentials to authenticate to the Impala cluster. If you add Kerberos credentials after launching the CDV app, you must restart the app for the changes to take effect.

For more information on using Kerberos for authentication in CML, see Hadoop Authentication for ML Workspaces.

  1. On the main navigation bar, click DATA.
    The DATA interface opens, displaying the Datasets tab.
  2. On the side menu bar, click NEW CONNECTION.

    The Create New Data Connection modal window appears.

  3. Choose Impala from the Connection type drop-down list and assign a name to your connection.

    In this example, the Impala connection is made through Knox. Knox always uses TLS encryption, and port 443 is the default HTTPS port.

  4. Enter the hostname or IP address of the running coordinator.
    You can retrieve this information from the JDBC URL of the Impala DW.
  5. Enter 443 in the Port # field.
  6. Enter your workload username and password as credentials.
  7. Click the Advanced tab to configure additional details.
    1. For HTTP connection mode, locate the Impala Endpoint for the Data Hub.
    2. Copy and paste it into the HTTP Path field.
    3. Set any additional details as required.
  8. Check the Parameters and Data tabs for more configuration options.
  9. Once you finish configuring the settings, click TEST to check the connection.
  10. Click CONNECT to establish the connection.
You have successfully set up a connection to a running Impala DW.
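
Step 4 notes that you can retrieve the coordinator hostname from the JDBC URL of the Impala DW. The following is a minimal sketch of how you might split such a URL into the values for the Hostname, Port #, and HTTP Path fields. It assumes a URL of the form jdbc:impala://host:port/database;key=value;... and the example URL, the function name parse_impala_jdbc_url, and the returned keys are hypothetical and not part of CDV. Check the URL format of your own deployment before relying on it.

```python
from urllib.parse import unquote


def parse_impala_jdbc_url(url: str) -> dict:
    """Split a JDBC URL into the fields the connection dialog asks for.

    Assumed format (not taken from the CDV documentation):
        jdbc:impala://host[:port][/database][;key=value]...
    IPv6 literal hosts are not handled in this sketch.
    """
    prefix = "jdbc:impala://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with " + prefix)
    rest = url[len(prefix):]

    # Everything before the first ';' is host[:port][/database].
    authority, _, param_str = rest.partition(";")
    host_port, _, _database = authority.partition("/")
    host, _, port = host_port.partition(":")

    # The remaining segments are key=value pairs; keys are compared
    # case-insensitively because drivers differ in capitalization.
    params = {}
    for item in param_str.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            params[key.strip().lower()] = unquote(value.strip())

    return {
        "hostname": host,
        # Fall back to 443, the port used in the Knox example above.
        "port": int(port) if port else 443,
        "http_path": params.get("httppath", ""),
        "transport_mode": params.get("transportmode", ""),
    }


# Hypothetical URL, for illustration only.
example_url = (
    "jdbc:impala://coordinator.example.com:443/default;"
    "AuthMech=3;transportMode=http;httpPath=cliservice;ssl=1"
)
fields = parse_impala_jdbc_url(example_url)
print(fields["hostname"], fields["port"], fields["http_path"])
```

The parsed values correspond to what you enter in steps 4, 5, and 7. Credentials are deliberately left out of the sketch, because you enter your workload username and password in the dialog itself.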