Creating a CML data connection to a Hive data warehouse

Learn how to connect natively to data stored in Hive when using Data Visualization in Cloudera Machine Learning (CML).

You must connect to your data before you can use the data modeling and visualization features. The following steps show you how to create a new CML data connection to a Hive data warehouse.

When you create a connection, you automatically receive privileges to create and manage datasets on that connection, and to build dashboards and visuals on those datasets.

  • For more information on the Manage data connections privilege, see RBAC permissions.
  • For instructions on how to define privileges for a role, see Setting role privileges.
  • For instructions on how to assign the administrator role to a user, see Promoting a user to administrator.
  1. On the main navigation bar, click DATA.

    The DATA interface appears, open on the Datasets tab.

  2. In the Data side menu bar, click NEW CONNECTION.
    The Create New Data Connection modal window appears.
  3. Select the Hive Connection type from the drop-down list and enter the hostname or IP address of the running coordinator.
    You can find the coordinator hostname in the JDBC URL of the Hive data warehouse (DW).
  4. Enter 443 as the port.
  5. Click the Advanced tab and make the selections below:
  6. Click the Parameters tab and set the hive.server2.async.exec.async.compile parameter to false.
  7. Use your workload username and password as credentials.
  8. Click TEST and then CONNECT to create the connection.
You have set up a connection to a running Hive DW.
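Step 3 says that the coordinator hostname comes from the JDBC URL of the Hive DW. The Python sketch below shows one way to extract the hostname from such a URL and collect it with the other values the steps call for: port 443, the hive.server2.async.exec.async.compile parameter set to false, and your workload username. This is illustrative only. The helper names, the dictionary layout, and the sample URL are hypothetical. The code assumes the URL uses the jdbc:hive2:// form. You still enter the values by hand in the Create New Data Connection window.

```python
from urllib.parse import urlsplit


def coordinator_host(jdbc_url: str) -> str:
    """Return the hostname from a Hive JDBC URL (hypothetical helper).

    Assumes the form jdbc:hive2://host[:port]/database;option=value;...
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    # Drop "jdbc:" so urlsplit sees "hive2://host[:port]/..." and parses the netloc.
    parts = urlsplit(jdbc_url[len(prefix):])
    if parts.scheme != "hive2" or not parts.hostname:
        raise ValueError("expected a jdbc:hive2:// URL with a host: " + jdbc_url)
    # .hostname strips any ":port" suffix and lowercases the name.
    return parts.hostname


def connection_settings(jdbc_url: str, workload_user: str) -> dict:
    """Gather the values from the steps above (illustrative layout, not a CML API)."""
    return {
        "connection_type": "Hive",
        "hostname": coordinator_host(jdbc_url),
        "port": 443,
        "parameters": {"hive.server2.async.exec.async.compile": "false"},
        # The password is deliberately not stored here; type it into the dialog.
        "username": workload_user,
    }


if __name__ == "__main__":
    # Hypothetical JDBC URL; copy the real one from your Hive DW instead.
    url = ("jdbc:hive2://hs2-example.dw-env.example.site/default;"
           "transportMode=http;httpPath=cliservice;ssl=true")
    print(connection_settings(url, "workload-user"))
```

The sketch relies on urlsplit from the standard library instead of a hand-written regular expression. This means that an optional ":port" suffix and the trailing ";option=value" list are handled without extra parsing code.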