Set up a Hive or Impala data connection manually

Data connections to Hive or Impala virtual warehouses within the same environment as the CML workspace are automatically discovered and configured. You can also set up a data connection manually; a manual connection can also reach a virtual warehouse in a different CDP environment. Follow this procedure to set up a Hive or Impala data connection.

  1. Log in to the CDP web interface and navigate to the Data Warehouse service.
  2. In the Data Warehouse service, select Virtual Warehouses in the left navigation panel.
  3. Select the options menu for the warehouse you want to access, and select Copy JDBC URL.
  4. Return to the Machine Learning service. In Site Administration > Data Connections, select New Connection.
  5. Enter the connection name. The name must be unique: data connection names cannot be duplicated within a workspace or within a given project.
  6. Select the connection type:
    1. Hive Virtual Warehouse
    2. Impala Virtual Warehouse
  7. Paste the JDBC URL for the data connection.
  8. (Optional) Enter the Virtual Warehouse Name. This is the name of the warehouse in Cloudera Data Warehouse.
The data connection is available to users by default. To change availability, click the Available switch. This switch determines whether the data connection is displayed in Projects created within the workspace.
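
Before pasting the JDBC URL copied in step 3, it can help to check that the string is intact and points at the warehouse you expect. The following is a minimal Python sketch, not part of the product. It assumes the copied URL has the common form `jdbc:<scheme>://<host>[:<port>]/<database>;key=value;...` with a `hive2` or `impala` scheme. The helper name `parse_jdbc_url`, the `JdbcUrl` type, and the example host names are hypothetical, and the parameters in a real copied URL may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: URLs copied from a Hive or Impala virtual warehouse use one of these schemes.
EXPECTED_SCHEMES = {"hive2", "impala"}


@dataclass
class JdbcUrl:
    scheme: str
    host: str
    port: Optional[int]
    database: str
    params: Dict[str, str]


def parse_jdbc_url(url: str) -> JdbcUrl:
    """Split a JDBC URL into scheme, host, port, database, and ;key=value parameters."""
    url = url.strip()
    if not url.startswith("jdbc:"):
        raise ValueError("not a JDBC URL: missing 'jdbc:' prefix")

    scheme, sep, rest = url[len("jdbc:"):].partition("://")
    if not sep or scheme not in EXPECTED_SCHEMES:
        raise ValueError(f"unexpected JDBC scheme: {scheme!r}")

    # Everything before the first ';' is host[:port]/database; the rest are parameters.
    location, *raw_params = rest.split(";")
    authority, _, database = location.partition("/")
    host, _, port = authority.partition(":")
    if not host:
        raise ValueError("JDBC URL has no host")

    params: Dict[str, str] = {}
    for item in raw_params:
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        params[key] = value

    return JdbcUrl(scheme, host, int(port) if port else None, database, params)


if __name__ == "__main__":
    # Hypothetical URL for illustration only; paste your copied URL here instead.
    example = ("jdbc:hive2://hs2-example.dw.example.site/default;"
               "transportMode=http;httpPath=cliservice;ssl=true")
    print(parse_jdbc_url(example))
```

A truncated or mistyped URL raises `ValueError` instead of being saved as a connection that later fails. The scheme check is only a rough sanity check; the connection type selected in step 6 should still match the warehouse the URL was copied from.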