Creating a connection to a Cloudera Data Warehouse or Data Hub instance for SQL operators

Learn how to create an Airflow connection to an existing Cloudera Data Warehouse or Data Hub instance before running workloads with the supported Airflow SQL operators.

The following steps are for using the Airflow service provided with each Cloudera Data Engineering Virtual Cluster. For information about using your own Airflow deployment, see Using Cloudera Data Engineering with an external Apache Airflow deployment.

To determine the Cloudera Data Warehouse hostname to use for the connection, perform the following steps:

  1. In the Cloudera Data Platform (CDP) management console, click the Data Warehouse tile and click Overview.
  2. In the Virtual Warehouses column, locate the Hive or Impala warehouse you want to connect to.
  3. Click the options menu next to the selected Virtual Warehouse, and then click Copy JDBC URL.
  4. Paste the URL into a text editor, and make note of the hostname and the httpPath.
    For example,
    jdbc:hive2://hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site/default;transportMode=http;httpPath=cliservice;ssl=true;retries=3;
    In this JDBC URL, the hostname is hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site and the httpPath is cliservice.
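The extraction above can be sketched in a few lines of Python. This is not a Cloudera utility, just a minimal parser that assumes the URL shape shown in the example (a `jdbc:hive2://` prefix followed by semicolon-separated session options):

```python
def parse_jdbc_url(jdbc_url):
    """Return (hostname, httpPath) from a hive2 JDBC URL."""
    # Strip the "jdbc:hive2://" scheme prefix.
    rest = jdbc_url.split("://", 1)[1]
    # The hostname ends at the first "/" (before the database name).
    hostname = rest.split("/", 1)[0]
    # Session options are semicolon-separated key=value pairs.
    options = dict(
        opt.split("=", 1) for opt in rest.split(";")[1:] if "=" in opt
    )
    return hostname, options.get("httpPath")

url = ("jdbc:hive2://hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site/"
       "default;transportMode=http;httpPath=cliservice;ssl=true;retries=3;")
hostname, http_path = parse_jdbc_url(url)
```

With the example URL, `hostname` is the Virtual Warehouse host and `http_path` is cliservice, matching the values noted above.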

To determine the Cloudera Data Hub hostname to use for the connection, perform the following steps:

  1. In the Cloudera Management Console, navigate to a suitable Cloudera Data Hub instance.
  2. Click the Endpoints tab.
  3. Select and copy the JDBC URL for the Warehouse.
  4. Paste the URL into a text editor, and make note of the hostname and the httpPath parameters. The httpPath parameter is different for each Data Hub.
    For example,
    jdbc:hive2://man-az-1-master0.man-azur.xcu2-8y8x.dev.cldr.work/;ssl=true;transportMode=http;httpPath=man-az-1/cdp-proxy-api/hive
    
    In this JDBC URL, the hostname is man-az-1-master0.man-azur.xcu2-8y8x.dev.cldr.work and the httpPath is man-az-1/cdp-proxy-api/hive.
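The same extraction works for the Data Hub URL; the one wrinkle is that the httpPath value itself contains "/" characters, so only the semicolon-separated session options should be split. A minimal sketch, assuming the URL shape shown above:

```python
url = ("jdbc:hive2://man-az-1-master0.man-azur.xcu2-8y8x.dev.cldr.work/"
       ";ssl=true;transportMode=http;httpPath=man-az-1/cdp-proxy-api/hive")
# Drop the "jdbc:hive2://" scheme prefix.
rest = url.split("://", 1)[1]
# Hostname ends at the first "/".
hostname = rest.split("/", 1)[0]
# Split session options on ";" only, so the "/"s in httpPath survive intact.
options = dict(opt.split("=", 1) for opt in rest.split(";")[1:] if "=" in opt)
http_path = options.get("httpPath")
```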

To create a connection to an existing CDW virtual warehouse using the embedded Airflow UI, perform the following steps:

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  2. Click Administration in the left navigation menu and select the service containing the Virtual Cluster that you are using.
  3. In the Virtual Clusters column, click Cluster Details for the Virtual Cluster.
  4. Click AIRFLOW UI.
  5. From the Airflow UI, click Connections in the Admin menu.
  6. Click the plus sign to add a new record and fill in the following fields:
    • Conn Id: Create a unique connection identifier. For example, cdw-hive-sql.
    • Conn Type: Select Impala, regardless of whether the Virtual Warehouse type is Impala or Hive.
    • Host: Enter the hostname copied from the JDBC connection URL. Do not enter the full JDBC URL.
    • Schema: Enter the schema to be used. The default value is default.
    • Login/Password: Both Impala and Hive support JWT token-based authentication when configured accordingly. If you use a JWT token, leave the Login and Password fields blank. Otherwise, enter your workload username and password.
    • Port: 443.
    • Extra: The Extra field must contain one of the following JSON objects:
      • If JWT token-based authentication is used, enter the following argument:
        {"auth_mechanism": "JWT", "use_ssl": "True", "use_http_transport": "True", "http_path": "<***HTTPPATH_COPIED_FROM_THE_PREREQUISITES_STEP***>", "jwt": "JWT_TOKEN"}
      • If the workload username and password are used, enter the following argument:
        {"auth_mechanism": "LDAP", "use_ssl": "True", "use_http_transport": "True", "http_path": "<***HTTPPATH_COPIED_FROM_THE_PREREQUISITES_STEP***>"}
  7. Click Save.
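Because the Extra field must be valid JSON, it can help to assemble and validate it programmatically before pasting it into the UI. A minimal sketch for the LDAP (workload username/password) case; "cliservice" is the example CDW httpPath from the prerequisites and should be replaced with your own value:

```python
import json

# Build the Extra field for LDAP authentication. The http_path value
# ("cliservice") is the CDW example from the prerequisites step;
# substitute the httpPath copied from your own JDBC URL.
extra = {
    "auth_mechanism": "LDAP",
    "use_ssl": "True",
    "use_http_transport": "True",
    "http_path": "cliservice",
}
# json.dumps guarantees the string pasted into the Extra field parses as JSON.
extra_json = json.dumps(extra)
print(extra_json)
```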