Learn how to create an Airflow connection to an existing Cloudera Data Warehouse (CDW) virtual warehouse before running workloads that use the CDW Operator. The following steps apply to the Airflow service provided with each CDE virtual cluster. For information about using your own Airflow deployment, see Using Cloudera Data Engineering with an external Apache Airflow deployment.
To determine the Cloudera Data Warehouse hostname to use for the connection, perform
the following steps:
- In the Cloudera Data Platform (CDP) management console, click the Data Warehouse tile and then click Overview.
- In the Virtual Warehouses column, locate the Hive or Impala warehouse that you want to connect to.
- Click the options menu next to the selected warehouse, and then click Copy JDBC URL.
- Paste the URL into a text editor and make note of the hostname. For example:
  jdbc:hive2://hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site/default;transportMode=http;httpPath=cliservice;ssl=true;retries=3;
  In this JDBC URL, the hostname is hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site.
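If you prefer not to pick the hostname out of the URL by hand, it can be extracted programmatically. A minimal sketch using only the Python standard library; the function name jdbc_hostname is illustrative, and the URL is the example above:

```python
def jdbc_hostname(jdbc_url: str) -> str:
    """Extract the hostname from a Hive JDBC URL of the form
    jdbc:hive2://<host>[:<port>]/<db>;param=value;..."""
    # Drop the "jdbc:hive2://" prefix, then cut at the first
    # path separator, parameter separator, or port separator.
    rest = jdbc_url.split("://", 1)[1]
    for sep in ("/", ";", ":"):
        rest = rest.split(sep, 1)[0]
    return rest

url = ("jdbc:hive2://hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site/"
       "default;transportMode=http;httpPath=cliservice;ssl=true;retries=3;")
print(jdbc_hostname(url))
# prints: hs2-aws-2-hive.env-k5ip0r.dw.ylcu-atmi.cloudera.site
```

The same logic also handles URLs that include an explicit port, since the ":" separator is stripped last.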
To create a connection to an existing CDW virtual warehouse using the embedded
Airflow UI, perform the following steps:
- In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
- Click Administration in the left navigation menu, and then select the service containing the virtual cluster that you are using.
- In the Virtual Clusters column, click Cluster Details for the virtual cluster.
- Click AIRFLOW UI.
- In the Airflow UI, click Connections in the Admin menu.
- Click the plus sign to add a new record, and then fill in the following fields:
  - Conn Id: Create a unique connection identifier. For example, cdw-hive-demo.
  - Conn Type: Select Hive Client Wrapper.
  - Host: Enter the hostname copied from the JDBC connection URL. Do not enter the full JDBC URL.
  - Schema: Enter the schema to be used. The default value is default.
  - Login/Password: Enter your workload username and password.
- Click Save.
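Once the connection is saved, a DAG can reference it through the CDW Operator. The following is a minimal sketch, not a definitive implementation: it assumes the import path and the cli_conn_id, hql, and schema parameters exposed by the Cloudera Airflow provider bundled with CDE, and the connection ID cdw-hive-demo created above; verify these against the provider version in your virtual cluster.

```python
from datetime import datetime, timedelta

from airflow import DAG
# Import path assumed from the Cloudera Airflow provider shipped with CDE.
from cloudera.cdp.airflow.operators.cdw_operator import CDWOperator

default_args = {
    "owner": "cdeuser",        # illustrative owner name
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="cdw-demo",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,    # trigger manually for this demo
    catchup=False,
) as dag:
    run_hive_query = CDWOperator(
        task_id="cdw-query",
        cli_conn_id="cdw-hive-demo",  # the Conn Id created in the Airflow UI
        hql="SHOW DATABASES;",        # illustrative query for the warehouse
        schema="default",             # matches the Schema field above
    )
```

When this DAG runs, the operator resolves cdw-hive-demo to the hostname and credentials stored in the connection and executes the HQL against the CDW virtual warehouse.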