Using Cloudera Data Engineering with an external Apache Airflow deployment
The Cloudera provider for Apache Airflow, available in the Cloudera GitHub repository, provides two Airflow operators for running Cloudera Data Engineering (CDE) and Cloudera Data Warehouse (CDW) jobs. You can install the provider on your existing Apache Airflow deployment to integrate it with CDE and CDW.
- The Cloudera provider for Apache Airflow is for use with existing Airflow deployments. If you want to use the embedded Airflow service provided by CDE, see Apache Airflow in Cloudera Data Engineering.
- The provider requires Python 3.6 or higher.
- The provider requires the Python cryptography package version 3.3.2 or higher to address CVE-2020-36242. If an older version is installed, the provider automatically updates the cryptography library.
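If you want to confirm these prerequisites on an Airflow server before installing the provider, a small check along the following lines may help. This is only a sketch: the helper names (parse_version, meets_minimum, installed_cryptography_version, check_prerequisites) are illustrative and are not part of the provider, and the lookup assumes Python 3.8 or higher, where importlib.metadata is in the standard library.

```python
import sys

# Minimums taken from the prerequisites listed above.
MIN_PYTHON = (3, 6)
MIN_CRYPTOGRAPHY = (3, 3, 2)


def parse_version(text):
    """Turn '3.3.2' or '41.0.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop at a suffix such as 'rc1'
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Return True if version_text is at or above the minimum tuple."""
    parsed = parse_version(version_text)
    # Pad short versions so that '3.4' compares as (3, 4, 0).
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


def installed_cryptography_version():
    """Return the installed cryptography version string, or None."""
    # Assumption: Python 3.8+, where importlib.metadata is available.
    from importlib import metadata
    try:
        return metadata.version("cryptography")
    except metadata.PackageNotFoundError:
        return None


def check_prerequisites():
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.6 or higher is required")
    found = installed_cryptography_version()
    # A missing package is not reported; only an outdated one is.
    if found is not None and not meets_minimum(found, MIN_CRYPTOGRAPHY):
        problems.append("cryptography %s is older than 3.3.2" % found)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Run the script with the same Python interpreter that runs Airflow. It prints one line per problem it finds and prints nothing when both checks pass.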
The provider supplies two Airflow operators that you can use in your DAGs:
- CdeRunJobOperator, for running Cloudera Data Engineering jobs.
- CDWOperator, for accessing Cloudera Data Warehouse.
Install Cloudera Airflow provider on your Airflow servers
Create a connection using the Airflow UI
Before you can run a CDE job from your Airflow deployment, you must configure a connection using the Airflow UI.