Using CDE with an external Apache Airflow deployment
The Cloudera provider for Apache Airflow, available in the Cloudera GitHub repository, provides an Airflow operator for running Cloudera Data Engineering (CDE) jobs. You can install the provider on your existing Apache Airflow deployment to run CDE jobs from your own DAGs.
- The Cloudera provider for Apache Airflow is for use with existing Airflow deployments. If you want to use the embedded Airflow service provided by CDE, see Automating data pipelines with CDE using Apache Airflow.
- The provider requires Python 3.6 or higher.
- The provider requires the Python cryptography package version 3.3.2 or higher to address CVE-2020-36242. If an older version is installed, the plugin automatically updates the cryptography library.
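If you want to confirm the cryptography requirement before installing the provider, a check along the following lines may help. This is a minimal sketch, not part of the provider: the helper names (parse_version, meets_minimum, installed_cryptography_version) are hypothetical, and the comparison only looks at the numeric release components of the version string.

```python
import re

# Minimum cryptography release stated in the requirements above.
MIN_CRYPTOGRAPHY = (3, 3, 2)


def parse_version(text):
    """Return the leading numeric release components as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("not a recognizable version string: %r" % (text,))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MIN_CRYPTOGRAPHY):
    """Return True if the installed version string is at least the minimum."""
    parts = parse_version(installed)
    minimum = tuple(minimum)
    # Pad with zeros so that "3.4" is compared as (3, 4, 0).
    width = max(len(parts), len(minimum))
    parts = parts + (0,) * (width - len(parts))
    minimum = minimum + (0,) * (width - len(minimum))
    return parts >= minimum


def installed_cryptography_version():
    """Return the installed cryptography version string, or None if absent."""
    try:
        import cryptography
    except ImportError:
        return None
    return cryptography.__version__


if __name__ == "__main__":
    found = installed_cryptography_version()
    if found is None:
        print("cryptography is not installed")
    elif meets_minimum(found):
        print("cryptography %s satisfies the requirement" % found)
    else:
        print("cryptography %s is older than 3.3.2" % found)
```

Run the script with the same Python interpreter that your Airflow deployment uses, because the result depends on the packages installed in that environment.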
The provider supplies an Airflow operator, CDEJobRunOperator, that you can use in your DAGs to run Cloudera Data Engineering jobs.
Install the Cloudera Airflow provider on your Airflow servers
Create a connection using the Airflow UI
Before you can run a CDE job from your Airflow deployment, you must configure a connection using the Airflow UI.