Apache Airflow in Cloudera Data Engineering
Learn how Apache Airflow is integrated with Cloudera Data Engineering and how to automate a workflow or data pipeline using Apache Airflow Python DAG files.
Cloudera Data Engineering (CDE) enables you to automate a workflow or data pipeline using Apache Airflow Python DAG files. Each Cloudera Data Engineering Virtual Cluster includes an embedded instance of Apache Airflow. You can also use Cloudera Data Engineering with your own Airflow deployment. For more information about using your own Airflow deployment with Cloudera Data Engineering, see Using Cloudera Data Engineering with an external Apache Airflow deployment.
Cloudera Data Engineering currently supports multiple Airflow operators, for example, one for running Cloudera Data Engineering jobs and one for accessing and executing SQL commands on Cloudera Data Warehouse. For the complete list of installed and supported operators, see Supported Airflow operators and hooks.
You can create and manage Apache Airflow jobs by writing Python DAG files and uploading them using the UI. For more information about Cloudera Data Engineering Airflow job management, see Creating and managing Cloudera Data Engineering Airflow Jobs using the Cloudera Data Engineering UI.
- Creating and managing Airflow connections: To connect to a Cloudera Data Warehouse or Cloudera Data Engineering Virtual Cluster, you must create an Airflow connection.
- Executing SQL queries on Cloudera Data Warehouse: To execute SQL queries on Hive or Impala Virtual Warehouses, you can use the installed SQLExecuteQueryOperator or the CdwExecuteQueryOperator.
- Running jobs on other Cloudera Data Engineering Virtual Clusters: With an existing Cloudera Data Engineering connection, you can use the CDERunJobOperator to execute jobs on other Cloudera Data Engineering Virtual Clusters.
You can also install and use custom operators and libraries (Python packages) for Airflow with Cloudera Data Engineering. The Custom Operators and Libraries feature in the Cloudera Data Engineering user interface (UI) lets you extend the default installed packages with third-party or custom Python packages.