Executing SQL queries on a Cloudera Data Warehouse or Cloudera Data Hub instance using Apache Airflow in Cloudera Data Engineering
The following steps are for using the Airflow service provided with each Cloudera Data Engineering virtual cluster. For information about using your own Airflow deployment, see Using Cloudera Data Engineering with an external Apache Airflow deployment.
To run this job, an existing Airflow connection is required. For more information about how to create an Airflow connection, see the following topics:
- For SQL Operators, see Creating a connection to Cloudera Data Warehouse or Cloudera Data Hub instance for SQL Operator.
- For Cloudera Data Warehouse Operators, see Creating a connection to Cloudera Data Warehouse for Cloudera Data Warehouse Operator.
Create an Airflow DAG file in Python. Import the required operators and define the tasks and dependencies.
- The following example DAG file uses the connection named “impala-test” and executes a “SHOW TABLES” query on the Impala Virtual Warehouse using the SQLExecuteQueryOperator:
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id="imp_dag",
    start_date=datetime(2024, 2, 9),
    schedule_interval=timedelta(days=1),
    catchup=False,
) as dag:
    execute_query = SQLExecuteQueryOperator(
        task_id="execute_query",
        conn_id="impala-test",
        sql="SHOW TABLES",
        split_statements=True,
        return_last=False,
    )

    execute_query

For more advanced use cases, see the Airflow documentation about the SQL Operators. A multi-statement variation of this task is sketched after these examples.
- The following example DAG file uses the connection named “cdw-hive” and executes a “SHOW TABLES” query on the Hive Virtual Warehouse using the CdwExecuteQueryOperator:
from airflow import DAG
from cloudera.airflow.providers.operators.cdw import CdwExecuteQueryOperator
from pendulum import datetime

default_args = {
    'owner': 'dag_owner',
    'depends_on_past': False,
    'start_date': datetime(2024, 2, 9)
}

example_dag = DAG(
    'example-cdwoperator',
    default_args=default_args,
    schedule_interval=None,
    catchup=False,
    is_paused_upon_creation=False
)

cdw_query = """
USE default;
SHOW TABLES;
"""

cdw_step = CdwExecuteQueryOperator(
    task_id='cdw-test',
    dag=example_dag,
    cli_conn_id='cdw-hive',
    hql=cdw_query,
    # The following values `schema`, `query_isolation`
    # are the default values, just presented here for the example.
    schema='default',
    query_isolation=True
)

cdw_step

For more information about the CdwExecuteQueryOperator, see the GitHub page. A sketch of adding a downstream task to this DAG follows below.
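If a job needs to run several statements in one task, the SQLExecuteQueryOperator accepts a combined statement string together with split_statements and return_last. The following snippet is a minimal sketch, not part of the shipped example: the task name “show_after_use” and the combined statement string are hypothetical, and the task is assumed to sit inside the same with DAG block and use the same “impala-test” connection as the first example.

from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Hypothetical task, placed inside the `with DAG(...) as dag:` block of the
# first example. With split_statements=True the operator splits the sql
# string and runs each statement separately; return_last=True pushes only
# the result of the final statement to XCom.
show_after_use = SQLExecuteQueryOperator(
    task_id="show_after_use",        # hypothetical task name
    conn_id="impala-test",           # same connection as the first example
    sql="USE default; SHOW TABLES",  # two statements in one string
    split_statements=True,
    return_last=True,
)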
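When the DAG grows beyond a single query, downstream tasks can be chained after cdw_step with the standard Airflow dependency syntax. The following sketch is an illustrative assumption rather than part of the example above: the “finish” task is hypothetical, and EmptyOperator is assumed to be available (Airflow 2.4 or later).

from airflow.operators.empty import EmptyOperator

# Hypothetical downstream task, added to the same `example_dag` as the
# CdwExecuteQueryOperator example above.
finish = EmptyOperator(task_id='finish', dag=example_dag)

# Run the CDW query first, then the downstream task.
cdw_step >> finish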
 
