Upgrade considerations if Cloudera Data Engineering services include Apache Airflow workloads

If your Cloudera Data Engineering version and services are eligible for the in-place upgrade, review the following Apache Airflow-related instructions before you begin.

  • Airflow jobs included in the Cloudera Data Engineering service
    • Set the catchup option of every Airflow job to false before starting the in-place upgrade. If you do not, and the in-place upgrade fails, you might need to perform a manual Cloudera Data Engineering service recovery from the backup. After the backup is restored, any Directed Acyclic Graph (DAG) whose catchup option is not set to false might replay its entire run history from the defined DAG start date, which is undesired behavior in most cases.
    • Set the catchup option to false for every DAG, as described in the DAG Runs page of the official Airflow documentation; see the sketch after this list for a minimal example.
  • Airflow Libraries and Operators included in the Cloudera Data Engineering service
    If you use Airflow Libraries and Operators, delete them in Cloudera Data Engineering before starting the in-place upgrade.
    • If the Airflow Libraries and Operators are in the first step (Configure Repositories) of the configuration process, cancel them.
    • If the Airflow Libraries and Operators have already been built and are only waiting for activation, finish the activation process and delete them.
  • Airflow Variables and Connections included in the Cloudera Data Engineering service
    If you use Airflow Variables and Connections in the Cloudera Data Engineering service, note that the default backup taken before the upgrade does not include them; a sketch for exporting them manually follows this list.
    • If the in-place upgrade is successful, your Variables and Connections are kept. If you must restore from the backup instead, they are not included and have to be recreated.
  • Airflow and Airflow-Python version change after in-place upgrade
    For information about the versions of the runtime components, see Compatibility for Cloudera Data Engineering and Runtime components.
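
Catchup example: the following is a minimal sketch of a DAG definition with the catchup option set to false, assuming an Airflow 2.x runtime; the dag_id, start date, schedule, and task are placeholders, not part of the official instructions (on versions older than Airflow 2.3, use DummyOperator instead of EmptyOperator).

  from datetime import datetime

  from airflow import DAG
  from airflow.operators.empty import EmptyOperator  # Airflow 2.3+

  # Placeholder DAG: the catchup=False argument is the relevant part.
  with DAG(
      dag_id="example_dag",              # placeholder name
      start_date=datetime(2023, 1, 1),   # placeholder start date
      schedule_interval="@daily",        # placeholder schedule
      catchup=False,                     # do not replay runs missed since start_date
  ) as dag:
      EmptyOperator(task_id="noop")

With catchup set to false, the scheduler creates runs only from the latest schedule interval onward, so a restored backup does not trigger a backfill of the entire run history.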
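
Variables and Connections export sketch: one way to keep your own copy before the upgrade is to read them through the Airflow stable REST API. The snippet below is an illustration only; the base URL and token are placeholders, authentication details vary by deployment, and the connections endpoint does not return passwords, so those secrets must be recorded separately.

  import json

  import requests

  # Placeholders: replace with your virtual cluster's Airflow API base URL
  # and a valid access token for your deployment.
  AIRFLOW_API = "https://<airflow-host>/api/v1"
  HEADERS = {"Authorization": "Bearer <access-token>"}

  for resource in ("variables", "connections"):
      # Fetches up to 100 entries; paginate with the offset parameter
      # if you have more.
      resp = requests.get(
          f"{AIRFLOW_API}/{resource}",
          headers=HEADERS,
          params={"limit": 100},
      )
      resp.raise_for_status()
      with open(f"{resource}_backup.json", "w", encoding="utf-8") as f:
          json.dump(resp.json(), f, indent=2)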