Adding custom operators and libraries

You can add custom Python packages for Airflow with Cloudera Data Engineering (CDE). Through the UI, Cloudera provides access to the open source packages that you can use in your Airflow jobs.

Installing an operator alone is not always sufficient. If the operator requires additional runtime dependencies, such as binaries on the path, or environment configuration like Kerberos or cloud credentials, it will not work. To use a third-party Airflow operator from your custom library and operator package, you must also configure the corresponding Airflow connection in the Airflow UI.
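For reference, Airflow connections can be expressed as URIs of the form `conn-type://login:password@host:port`, which is what you ultimately define through the Airflow UI. The sketch below only illustrates that layout; the connection type, credentials, and host are hypothetical placeholders, not values from CDE:

```python
from urllib.parse import quote

# Hypothetical values -- replace with your own connection details.
conn_type = "http"
login = "api-user"
password = quote("s3cret/pass", safe="")  # URL-encode special characters
host = "service.example.com"
port = 443

# Standard Airflow connection URI layout: type://login:password@host:port
conn_uri = f"{conn_type}://{login}:{password}@{host}:{port}"
print(conn_uri)  # → http://api-user:s3cret%2Fpass@service.example.com:443
```

URL-encoding the password matters because characters such as `/` or `@` would otherwise break the URI parsing.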
  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  2. Click Administration in the left navigation menu. The Administration page displays.
  3. Locate the Virtual Cluster that you want to edit, and click Cluster Details.
  4. Go to the Airflow tab. The Libraries and Operators page displays.
  5. Under the Configure Repositories section, complete the following fields to configure the Python Package Index (PyPI) repositories used to source your custom libraries and operators:
    1. PyPI Repository URL - Enter the PyPI repository URL.
    2. Optional: SSL Certificate - Enter the PEM-encoded CA certificate.
    3. Optional: Enter authorization credentials if you are configuring a private or protected PyPI repository that requires authorization for access:
      1. Username
      2. Password
  6. Click Validate Configurations.
  7. Under the Build section, upload a requirements.txt file that lists all of the library and operator packages that you want to enable. Once the file is uploaded, the system automatically builds and installs your packages.
  8. Click Activate. The activation restarts the Airflow server, which may take a few minutes. Once activation is complete, the Installed Packages are listed.
    You can now create and run an Airflow job using the custom libraries and operators that you have activated.
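The uploaded requirements.txt follows the standard pip requirements format. The package names and versions below are illustrative placeholders only, not packages the steps above require:

```text
# Example requirements.txt: one package per line, versions pinned
# for reproducible builds.
my-custom-operators==1.0.0        # hypothetical in-house operator package
some-provider-dependency>=2.0     # hypothetical third-party dependency
```

Pinning exact versions is generally preferable here, since the build runs each time you activate and unpinned packages could silently change between activations.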