Managing Airflow log retention

Currently, automatic log retention policies may not apply to Airflow jobs because the logs are stored locally in a Kubernetes (K8s) volume rather than in a cloud object store. Over time, these log files can consume significant disk space, depending on the workload.

To resolve this, create a custom airflow-log-cleaner job that runs on a daily schedule to automatically delete logs older than a specified number of days.

Before you begin

Create an Airflow DAG file in Python. In the following DAG, the log retention period is set to 30 days, so log files older than 30 days are deleted. If you need a different retention period, change the value of the days variable in the script from 30 to another value.
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils import timezone
from dateutil import parser

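# Number of days to keep logs; change this value to use a different retention period.
days = 30
# The first command deletes *.log files last modified more than `days` days ago,
# skipping any lost+found directory. The second removes directories left empty.
# `|| true` keeps the task from failing if a command returns a non-zero exit status.
log_clean_command = f'''
    find /usr/local/airflow/logs -type d -name 'lost+found' -prune -o -type f -mtime +{days} -name '*.log' -print0 | xargs -0 rm -f || true
    find /usr/local/airflow/logs -type d -empty -delete || true
    '''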
dag = DAG(
    dag_id='airflow-log-cleaner',
    start_date=parser.isoparse('2026-04-08T04:11:30Z').replace(tzinfo=timezone.utc),
    schedule="@daily",
    catchup=False,
    is_paused_upon_creation=False,
)

shell_1 = BashOperator(
    bash_command=log_clean_command,
    task_id='shell_1',
    dag=dag,
)
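
Before enabling the cleaner, it can help to preview which files the first find command would delete. The following is a minimal sketch, not part of the Cloudera procedure: find_expired_logs is a hypothetical helper that mirrors the command's filters in plain Python (regular *.log files only, lost+found directories skipped). It also reproduces how find interprets -mtime +N: the file age is truncated to whole 24-hour periods before the comparison, so with days set to 30 a file is matched only once it is at least 31 days old.

```python
import os
import stat
import time


def find_expired_logs(root, days, now=None):
    """Return the *.log files under `root` that `find -mtime +days` would match.

    Hypothetical helper for previewing the cleanup; it never deletes anything.
    """
    now = time.time() if now is None else now
    expired = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mirror `-name 'lost+found' -prune`: do not descend into these directories.
        dirnames[:] = [d for d in dirnames if d != "lost+found"]
        for name in filenames:
            if not name.endswith(".log"):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The file may have disappeared since it was listed; skip it.
                continue
            # Mirror `-type f`: regular files only, symlinks are not followed.
            if not stat.S_ISREG(st.st_mode):
                continue
            # find truncates the age to whole 24-hour periods before comparing.
            age_days = int((now - st.st_mtime) // 86400)
            if age_days > days:
                expired.append(path)
    return sorted(expired)
```

For example, calling find_expired_logs('/usr/local/airflow/logs', 30) from a location that can read the log volume returns the candidate paths without removing them.
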

  1. In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page is displayed.
  2. In the left navigation menu, click Jobs. The Jobs page is displayed.
  3. Click Create Job. The Job Details page is displayed.
  4. Provide the Job Details:
    1. Select Airflow for the job type. The available fields on the user interface update automatically.
    2. Specify the Name as airflow-log-cleaner.
    3. Click the Resource as DAG File.
    4. Click Upload and select the DAG file you created in the Before you begin section.
    5. In the Resource Name field, enter a name for the resource and click Upload.
  5. Click Create and Run to run the job immediately.
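
To check that the job is reclaiming space, the size of the log directory can be compared before and after a run. The sketch below assumes it is executed somewhere the log volume is mounted, for example from another task in the DAG. log_dir_size_bytes is a hypothetical helper, and it reports apparent file sizes rather than allocated disk blocks.

```python
import os


def log_dir_size_bytes(root):
    """Sum the sizes of all files under `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A file may vanish while the cleaner is running; skip it.
                continue
    return total
```

Calling log_dir_size_bytes('/usr/local/airflow/logs') before and after the first run gives a rough measure of how much space the chosen retention period frees.
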