Managing Airflow log retention
Currently, automatic log retention policies may not apply to Airflow jobs because their logs are stored locally in a Kubernetes (K8s) volume rather than in a cloud object store. Over time, these log files can consume significant disk space, depending on the workload.
To resolve this, create a custom airflow-log-cleaner job that runs on a daily schedule to automatically delete logs older than a specified number of days.
Before you begin
Create a DAG file that contains the following script. Change the days variable in the script from 30 to another value if a different log retention period is needed.

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils import timezone
from datetime import timedelta
from dateutil import parser
# configure the number of days here
days = 30
log_clean_command = f'''
find /usr/local/airflow/logs -type d -name 'lost+found' -prune -o -type f -mtime +{days} -name '*.log' -print0 | xargs -0 rm -f || true
find /usr/local/airflow/logs -type d -empty -delete || true
'''
dag = DAG(
    dag_id='airflow-log-cleaner',
    start_date=parser.isoparse('2026-04-08T04:11:30Z').replace(tzinfo=timezone.utc),
    schedule="@daily",
    catchup=False,
    is_paused_upon_creation=False,
)
shell_1 = BashOperator(
    bash_command=log_clean_command,
    task_id='shell_1',
    dag=dag,
)
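
Before scheduling the cleaner, it can be useful to preview which files the first find command would select. The following is a minimal sketch of that selection rule in plain Python, not part of the Cloudera script: the function name find_expired_logs and its parameters are hypothetical, and it assumes the GNU find behavior where file age is truncated to whole 24-hour periods, so -mtime +30 matches files that are at least 31 full days old. It only lists files and deletes nothing.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired_logs(log_root, days, now=None):
    """List *.log files that `find -mtime +days -name '*.log'` would select.

    Hypothetical helper for a dry run; it never deletes anything.
    """
    root = Path(log_root)
    now = time.time() if now is None else now
    expired = []
    for path in root.rglob("*.log"):
        # Mirror the -prune clause: ignore everything under lost+found.
        if "lost+found" in path.relative_to(root).parts:
            continue
        # Mirror -type f: regular files only, not directories or symlinks.
        if path.is_symlink() or not path.is_file():
            continue
        # Age is truncated to whole days before comparing, so "+days"
        # means strictly more than `days` full 24-hour periods.
        age_in_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
        if age_in_days > days:
            expired.append(path)
    return sorted(expired)
```

For example, calling find_expired_logs('/usr/local/airflow/logs', 30) in an environment where that path is mounted returns the log files the cleaner would remove on its next run with days set to 30.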
- In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
- In the left navigation menu, click Jobs. The Jobs page is displayed.
- Click Create Job. The Job Details page is displayed.
- Provide the Job Details:
  - Select Airflow for the job type. The available fields on the user interface update automatically.
  - Specify the Name as airflow-log-cleaner.
  - For DAG File, select Resource.
  - Click Upload and select the DAG file you created in the Before you begin section.
  - In the Resource Name field, enter a name for the resource and click Upload.
- Click Create and Run to run the job immediately.
