Configuring YARN rolling log aggregation

Learn how to enable and configure rolling log aggregation for Cloudera Streaming Analytics Flink jobs running on YARN.

Overview

Cloudera Streaming Analytics 1.15.2 introduces support for rolling log aggregation in YARN environments. With rolling log aggregation, YARN aggregates container logs incrementally while a Flink job is running instead of waiting for the containers to complete. This makes logs available during job execution and reduces storage pressure on the cluster.

Enabling rolling log aggregation

Rolling log aggregation is controlled through YARN and Flink configuration. Ensure that your YARN cluster has log aggregation enabled, then configure the following for your Flink on YARN deployment:

YARN log aggregation: Enable YARN log aggregation at the cluster level if it is not already enabled. This is typically configured in yarn-site.xml.
Flink YARN configuration: Flink jobs submitted to YARN run in the YARN application master and TaskManager containers, so the rolling log aggregation behavior configured in YARN applies to the logs of those containers.
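
As a minimal sketch of the cluster-level setting described above, the fragment below uses the stock Apache Hadoop property that turns on log aggregation. It assumes that you edit yarn-site.xml directly; in a cluster managed by Cloudera Manager, the same setting is normally changed through the YARN service configuration rather than by hand.

```xml
<!-- yarn-site.xml: illustrative fragment, to be merged into the existing <configuration> element -->
<property>
  <!-- Enables aggregation of container logs to the remote log directory -->
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```

NodeManagers usually need to be restarted before a change to this property takes effect.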

Configuration

Configure rolling log aggregation through YARN container log aggregation settings. Key configuration properties include:

Log aggregation roll interval and retention settings in your YARN configuration
Container log directory and aggregation policies
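
The settings listed above might look like the following sketch. The property names come from stock Apache Hadoop YARN and are an assumption about which properties apply to your deployment; the values are illustrative only and are not recommendations from Cloudera.

```xml
<!-- yarn-site.xml: illustrative values only, adjust to your cluster -->
<property>
  <!-- How often NodeManagers upload logs of running applications, in seconds.
       The Hadoop default of -1 uploads logs only after the application finishes. -->
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
<property>
  <!-- How long aggregated logs are kept before deletion, in seconds (here: 7 days) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
<property>
  <!-- Remote file system directory that receives the aggregated logs (Hadoop default shown) -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
```

In stock Hadoop, the roll interval cannot be set below a configured minimum, which defaults to 3600 seconds, so shorter intervals may not behave as written. The aggregation policy and the local container log directories are controlled by separate YARN properties that are not shown here.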