YARN Log Aggregation Overview

The YARN Log Aggregation feature enables you to move the local log files of any application onto HDFS or cloud-based storage, depending on your cluster configuration.

Application logs are valuable: they contain meaningful information that can be used to debug issues or kept for historical analysis. YARN can move local logs securely onto HDFS or cloud-based storage, such as Amazon S3 on AWS. This allows the logs to be stored for much longer than they could be on a local disk, makes it faster to find a particular log file because the logs are collected in one place instead of being scattered across node-local disks, and can optionally handle compression.

There are two main types of log aggregation:
  • Basic log aggregation: Aggregates a container's logs once the container finishes. Also known as log aggregation without type specification.
  • Rolling log aggregation: Aggregates logs at set time intervals while the application is still running. The interval is given in seconds and is configurable by the user. The primary use case for rolling log aggregation is long-running applications, such as Spark Streaming jobs.
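The difference between the two types can be illustrated with a small Python sketch. It does two things, both under stated assumptions. First, it builds a yarn-site.xml fragment using the Hadoop properties `yarn.log-aggregation-enable` and `yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds`. These names come from Hadoop's yarn-default.xml; verify them against your Hadoop version, which also documents a lower bound on the interval. Second, it uses a deliberately simplified timing model to show when a container's logs would be uploaded. The helper names `yarn_site_properties` and `upload_times` are hypothetical, and the model is not the NodeManager's actual scheduling logic.

```python
import xml.etree.ElementTree as ET

# Property names as listed in Hadoop's yarn-default.xml (check your version).
ENABLE_PROP = "yarn.log-aggregation-enable"
ROLL_PROP = "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds"


def yarn_site_properties(roll_interval_seconds=None):
    """Build a <configuration> element for yarn-site.xml (hypothetical helper).

    Log aggregation is always enabled. The roll-monitoring interval is added
    only when rolling aggregation is wanted; None means basic aggregation.
    """
    conf = ET.Element("configuration")
    props = {ENABLE_PROP: "true"}
    if roll_interval_seconds is not None:
        props[ROLL_PROP] = str(roll_interval_seconds)
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return conf


def upload_times(start, finish, roll_interval_seconds=None):
    """Simplified model of when a container's logs are uploaded (in seconds).

    Basic aggregation uploads once, when the container finishes. Rolling
    aggregation also uploads every roll_interval_seconds while it is running.
    """
    if roll_interval_seconds is None:
        return [finish]
    # Periodic uploads strictly before finish, then a final upload at finish.
    rolling = list(range(start + roll_interval_seconds, finish, roll_interval_seconds))
    return rolling + [finish]


if __name__ == "__main__":
    print(ET.tostring(yarn_site_properties(3600), encoding="unicode"))
    print(upload_times(0, 10_000))        # basic: a single upload at finish
    print(upload_times(0, 10_000, 3600))  # rolling: hourly uploads plus final
```

For a container that runs for 10,000 seconds with a 3600-second interval, the model produces uploads at 3600, 7200, and 10,000 seconds. With basic aggregation, nothing is uploaded until the container finishes, which is why long-running jobs benefit from the rolling type.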