YARN Log Aggregation Overview
The YARN Log Aggregation feature enables you to move the local log files of any application onto HDFS or cloud-based storage, depending on your cluster configuration.
Application logs are valuable: meaningful information can be extracted from them, and they can be used to debug issues or kept for historical analysis. YARN can move local logs securely onto HDFS or cloud-based storage, such as AWS. This allows the logs to be stored for much longer than they could be on a local disk, allows faster searching for a particular log file, and optionally supports compression.
The Log Aggregation Retention Period is set using the yarn.log-aggregation.retain-seconds property. The default Java heap size of the JobHistory Server (JHS) is 1 GB. Cloudera recommends allowing approximately 10 KB of JHS heap memory per job. For example, 50,000 jobs use at least 0.5 GB of memory. Ensure that your JHS heap is large enough to cache all of your jobs.
Increasing the value of the yarn.log-aggregation.retain-seconds property increases the number of jobs retained. For example, if yarn.log-aggregation.retain-seconds is set to 180 days (15,552,000 seconds), and there are 3,000 jobs daily, each requiring 10 KB, the required heap size is 180 * 3,000 * 10 KB = 5,400,000 KB, or approximately 5.4 GB.
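The sizing arithmetic above can be sketched as a small Python helper. This is only an illustrative calculation, not a Cloudera tool: the function names and the decimal conversion (1 GB = 1,000,000 KB, matching the 5.4 GB figure in the example) are assumptions, and the 10 KB per job figure is the approximate recommendation quoted above.

```python
# Rough JHS heap estimate based on the ~10 KB-per-job guideline.
# All names here are hypothetical helpers for illustration only.

KB_PER_JOB = 10          # approximate JHS heap per job (from the guideline above)
KB_PER_GB = 1_000_000    # decimal conversion, assumed to match the 5.4 GB example


def retain_seconds(days):
    """Convert a retention period in days to a value in seconds,
    the unit used by yarn.log-aggregation.retain-seconds."""
    return days * 24 * 60 * 60


def estimate_jhs_heap_gb(retain_days, jobs_per_day, kb_per_job=KB_PER_JOB):
    """Estimate the JHS heap (in GB) needed to cache every job
    that falls inside the retention period."""
    retained_jobs = retain_days * jobs_per_day
    return retained_jobs * kb_per_job / KB_PER_GB


if __name__ == "__main__":
    print(retain_seconds(180))              # 15552000
    print(estimate_jhs_heap_gb(180, 3000))  # 5.4
```

Because the estimate exceeds the 1 GB default in this example, the JHS heap would need to be raised accordingly; treat the result as a lower bound rather than an exact requirement.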