Viewing application logs

The MapReduce Application ACL mapreduce.job.acl-view-job determines whether or not you can view an application log.

The MapReduce Application ACL mapreduce.job.acl-view-job determines whether or not you can view an application log, and access is evaluated via the following ACLs:
  • YARN Admin and Queue ACLs
  • Application ACLs

After an application is in the “finished” state, logs are aggregated, depending on your cluster setup. You can access the aggregated logs via the MapReduce History server web interface. Aggregated logs are stored on shared cluster storage, which in most cases is HDFS. You can also share log aggregation via storage options like S3 or Azure by modifying the yarn.nodemanager.remote-app-log-dir setting in Cloudera Manager to point to either S3 or Azure, which should already be configured.

The shared storage on which the logs are aggregated helps to prevent access to the log files via file level permissions. Permissions on the log files are also set at the file system level, and are enforced by the file system: the file system can block any user from accessing the file, which means that the user cannot open/read the file to check the ACLs that are contained within.

In the cluster storage use case of HDFS, you can only access logs that are aggregated via the:
  • Application owner
  • Group defined for the MapReduce History server

When an application runs, generates logs, and then places the logs into HDFS, a path/structure is generated (for example: /tmp/logs/john/logs/application_1536220066338_0001). So access for the application owner "john" might be set to 700, which means read, write, execute; no one else can view files underneath this directory. If you don’t have HDFS access, you will be denied access. Command line users identified in mapreduce.job.acl-view-job are also denied access at the file level. In such a use case, the Application ACLs stored inside the aggregated logs will never be evaluated because the Application ACLs do not have file access.

For clusters that do not have log aggregation, logs for running applications are kept on the node where the container runs. You can access these logs via the Resource Manager and Node Manager web interface, which performs the ACL checks.