Viewing application logs
The MapReduce Application ACL mapreduce.job.acl-view-job determines whether or not you can view an application log. Access is evaluated against the following ACLs:
- YARN Admin and Queue ACLs
- Application ACLs
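As a rough illustration of how an Application ACL value is read, the sketch below parses the usual Hadoop ACL string format: a comma-separated list of users, a single space, then a comma-separated list of groups, where * allows everyone and a single space allows no one. The function names and the sample users and groups are hypothetical. The sketch deliberately ignores the application owner and the YARN Admin and Queue ACLs, which are evaluated separately.

```python
def parse_acl(acl):
    """Split a Hadoop-style ACL string into (users, groups, allow_all)."""
    if acl.strip() == "*":
        return set(), set(), True
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return users, groups, False


def acl_allows(acl, user, user_groups):
    """Return True if the user, or one of the user's groups, is in the ACL."""
    users, groups, allow_all = parse_acl(acl)
    return allow_all or user in users or bool(groups & set(user_groups))


# Hypothetical value for mapreduce.job.acl-view-job: two users and one group.
view_acl = "alice,bob analysts"
print(acl_allows(view_acl, "bob", set()))            # True
print(acl_allows(view_acl, "carol", {"analysts"}))   # True
print(acl_allows(view_acl, "dave", {"finance"}))     # False
```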
After an application is in the “finished” state, logs are aggregated, depending on your cluster setup. You can access the aggregated logs via the MapReduce History server web interface. Aggregated logs are stored on shared cluster storage, which in most cases is HDFS. You can also aggregate logs to storage options like S3 or Azure by modifying the yarn.nodemanager.remote-app-log-dir setting in Cloudera Manager to point to either S3 or Azure, which should already be configured.
The shared storage on which the logs are aggregated restricts access to the log files through file-level permissions. Permissions on the log files are set at the file system level and are enforced by the file system. The file system can block any user from accessing a file, which means that the user cannot open or read the file to check the ACLs contained within it. At the file system level, the log files of an application are accessible to:
- Application owner
- Group defined for the MapReduce History server
When an application runs, generates logs, and then places the logs into HDFS, a path/structure is generated (for example: /tmp/logs/john/logs/application_1536220066338_0001). Access for the application owner "john" might be set to 700, which grants the owner read, write, and execute permissions and grants nothing to anyone else; no one else can view files underneath this directory. If you don’t have HDFS access, you will be denied access.
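To make the 700 mode in the example above concrete, the following sketch uses Python's standard stat module to expand an octal mode into the familiar permission string. The helper name describe_mode is hypothetical, and the sketch only decodes the mode bits; it does not query HDFS.

```python
import stat


def describe_mode(octal_mode):
    """Render an octal directory mode such as '700' as an ls-style string."""
    bits = int(octal_mode, 8)
    # S_IFDIR marks the entry as a directory so the string starts with 'd'.
    return stat.filemode(stat.S_IFDIR | bits)


print(describe_mode("700"))  # drwx------ : owner only
print(describe_mode("770"))  # drwxrwx--- : owner and group
```

In the first line of output, only the owner columns are populated, which is why no other user can list or read anything under the directory.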
Command line users identified in mapreduce.job.acl-view-job are also denied access at the file level. In such a case, the Application ACLs stored inside the aggregated logs are never evaluated, because these users do not have access to the files that contain them.
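The order of the two checks can be sketched as follows. This is a simplified model, not the actual HDFS or YARN logic: it assumes a user needs read and execute permission on the log directory, treats the ACL as a plain set of user names, and uses hypothetical names throughout (can_view_aggregated_log, the users "john" and "mary", and the group "hadoop").

```python
def can_view_aggregated_log(user, user_groups, dir_owner, dir_group, mode, acl_users):
    """Return (allowed, reason) for a user trying to read aggregated logs."""
    # Step 1: file system permissions. Pick the owner, group, or other bits.
    if user == dir_owner:
        shift = 6
    elif dir_group in user_groups:
        shift = 3
    else:
        shift = 0
    # Require read (4) and execute (1) on the directory.
    if ((mode >> shift) & 0o5) != 0o5:
        return False, "file system denied"

    # Step 2: the ACL stored with the logs is consulted only after
    # the file system has let the user open the files.
    if user == dir_owner or user in acl_users:
        return True, "allowed"
    return False, "ACL denied"


acl_users = {"mary"}  # hypothetical user listed in mapreduce.job.acl-view-job
print(can_view_aggregated_log("mary", set(), "john", "hadoop", 0o700, acl_users))
# (False, 'file system denied')
print(can_view_aggregated_log("mary", {"hadoop"}, "john", "hadoop", 0o770, acl_users))
# (True, 'allowed')
```

In the first call, "mary" is named in the ACL but is still rejected, because the check never gets past the file system step.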
For clusters that do not have log aggregation, logs for running applications are kept on the node where the container runs. You can access these logs via the Resource Manager and Node Manager web interfaces, which perform the ACL checks.