Job summaries in _SUCCESS files
You can view and collect job summaries in the _SUCCESS files.
The original Hadoop committer creates a zero-byte _SUCCESS file in the root of the output directory unless disabled.
The manifest committer writes a JSON summary which includes:
- The name of the committer.
- Diagnostics information.
- A list of some of the files created (for testing; a full list is excluded as it can get big).
- IO Statistics.
If, after running a query, this _SUCCESS file is zero bytes long, the manifest committer has not been used.
If it is not empty, then it can be examined.
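The check described above can be sketched in Python. This is a minimal illustration, assuming the _SUCCESS file has already been copied to a local path; the function name inspect_success_marker is hypothetical, and no field names of the JSON summary are assumed.

```python
import json
from pathlib import Path


def inspect_success_marker(path):
    """Classify a local copy of a _SUCCESS file (hypothetical helper).

    Returns ("empty", None) for a zero-byte marker, or
    ("summary", data) when the file holds a JSON summary.
    """
    marker = Path(path)
    if marker.stat().st_size == 0:
        # Zero bytes: the manifest committer was not used for this job.
        return "empty", None
    # Non-empty: parse the JSON summary so that it can be examined.
    with marker.open(encoding="utf-8") as handle:
        return "summary", json.load(handle)
```

Calling inspect_success_marker on a downloaded marker returns the parsed summary for a non-empty file, which can then be explored like any other JSON document.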
Viewing _SUCCESS files through the ManifestPrinter tool
The summary files are JSON, and can be viewed in any text editor.
For a more succinct summary, including better display of statistics, use the ManifestPrinter tool:
hadoop org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestPrinter <path>
This works for the files saved at the base of an output directory, and any reports saved to a report directory.
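If the printer needs to be run from a script, the command above could be wrapped along the following lines. This is only a sketch: the helper names are hypothetical, and it assumes the hadoop command is available on the PATH of the machine running the script.

```python
import subprocess

# Class name taken from the command shown above.
PRINTER_CLASS = (
    "org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestPrinter"
)


def manifest_printer_command(path):
    """Build the argument list for printing one summary file (hypothetical helper)."""
    return ["hadoop", PRINTER_CLASS, path]


def print_summary(path):
    """Run the printer and return its standard output as text.

    Assumes the hadoop command is on the PATH; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        manifest_printer_command(path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Building the argument list separately keeps the path from being interpreted by a shell, so paths containing spaces need no extra quoting.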
Collecting Job Summaries
The committer can be configured to save the _SUCCESS summary files to a report directory, irrespective of whether the job succeeded or failed, by setting a filesystem path in the option mapreduce.manifest.committer.summary.report.directory.
The path does not have to be on the same store/filesystem as the destination of work. For example, a local filesystem could be used.
<property>
  <name>mapreduce.manifest.committer.summary.report.directory</name>
  <value>file:///tmp/reports</value>
</property>
This allows the statistics of jobs to be collected irrespective of their outcome, whether or not saving the _SUCCESS marker is enabled, and without the problems caused by a chain of queries overwriting the markers.
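Once reports accumulate in a local report directory such as /tmp/reports, they could be gathered with a short script like the one below. This is a sketch under stated assumptions: the function name load_reports is hypothetical, the naming scheme of the report files is not assumed, and any file that does not parse as JSON is simply skipped.

```python
import json
from pathlib import Path


def load_reports(report_dir):
    """Return {filename: parsed JSON} for each JSON file in report_dir.

    Hypothetical helper: file names are not interpreted, and files
    that are empty or are not valid JSON text are ignored.
    """
    reports = {}
    for entry in sorted(Path(report_dir).iterdir()):
        if not entry.is_file() or entry.stat().st_size == 0:
            continue
        try:
            with entry.open(encoding="utf-8") as handle:
                reports[entry.name] = json.load(handle)
        except ValueError:
            # Covers both invalid JSON and undecodable (binary) files.
            continue
    return reports
```

For example, load_reports("/tmp/reports") returns one entry per saved summary, covering successful and failed jobs alike.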