Job Submission and Monitoring
Job is the primary interface through which a user job interacts with the ResourceManager. Job provides facilities to submit jobs, track their progress, access component-task reports and logs, and obtain status information about the MapReduce cluster.
The job submission process includes:
- Checking the input and output specifications of the job.
- Computing the InputSplit values for the job.
- Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
- Copying the job's JAR file and configuration to the MapReduce system directory on the filesystem.
- Submitting the job to the ResourceManager and optionally monitoring its status.
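To make this flow concrete, here is a minimal, self-contained sketch of creating, configuring, and submitting a job through the Job API. It uses Hadoop's identity Mapper and Reducer so that no user classes are assumed; the class name SubmitExample and the job name are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "submit-example");  // job name is a placeholder

    // The JAR containing this class is copied to the MapReduce system directory.
    job.setJarByClass(SubmitExample.class);

    // Identity map and reduce phases keep the sketch self-contained.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    // Input and output specifications are checked when the job is submitted.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Submit to the ResourceManager and wait; 'true' prints progress as the job runs.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}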
Job history files are also logged to the user-specified directories mapreduce.jobhistory.intermediate-done-dir and mapreduce.jobhistory.done-dir.
You can view a summary of the history logs for a job using the following command:
$ hadoop job -history output.jhist
More details about the job, such as successful tasks and the task attempts made for each task, can be viewed with:
$ hadoop job -history all output.jhist
You can use OutputLogFilter to filter log files from the output directory listing.
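For example, here is a minimal sketch of such a filtered listing, assuming the job wrote its output to a directory named output on the default filesystem; in Hadoop 2.x the filter class is org.apache.hadoop.mapred.Utils.OutputFileUtils.OutputLogFilter.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.Utils;

public class ListOutputs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // List the job's output files, skipping any _logs entries.
    FileStatus[] outputs = fs.listStatus(
        new Path("output"),  // the output path here is an assumption
        new Utils.OutputFileUtils.OutputLogFilter());

    for (FileStatus status : outputs) {
      System.out.println(status.getPath());
    }
  }
}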
Normally, you use Job to create the application, describe various facets of the job, submit the job, and then monitor its progress.
Job Control
You might need to chain MapReduce jobs to accomplish complex tasks that cannot be done with a single job. This is fairly easy, because the output of one job typically goes to the distributed filesystem, where it can serve as the input for the next job.
However, the client must ensure that each job has completed (with success or failure) before starting any job that depends on it; a short chaining sketch follows the list below. The job control options are:
- Job.submit(): Submit the job to the cluster and return immediately.
- Job.waitForCompletion(boolean): Submit the job to the cluster and wait for it to finish.
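Here is a minimal sketch of chaining two jobs with waitForCompletion(boolean). The job names, the intermediate path, and the identity map/reduce phases are placeholder assumptions; the point is that the second job starts only after the first reports success.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // First job: its output directory becomes the second job's input.
    // No mapper/reducer is set, so both default to the identity classes.
    Job first = Job.getInstance(conf, "first-pass");
    first.setJarByClass(ChainedJobs.class);
    FileInputFormat.addInputPath(first, new Path(args[0]));
    FileOutputFormat.setOutputPath(first, new Path("intermediate"));

    // Block until the first job finishes; bail out if it failed.
    if (!first.waitForCompletion(true)) {
      System.exit(1);
    }

    // Second job consumes the first job's output.
    Job second = Job.getInstance(conf, "second-pass");
    second.setJarByClass(ChainedJobs.class);
    FileInputFormat.addInputPath(second, new Path("intermediate"));
    FileOutputFormat.setOutputPath(second, new Path(args[1]));

    System.exit(second.waitForCompletion(true) ? 0 : 1);
  }
}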
You can also use Oozie to implement chains of MapReduce jobs.