Monitoring MapReduce Jobs

A MapReduce job is a unit of processing (query or transformation) on the data stored within a Hadoop cluster. You can view information about the different jobs that have run in your cluster during a selected time span.

The list of jobs provides specific metrics about the jobs that were submitted, were running, or finished within the time frame you select. You can select charts that show a variety of metrics of interest, either for the cluster as a whole or for individual jobs.

You can use the Time Range Selector or a duration link to set the time range.

You can select an activity and drill down to look at the jobs and tasks spawned by that activity:

  • View the children (MapReduce jobs) of a Pig or Hive activity.
  • View the task attempts generated by a MapReduce job.
  • View the children (MapReduce, Pig, or Hive activities) of an Oozie job.
  • View the activity or job statistics in a detail report format.
  • Compare the selected activity to a set of similar activities to determine whether it showed anomalous behavior. For example, if a standard job suddenly runs much longer than usual, this may indicate issues with your cluster.
  • Display the distribution of the task attempts that made up a job, plotting different metrics against task duration. You can use this, for example, to determine whether tasks running on a certain host are performing more slowly than average.
  • Stop a running job, if necessary.
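
The per-host comparison described in the list above can also be approximated outside the UI. The following is a minimal sketch in Python, not the product's own method: it assumes you have exported task attempts as (host, duration in seconds) pairs, and the function name `slow_hosts`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slow_hosts(attempts, threshold=1.5):
    """Return hosts whose mean task duration exceeds threshold x the overall mean.

    attempts: iterable of (host, duration_seconds) pairs.
    This input format is an assumption made for illustration only.
    """
    # Group task attempt durations by the host that ran them.
    by_host = defaultdict(list)
    for host, duration in attempts:
        by_host[host].append(duration)
    if not by_host:
        return {}

    # Mean duration across every task attempt in the job.
    overall = mean(d for durations in by_host.values() for d in durations)

    # Keep only hosts whose own mean is well above the overall mean.
    return {
        host: mean(durations)
        for host, durations in by_host.items()
        if mean(durations) > threshold * overall
    }


# Made-up sample data: host-c runs its tasks far more slowly than the others.
sample = [
    ("host-a", 40), ("host-a", 44),
    ("host-b", 42), ("host-b", 38),
    ("host-c", 120), ("host-c", 130),
]
flagged = slow_hosts(sample)
print(flagged)
```

A fixed multiple of the overall mean is only one possible cutoff. With few hosts or heavily skewed durations, a median-based comparison may be a better fit.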