Monitoring MapReduce Jobs

A MapReduce job is a unit of processing (query or transformation) on the data stored within a Hadoop cluster. You can view information about the different jobs that have run in your cluster during a selected time span.

  • The list of jobs provides specific metrics for jobs that were submitted, running, or finished within the selected time frame.
  • You can select charts that show a variety of metrics of interest, either for the cluster as a whole or for individual jobs.
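Outside the UI, a similar time-bounded job listing can be retrieved from the Hadoop JobHistory Server REST API, which accepts start-time bounds in epoch milliseconds. A minimal sketch (the hostname is a placeholder, and the default history server port of 19888 is assumed):

```python
# Sketch: build a JobHistory Server REST URL that lists MapReduce jobs
# started within a given time window. The host is a placeholder; the
# endpoint and startedTimeBegin/startedTimeEnd parameters come from the
# Hadoop MapReduce HistoryServer REST API.
from datetime import datetime, timezone
from urllib.parse import urlencode

def history_jobs_url(host, begin, end, port=19888):
    """Build the REST URL for jobs started between two datetimes."""
    params = urlencode({
        "startedTimeBegin": int(begin.timestamp() * 1000),  # epoch millis
        "startedTimeEnd": int(end.timestamp() * 1000),
    })
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs?{params}"

begin = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, tzinfo=timezone.utc)
url = history_jobs_url("historyserver.example.com", begin, end)
print(url)
```

Fetching this URL (for example with `urllib.request.urlopen`) returns a JSON document listing each job's ID, state, and timing fields.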

You can use the Time Range Selector or a duration link to set the time range. (See Time Line for details.)

You can select an activity and drill down to look at the jobs and tasks spawned by that activity:

  • View the children (MapReduce jobs) of a Pig or Hive activity.
  • View the task attempts generated by a MapReduce job.
  • View the children (MapReduce, Pig, or Hive activities) of an Oozie job.
  • View the activity or job statistics in a detail report format.
  • Compare the selected activity to a set of other similar activities, to determine if the selected activity showed anomalous behavior. For example, if a standard job suddenly runs much longer than usual, this may indicate issues with your cluster.
  • Display the distribution of the task attempts that made up a job, plotting various metrics against task duration. For example, you can use this to determine whether tasks running on a certain host are performing slower than average.
  • Kill a running job, if necessary.
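The host-level slowdown check described above can also be sketched programmatically: group task-attempt durations by host and flag hosts whose mean duration is well above the cluster-wide mean. The data, function name, and threshold below are hypothetical illustrations, not part of any product API:

```python
# Sketch: flag hosts whose mean task-attempt duration exceeds the
# cluster-wide mean by a chosen factor. Data and threshold are
# hypothetical examples.
from statistics import mean

def slow_hosts(durations_by_host, factor=1.5):
    """Return hosts whose mean task duration exceeds factor * overall mean."""
    all_durations = [d for ds in durations_by_host.values() for d in ds]
    overall = mean(all_durations)
    return sorted(h for h, ds in durations_by_host.items()
                  if mean(ds) > factor * overall)

attempts = {
    "worker-1": [42, 45, 44],   # seconds per task attempt
    "worker-2": [43, 41, 46],
    "worker-3": [95, 102, 99],  # suspiciously slow
}
print(slow_hosts(attempts))  # → ['worker-3']
```

In practice the per-attempt durations would come from the job's task-attempt details rather than a hard-coded dictionary.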