5.4.2. JobTracker RPC latency alert

This alert is triggered if the JobTracker operations RPC latency exceeds the configured critical threshold. Typically, an increase in RPC processing time lengthens the RPC queue, which in turn increases the average queue wait time for JobTracker operations. This alert uses the Nagios check_rpcq_latency plugin.

Potential causes
  • High load on the JobTracker in terms of the number of tasks being scheduled and completed. For example, a large number of very short-running tasks can put extreme load on the JobTracker. Using CapacityScheduler should usually prevent this from occurring.

Possible remedies
  • Check the running jobs using bin/hadoop job -list or the JobTracker UI to find the offending job(s) running a very large number of short-running tasks.

  • If necessary, abort the offending job(s) using bin/hadoop job -kill [jobId].
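
The two remedies above can be combined in a small helper. The following is only a sketch under stated assumptions: it assumes that job IDs in the output of bin/hadoop job -list follow the usual job_<timestamp>_<sequence> form, and the function names (extract_job_ids, kill_command, list_running_jobs) are hypothetical, not part of Hadoop. It only builds the kill command; deciding which job is actually the offender is left to the operator.

```python
import re
import subprocess

# Assumption: job IDs look like job_<digits>_<digits>, e.g. job_201203121012_0001.
JOB_ID_RE = re.compile(r"\bjob_\d+_\d+\b")


def extract_job_ids(listing):
    """Return the job IDs found in `listing`, in order, without duplicates."""
    seen = []
    for job_id in JOB_ID_RE.findall(listing):
        if job_id not in seen:
            seen.append(job_id)
    return seen


def kill_command(job_id, hadoop="bin/hadoop"):
    """Build (but do not run) the argument list that kills `job_id`."""
    return [hadoop, "job", "-kill", job_id]


def list_running_jobs(hadoop="bin/hadoop"):
    """Run `hadoop job -list` and return the job IDs it reports.

    Hypothetical helper: it needs a working Hadoop client at `hadoop`.
    """
    result = subprocess.run(
        [hadoop, "job", "-list"],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_job_ids(result.stdout)
```

On a node with the Hadoop client installed, list_running_jobs() would return the IDs to review against the JobTracker UI, and subprocess.run(kill_command(job_id), check=True) would abort a job once it has been confirmed as the offender.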
