Managing and Monitoring a Cluster

MapReduce2 alerts

Descriptions, potential causes, and possible remedies for alerts triggered by MapReduce2.

Table 1. MapReduce2 Alerts
Columns: Alert, Alert Type, Description, Potential Causes, Possible Remedies

History Server Web UI
Alert Type: WEB
Description: This host-level alert is triggered if the HistoryServer Web UI is unreachable.
Potential Causes: The HistoryServer process is not running.
Possible Remedies: Check if the HistoryServer process is running.

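
A reachability check of this kind can be sketched in Python. This is a minimal illustration, not the alert's actual implementation: the function name `web_ui_reachable` is hypothetical, and the URL in the usage note assumes the Web UI listens on port 19888, which should be checked against the cluster's `mapreduce.jobhistory.webapp.address` setting.

```python
import urllib.error
import urllib.request


def web_ui_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP request to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar failures.
        return False
```

For example, `web_ui_reachable("http://historyserver.example.com:19888/")` (hypothetical host name) returns False when nothing answers on that address, which matches the condition this alert describes.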
History Server RPC latency
Alert Type: METRIC
Description: This host-level alert is triggered if the HistoryServer operations RPC latency exceeds the configured critical threshold. Typically, an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for HistoryServer operations.
Potential Causes: A job or an application is performing too many HistoryServer operations.
Possible Remedies: Review the job or the application for potential bugs causing it to perform too many HistoryServer operations.

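
The threshold comparison behind a METRIC alert like this one can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, the threshold values in the usage note are made up, and the source does not say whether the alert uses queue time, processing time, or both. The sketch sums the two averages, using the metric names `RpcQueueTimeAvgTime` and `RpcProcessingTimeAvgTime` as Hadoop daemons typically expose them over JMX.

```python
def rpc_latency_ms(jmx_bean: dict) -> float:
    """Combine average queue time and processing time (assumed to be in ms).

    Summing the two values is an assumption of this sketch, not something
    the alert definition is known to do.
    """
    queue = float(jmx_bean.get("RpcQueueTimeAvgTime", 0.0))
    processing = float(jmx_bean.get("RpcProcessingTimeAvgTime", 0.0))
    return queue + processing


def classify_latency(latency_ms: float, warning_ms: float, critical_ms: float) -> str:
    """Map a latency value to an alert state using strict 'exceeds' comparisons."""
    if latency_ms > critical_ms:
        return "CRITICAL"
    if latency_ms > warning_ms:
        return "WARNING"
    return "OK"
```

With hypothetical thresholds of 3000 ms (warning) and 5000 ms (critical), a bean reporting 4000 ms of queue time and 1500 ms of processing time would be classified as CRITICAL.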
History Server CPU Utilization
Alert Type: METRIC
Description: This host-level alert is triggered if the percent of CPU utilization on the HistoryServer exceeds the configured critical threshold.
Potential Causes: Unusually high CPU utilization can be caused by a very unusual job or query workload, but it is generally the sign of an issue in the daemon.
Possible Remedies:
    Use the top command to determine which processes are consuming excess CPU.
    Reset the offending process.
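
When the check needs to be scripted rather than run interactively with top, a similar ranking can be sketched in Python on top of `ps`. This is an illustrative substitute, not part of the alert itself: the function names are hypothetical, and `ps` reports CPU usage averaged over each process's lifetime, so its figures can differ from the instantaneous values that top displays.

```python
import subprocess
from typing import List, Tuple


def parse_ps(output: str, limit: int = 5) -> List[Tuple[int, float, str]]:
    """Parse `ps -eo pid,pcpu,comm` output and return the top CPU consumers."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines
        pid, cpu, command = parts
        rows.append((int(pid), float(cpu), command.strip()))
    # Highest CPU percentage first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


def top_cpu_processes(limit: int = 5) -> List[Tuple[int, float, str]]:
    """Run ps and return (pid, cpu_percent, command) for the busiest processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout, limit)
```

Calling `top_cpu_processes(3)` on the HistoryServer host lists the three busiest processes, which helps confirm whether the HistoryServer JVM or some other process is responsible for the load.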

History Server Process
Alert Type: PORT
Description: This host-level alert is triggered if the HistoryServer process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds.
Potential Causes:
    The HistoryServer process is down or not responding.
    The HistoryServer is not down but is not listening on the correct network port/address.
Possible Remedies:
    Check that the HistoryServer is running.
    Check for any errors in the HistoryServer logs (/var/log/hadoop/mapred) and restart the HistoryServer, if necessary.
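
A PORT-style check amounts to attempting a TCP connection within a time limit. The following Python sketch shows the idea under stated assumptions: the function name `port_is_listening` is hypothetical, and the host and port in the usage note are placeholders to be replaced with the address the HistoryServer is actually configured to use.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For example, `port_is_listening("historyserver.example.com", 10020, timeout=5.0)` returning False would point to one of the two causes listed above: either the process is down, or it is listening on a different port or address than the one being checked.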