3.5. MapReduce2 Alerts
Alert | Alert Type | Description | Potential Causes | Possible Remedies |
---|---|---|---|---|
HistoryServer Web UI | WEB | This host-level alert is triggered if the HistoryServer Web UI is unreachable. | The HistoryServer process is not running. | Check whether the HistoryServer process is running. |
HistoryServer RPC latency | METRIC | This host-level alert is triggered if the RPC latency of HistoryServer operations exceeds the configured critical threshold. Typically, an increase in RPC processing time increases the RPC queue length, which in turn increases the average queue wait time for HistoryServer operations. | A job or an application is performing too many HistoryServer operations. | Review the job or application for bugs that cause it to perform too many HistoryServer operations. |
HistoryServer CPU utilization | METRIC | This host-level alert is triggered if the percentage of CPU utilization on the HistoryServer exceeds the configured critical threshold. | Unusually high CPU utilization can be caused by a very unusual job or query workload, but it is generally a sign of an issue in the daemon. | Use the top command to determine which processes are consuming excess CPU. Restart the offending process. |
HistoryServer Process | PORT | This host-level alert is triggered if the HistoryServer process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds. | The HistoryServer process is down or not responding. The HistoryServer is running but is not listening on the correct network port or address. | Check that the HistoryServer is running. Check for errors in the HistoryServer logs (/var/log/hadoop/mapred) and restart the HistoryServer if necessary. |
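
When working through the WEB and PORT remedies above, it can help to confirm by hand whether the HistoryServer is accepting connections. The following is a minimal sketch of such a check, not the alert implementation itself. It assumes the stock Hadoop port defaults (10020 for `mapreduce.jobhistory.address` and 19888 for `mapreduce.jobhistory.webapp.address`), which may differ on your cluster. The function name `check_port` and the `warning_s` and `critical_s` thresholds are hypothetical, chosen only for illustration.

```python
import socket
import time

# Assumed defaults; confirm the actual values in mapred-site.xml on your cluster.
HISTORYSERVER_RPC_PORT = 10020    # mapreduce.jobhistory.address
HISTORYSERVER_WEBUI_PORT = 19888  # mapreduce.jobhistory.webapp.address


def check_port(host, port, warning_s=1.5, critical_s=5.0):
    """Attempt a TCP connection and return (state, elapsed_seconds).

    Hypothetical helper: the thresholds are illustrative, not the
    values configured for the alert.
    """
    start = time.monotonic()
    try:
        # The connect attempt gives up after critical_s seconds.
        with socket.create_connection((host, port), timeout=critical_s):
            elapsed = time.monotonic() - start
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return "CRITICAL", time.monotonic() - start
    # Connected, but possibly slower than the warning threshold.
    state = "WARNING" if elapsed >= warning_s else "OK"
    return state, elapsed
```

For example, `check_port("localhost", HISTORYSERVER_WEBUI_PORT)` run on the HistoryServer host returns a `CRITICAL` state if nothing is listening on the Web UI port. A successful connection shows only that something is listening on that port, so the logs in /var/log/hadoop/mapred remain the place to look for the underlying cause.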