5.3.2. Percent NodeManagers live

This alert is triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It uses the check_aggregate plugin to aggregate the results of the NodeManager process alert checks.
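
The aggregation described above can be pictured with a small sketch. This is not the check_aggregate plugin's actual code. It assumes the thresholds are expressed as percentages of down NodeManagers, as the alert's name suggests, and the function name, the status strings, and the threshold values in the usage note are hypothetical.

```python
def aggregate_status(down_flags, warn_pct, crit_pct):
    """Summarize per-host process checks into one status string.

    down_flags: list of booleans, True where a NodeManager check failed.
    warn_pct, crit_pct: hypothetical thresholds, in percent of hosts down.
    """
    if not down_flags:
        # No per-host results to aggregate.
        return "UNKNOWN"

    down_count = sum(1 for is_down in down_flags if is_down)
    down_pct = 100.0 * down_count / len(down_flags)

    # The alert fires only when the value is strictly greater than the threshold.
    if down_pct > crit_pct:
        return "CRITICAL"
    if down_pct > warn_pct:
        return "WARNING"
    return "OK"
```

With hypothetical thresholds of 10 (warning) and 30 (critical), one down NodeManager out of four gives 25 percent down, which this sketch reports as WARNING. Two out of four gives 50 percent, which it reports as CRITICAL.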

5.3.2.1. Potential causes

NodeManagers are down.
NodeManagers are not down but are not listening on the correct network port/address.
Nagios server cannot connect to one or more NodeManagers.

5.3.2.2. Possible remedies

Check for dead NodeManagers.
Check for any errors in the NodeManager logs (/var/log/hadoop/yarn) and restart the NodeManager hosts or processes, as necessary.
Run the netstat -tuplpn command to check if the NodeManager process is bound to the correct network port.
Use ping to check the network connection between the Nagios server and the NodeManager hosts.
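
When there are many NodeManagers, the connectivity part of these checks can be scripted from the Nagios server. The following is a minimal sketch that complements ping and netstat rather than replacing them. It only tests whether a TCP connection to the NodeManager port succeeds. The function names are hypothetical, and the host list and port must come from your own cluster (the port is assumed to be the one configured in yarn.nodemanager.address).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def find_unreachable(hosts, port, timeout=3.0):
    """Return the hosts whose NodeManager port cannot be reached."""
    return [host for host in hosts if not port_is_open(host, port, timeout)]
```

For example, calling find_unreachable with a list of your NodeManager host names and the configured port returns the hosts that need a closer look. A host that appears in the result but still answers ping most likely has a stopped NodeManager process or one bound to a different port or address.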
