5.3.5. DataNode process down alert

This alert is triggered if any individual DataNode process cannot be confirmed as up and listening on the network within the configured critical threshold, given in seconds. It uses the Nagios check_tcp plugin.
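check_tcp judges a service healthy by opening a TCP connection to a host and port and comparing the response time against warning and critical thresholds. The Python sketch below approximates that behavior for illustration only; it is not the plugin's source. The function name, its parameters, and the choice to use the critical threshold as the connect timeout are assumptions. It returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
import socket
import time
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_tcp(host: str, port: int, warn_s: float, crit_s: float) -> Tuple[int, str]:
    """Illustrative approximation of Nagios check_tcp (not the plugin itself).

    Assumption: crit_s doubles as the connect timeout, so a refused,
    unresolvable, or timed-out connection is reported as CRITICAL.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=crit_s):
            elapsed = time.monotonic() - start
    except OSError as exc:  # covers refused, timeout, and DNS failures
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"

    if elapsed >= crit_s:
        return CRITICAL, f"TCP CRITICAL - {elapsed:.3f} second response time on port {port}"
    if elapsed >= warn_s:
        return WARNING, f"TCP WARNING - {elapsed:.3f} second response time on port {port}"
    return OK, f"TCP OK - {elapsed:.3f} second response time on port {port}"
```

For example, check_tcp("datanode1.example.com", 50010, 1.0, 5.0) would probe a hypothetical host on port 50010, which is assumed here to be the DataNode data transfer port (the Hadoop 2.x default for dfs.datanode.address); confirm the actual port in your hdfs-site.xml.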

5.3.5.1. Potential causes

The DataNodes are down or not responding
The DataNodes are not down but are not listening on the correct network port/address
The Nagios server cannot connect to one or more DataNodes

5.3.5.2. Possible remedies

Check for dead DataNodes in the Services list.
Check for any errors in the DataNode logs (/var/log/hadoop/hdfs) and restart the DataNode if necessary.
Run the netstat -tulpn command to check whether the DataNode process is bound to the correct network port.
Use ping to check the network connection between the Nagios server and the DataNode.
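The netstat check in the remedies list can be scripted. The following sketch is an illustration, not part of any Nagios or Hadoop tooling; the function names are hypothetical. It runs Linux netstat -tulpn and keeps only TCP sockets in the LISTEN state on a given port, returning the protocol, local address, and PID/program column. A local address of 127.0.0.1 in the result would mean the DataNode is listening only on loopback and cannot be reached by the Nagios server.

```python
import subprocess
from typing import List, Tuple


def listeners_on_port(netstat_output: str, port: int) -> List[Tuple[str, str, str]]:
    """Return (proto, local_address, pid_program) for TCP listeners on `port`.

    Hypothetical helper. Expects the column layout printed by Linux
    `netstat -tulpn`: Proto Recv-Q Send-Q Local Foreign State PID/Program
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and UDP rows (UDP has no LISTEN state).
        if len(fields) < 7 or fields[0] not in ("tcp", "tcp6") or fields[5] != "LISTEN":
            continue
        local = fields[3]
        # The port follows the last colon, which also handles IPv6 such as ":::50010".
        _, _, local_port = local.rpartition(":")
        if local_port == str(port):
            # The program column may contain spaces, so rejoin everything after State.
            matches.append((fields[0], local, " ".join(fields[6:])))
    return matches


def run_netstat() -> str:
    """Run netstat -tulpn (requires the net-tools package) and return its stdout."""
    result = subprocess.run(["netstat", "-tulpn"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, listeners_on_port(run_netstat(), 50010) returns an empty list if nothing is listening on the assumed DataNode port. Run it as root (or with sudo) so that netstat can show the PID/program for processes owned by the hdfs user; otherwise that column shows "-".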
