5.1.9. Percent DataNodes live

This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It uses the check_aggregate plug-in to aggregate the results of the DataNode process checks.
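The aggregation idea can be illustrated with a minimal sketch. This is not the check_aggregate plug-in itself; the function name, the input format, and the threshold values below are assumptions made for illustration only.

```python
def aggregate_status(check_results, warn_pct=10.0, crit_pct=30.0):
    """Classify per-host DataNode check results by the percentage that failed.

    check_results: dict mapping a host name to True if that host's
                   DataNode process check passed (hypothetical input format).
    warn_pct, crit_pct: hypothetical thresholds, in percent of hosts down.
    Returns a (status, percent_down) tuple.
    """
    if not check_results:
        # No results to aggregate, so no meaningful status can be derived.
        return "UNKNOWN", 0.0

    down = sum(1 for ok in check_results.values() if not ok)
    pct_down = 100.0 * down / len(check_results)

    # The alert fires when the value is greater than the threshold.
    if pct_down > crit_pct:
        return "CRITICAL", pct_down
    if pct_down > warn_pct:
        return "WARNING", pct_down
    return "OK", pct_down
```

For example, with one of four DataNodes down and the assumed thresholds above, aggregate_status({"dn1": True, "dn2": False, "dn3": True, "dn4": True}) classifies the cluster as WARNING at 25 percent down.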

 5.1.9.1. Potential causes
  • DataNodes are down

  • DataNodes are not down but are not listening on the correct network port/address

  • Nagios server cannot connect to one or more DataNodes

 5.1.9.2. Possible remedies
  • Check for dead DataNodes in Ambari Web.

  • Check for any errors in the DataNode logs (/var/log/hadoop/hdfs) and restart the DataNode hosts/processes.

  • Run the netstat -tuplpn command to check if the DataNode process is bound to the correct network port.

  • Use ping to check the network connection between the Nagios server and the DataNodes.
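Because ping only confirms that a host is reachable, a TCP connection test from the Nagios server can additionally show whether the DataNode port accepts connections. The following is a minimal sketch; the function name, the host names, and the port number are assumptions. Confirm the actual port against the dfs.datanode.address setting of your cluster.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (host names and port are placeholders, not real values):
# for host in ["datanode1.example.com", "datanode2.example.com"]:
#     state = "reachable" if can_connect(host, 50010) else "NOT reachable"
#     print(host, state)
```

A host that answers ping but fails this test suggests that the DataNode process is down or bound to a different port or address, rather than a general network problem.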

