5.1.1. Host down

This alert is configured for all nodes in the Hadoop cluster (Hadoop master and slave nodes) as well as the Nagios and Ganglia monitoring servers. By default, it uses the Nagios plugin check_ping to ping each cluster node and measure the average round-trip time (RTT) and the packet loss percentage.
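
As a rough illustration of this kind of check, the following Python sketch parses the summary lines of Linux-style ping output and maps the average RTT and packet loss to a Nagios status code. The function names, the sample output, and the threshold values are assumptions made for this example; they are not taken from check_ping itself or from the Ambari configuration.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_ping_summary(text):
    """Return (avg_rtt_ms, loss_pct) from Linux-style ping summary output.

    avg_rtt_ms is None when the output has no RTT line (no replies received).
    """
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", text)
    if loss is None:
        raise ValueError("no packet loss summary found")
    # The RTT line looks like: "rtt min/avg/max/mdev = a/b/c/d ms"; take b.
    rtt = re.search(r"= [\d.]+/([\d.]+)/", text)
    avg = float(rtt.group(1)) if rtt else None
    return avg, float(loss.group(1))


def classify(avg_rtt_ms, loss_pct, warn=(3000.0, 80.0), crit=(5000.0, 100.0)):
    """Map RTT and loss to a status code.

    warn and crit are (rtt_ms, loss_pct) pairs; the defaults are
    illustrative values only, not the thresholds Ambari ships with.
    """
    if avg_rtt_ms is None or avg_rtt_ms >= crit[0] or loss_pct >= crit[1]:
        return CRITICAL
    if avg_rtt_ms >= warn[0] or loss_pct >= warn[1]:
        return WARNING
    return OK


# Hypothetical sample output for a healthy host.
sample = (
    "5 packets transmitted, 5 received, 0% packet loss, time 4005ms\n"
    "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms"
)
avg_rtt, loss_pct = parse_ping_summary(sample)
status = classify(avg_rtt, loss_pct)
```

A host that returns no replies at all has no RTT line, so the sketch treats a missing average as critical, which corresponds to the host down case.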

This alert helps Ambari Web determine how many cluster nodes are up or down at a given time. Because the check relies on ping, a network outage between the Nagios server and a host can also trigger a host down alert, even when the host itself is running.

Note

The hadoop-services.cfg file does not define this alert explicitly. Instead, the alert is defined as part of the generic host definition in the templates.cfg file, using the check-host-alive command.

 5.1.1.1. Possible causes
  • The host is actually down

  • There is a network outage and the Nagios server cannot access the host

 5.1.1.2. Possible remedies
  • Check the host and restart if necessary

  • Check network connections
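
When working through the remedies above, it can help to separate a host that is really down from one that is only unreachable by ping. The Python sketch below attempts a plain TCP connection to a port that is expected to be open on the host. The function name, the host name, and the choice of port 22 (SSH) are assumptions for illustration; this is not part of Ambari or Nagios.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with the node named in the alert.
    host = "datanode01.example.com"
    if tcp_reachable(host, 22):
        print(f"{host}: TCP port 22 answers; the host is up, check ICMP and routing")
    else:
        print(f"{host}: no TCP response; check the host itself and the network path")
```

If the TCP check succeeds while the alert persists, the host is running and the problem more likely lies in the network path or in ICMP filtering between the Nagios server and the host.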

