3.2. NameNode HA Alerts

Alert	Description	Potential Causes	Possible Remedies
JournalNode process	This host-level alert is triggered if the individual JournalNode process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds.	The JournalNode process is down or not responding. The JournalNode process is running but is not listening on the correct network port/address.	Check whether the JournalNode process is running.
NameNode High Availability Health	This service-level alert is triggered if either the Active NameNode or the Standby NameNode is not running.	The Active NameNode process, the Standby NameNode process, or both are down.	On each host running NameNode, check for errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using Ambari Web. On each host running NameNode, run the netstat -tuplpn command to check whether the NameNode process is bound to the correct network port.
ZooKeeper Failover Controller process	This alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network.	The ZKFC process is down or not responding.	Check if the ZKFC process is running.
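
All three alerts in the table above depend on whether a process is "up and listening on the network". The following is a minimal sketch of that kind of check done by hand, not the check Ambari itself runs. The helper names (is_port_listening, run_checks), the host names, and the port numbers are assumptions for illustration. The ports shown are commonly seen defaults; confirm the real values in your cluster's hdfs-site.xml.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out,
        # or name resolution failure) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: dict, timeout: float = 5.0) -> dict:
    """Map each service name to True/False depending on whether its port accepts connections."""
    return {
        name: is_port_listening(host, port, timeout)
        for name, (host, port) in checks.items()
    }


# Example usage (hypothetical hosts; ports are assumed defaults, verify them first):
#
# checks = {
#     "JournalNode": ("jn1.example.com", 8485),
#     "NameNode": ("nn1.example.com", 8020),
#     "ZKFC": ("nn1.example.com", 8019),
# }
# for name, ok in run_checks(checks).items():
#     print(f"{name}: {'listening' if ok else 'NOT listening'}")
```

A failed connection only shows that nothing answered on that address and port. It does not distinguish a dead process from one bound to a different port, so follow up with netstat -tuplpn on the host and with the logs in /var/log/hadoop/hdfs/.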
