This service-level alert is triggered if the configured percentage of Region Server
processes cannot be determined to be up and listening on the network for the configured
critical threshold. The default setting is 10% to produce a WARN alert and 30% to produce
a CRITICAL alert. It uses the check_aggregate
plug-in to aggregate the
results of RegionServer process down
checks.
Misconfiguration or less-than-ideal configuration caused the RegionServers to crash
Cascading failures brought on by some workload caused the RegionServers to crash
The RegionServers shut themselves own because there were problems in the dependent services, ZooKeeper or HDFS
GC paused the RegionServer for too long and the RegionServers lost contact with Zookeeper
Check the dependent services to make sure they are operating correctly.
Look at the RegionServer log files (usually
/var/log/hbase/*.log)
for further informationLook at the configuration files (
/etc/hbase/conf
)If the failure was associated with a particular workload, try to understand the workload better
Restart the RegionServers