HBase Health Tests

HBase Master Health

This is an HBase service-level health test that checks for an active, healthy HBase Master. The test returns "Bad" health if the service is running and an active Master cannot be found. If an active Master is found, the test checks the health of that Master as well as the health of any configured standby Masters. A "Good" health result is returned only if the active Master is healthy and all running standby Masters are healthy. A failure of this health test may indicate stopped or unhealthy Master roles, or it may indicate a problem with communication between the Cloudera Manager Service Monitor and the HBase service. When this test fails, check the status of the HBase service's Master roles and look in the Cloudera Manager Service Monitor's log files for more information. This test can be enabled or disabled using the Active Master Health Test HBase service-wide monitoring setting. The check for healthy standby Masters can be enabled or disabled using the Backup Masters Health Test setting. In addition, the HBase Active Master Detection Window setting adjusts the amount of time that the Cloudera Manager Service Monitor has to detect the active HBase Master before this health test fails.

Short Name: HBase Master Health

Active Master Health Test

Description
When computing the overall HBase cluster health, consider the active HBase Master's health.
Template Name
hbase_master_health_enabled
Default Value
true
Unit(s)
no unit

Backup Masters Health Test

Description
When computing the overall HBase cluster health, consider the health of the backup HBase Masters.
Template Name
hbase_backup_masters_health_enabled
Default Value
true
Unit(s)
no unit

HBase Active Master Detection Window

Description
The tolerance window that will be used in HBase service tests that depend on detection of the active HBase Master.
Template Name
hbase_active_master_detection_window
Default Value
3
Unit(s)
MINUTES
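
The decision logic of the HBase Master Health test described above can be sketched roughly as follows. This is an illustrative model only, not Cloudera Manager's implementation: the function name, parameters, and `Health` enum are hypothetical. The text does not say what the test returns when a Master is unhealthy, or while the detection window has not yet elapsed, so the sketch assumes the worst health among the considered Masters in the first case and no verdict (`None`) in the second.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Health(IntEnum):
    """Hypothetical health levels, ordered so that max() picks the worst."""
    GOOD = 0
    CONCERNING = 1
    BAD = 2


def master_health(
    active: Optional[Health],
    standbys: Sequence[Health],
    minutes_without_active: float = 0.0,
    *,
    active_test_enabled: bool = True,        # models hbase_master_health_enabled
    backup_test_enabled: bool = True,        # models hbase_backup_masters_health_enabled
    detection_window_minutes: float = 3.0,   # models hbase_active_master_detection_window
) -> Optional[Health]:
    """Return the modeled test result, or None when there is no verdict.

    `active` is the health of the active Master, or None if none was found.
    `standbys` holds the health of each running standby Master.
    """
    if not active_test_enabled:
        return None  # test disabled
    if active is None:
        # "Bad" once no active Master has been detected for the whole window.
        if minutes_without_active >= detection_window_minutes:
            return Health.BAD
        return None  # assumption: no verdict while still inside the window
    considered = [active]
    if backup_test_enabled:
        considered.extend(standbys)
    # Assumption: the result is the worst health among the considered Masters,
    # so "Good" is returned only when every considered Master is "Good".
    return max(considered)
```

For example, `master_health(Health.GOOD, [Health.BAD], backup_test_enabled=False)` evaluates to `Health.GOOD` in this model, because the standby Master is ignored when the Backup Masters Health Test is disabled.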

HBase RegionServer Health

This is an HBase service-level health test that checks that enough of the RegionServers in the cluster are healthy. The test returns "Concerning" health if the number of healthy RegionServers falls below a warning threshold, expressed as a percentage of the total number of RegionServers. The test returns "Bad" health if the number of healthy and "Concerning" RegionServers falls below a critical threshold, expressed as a percentage of the total number of RegionServers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 RegionServers, this test would return "Good" health if 95 or more RegionServers have good health. This test would return "Concerning" health if fewer than 95 RegionServers have good health but at least 90 RegionServers have either "Good" or "Concerning" health. If more than 10 RegionServers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy RegionServers. Check the status of the individual RegionServers for more information. This test can be configured using the Healthy RegionServer Monitoring Thresholds HBase service-wide monitoring setting.

Short Name: RegionServer Health

Healthy RegionServer Monitoring Thresholds

Description
The health test thresholds for overall RegionServer health. The check returns "Concerning" health if the percentage of "Healthy" RegionServers falls below the warning threshold. The check returns "Bad" health if the total percentage of "Healthy" and "Concerning" RegionServers falls below the critical threshold.
Template Name
hbase_regionservers_healthy_thresholds
Default Value
critical:90.0, warning:95.0
Unit(s)
PERCENT
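
The threshold arithmetic for the HBase RegionServer Health test can be illustrated with a short sketch. The function and parameter names are hypothetical and the code is not taken from Cloudera Manager; it only restates the rules above, using the default thresholds (warning 95.0, critical 90.0).

```python
def regionserver_health(
    good: int,
    concerning: int,
    total: int,
    warning_pct: float = 95.0,
    critical_pct: float = 90.0,
) -> str:
    """Classify overall RegionServer health from per-role health counts.

    `good` and `concerning` are the numbers of RegionServers in each state;
    the remaining `total - good - concerning` RegionServers are treated as bad.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    good_pct = 100.0 * good / total
    good_or_concerning_pct = 100.0 * (good + concerning) / total

    # "Bad" when healthy plus "Concerning" RegionServers fall below critical.
    if good_or_concerning_pct < critical_pct:
        return "Bad"
    # "Concerning" when healthy RegionServers alone fall below warning.
    if good_pct < warning_pct:
        return "Concerning"
    return "Good"


# Worked example from the text: a cluster of 100 RegionServers.
print(regionserver_health(good=95, concerning=0, total=100))  # Good
print(regionserver_health(good=92, concerning=0, total=100))  # Concerning
print(regionserver_health(good=85, concerning=4, total=100))  # Bad (11 are bad)
```

With exactly 10 bad RegionServers out of 100, the healthy plus "Concerning" share is 90%, which does not fall below the critical threshold, so the result is "Concerning" rather than "Bad". This matches the statement that more than 10 bad RegionServers are needed for "Bad" health.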