Kudu Health Tests

Kudu Master Health

This is a Kudu service-level health test that checks that enough of the Masters in the cluster are healthy. The test returns "Concerning" health if the number of Masters with "Good" health falls below a warning threshold, expressed as a percentage of the total number of Masters. The test returns "Bad" health if the number of Masters with "Good" or "Concerning" health falls below a critical threshold, also expressed as a percentage of the total number of Masters.

For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Masters, it returns "Good" health if 95 or more Masters have "Good" health. It returns "Concerning" health if fewer than 95 Masters have "Good" health but at least 90 Masters have either "Good" or "Concerning" health. It returns "Bad" health if more than 10 Masters have "Bad" health.

A failure of this health test indicates unhealthy Masters. Check the status of the individual Masters for more information. This test can be configured using the Healthy Master Monitoring Thresholds Kudu service-wide monitoring setting.
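The threshold logic described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not the monitoring system's actual implementation; the function name `service_health` and its parameters are hypothetical.

```python
def service_health(good, concerning, total, warning_pct, critical_pct):
    """Classify service-level health from per-role health counts.

    Hypothetical helper: it mirrors the rules described in the text,
    not the monitoring system's real code.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of roles")

    good_pct = 100.0 * good / total
    good_or_concerning_pct = 100.0 * (good + concerning) / total

    # "Bad" when too few roles are "Good" or "Concerning".
    if good_or_concerning_pct < critical_pct:
        return "Bad"
    # "Concerning" when too few roles are "Good".
    if good_pct < warning_pct:
        return "Concerning"
    return "Good"


# The 100-Master example with warning=95% and critical=90%.
all_good = service_health(95, 0, 100, 95.0, 90.0)
some_degraded = service_health(92, 3, 100, 95.0, 90.0)
many_bad = service_health(80, 9, 100, 95.0, 90.0)
print(all_good, some_degraded, many_bad)
```

The critical check runs first so that a cluster with too many "Bad" roles is never reported as merely "Concerning".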

Short Name: Master Health

Healthy Master Monitoring Thresholds

Description: The health test thresholds for the overall Master health. The check returns "Concerning" health if the percentage of Masters with "Good" health falls below the warning threshold. The check returns "Bad" health if the combined percentage of Masters with "Good" or "Concerning" health falls below the critical threshold.
Template Name: KUDU_KUDU_MASTER_healthy_thresholds
Default Value: critical:60.0, warning:80.1
Unit(s): PERCENT

Kudu Tablet Server Health

This is a Kudu service-level health test that checks that enough of the Tablet Servers in the cluster are healthy. The test returns "Concerning" health if the number of Tablet Servers with "Good" health falls below a warning threshold, expressed as a percentage of the total number of Tablet Servers. The test returns "Bad" health if the number of Tablet Servers with "Good" or "Concerning" health falls below a critical threshold, also expressed as a percentage of the total number of Tablet Servers.

For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Tablet Servers, it returns "Good" health if 95 or more Tablet Servers have "Good" health. It returns "Concerning" health if fewer than 95 Tablet Servers have "Good" health but at least 90 Tablet Servers have either "Good" or "Concerning" health. It returns "Bad" health if more than 10 Tablet Servers have "Bad" health.

A failure of this health test indicates unhealthy Tablet Servers. Check the status of the individual Tablet Servers for more information. This test can be configured using the Healthy Tablet Server Monitoring Thresholds Kudu service-wide monitoring setting.

Short Name: Tablet Server Health

Healthy Tablet Server Monitoring Thresholds

Description: The health test thresholds for the overall Tablet Server health. The check returns "Concerning" health if the percentage of Tablet Servers with "Good" health falls below the warning threshold. The check returns "Bad" health if the combined percentage of Tablet Servers with "Good" or "Concerning" health falls below the critical threshold.
Template Name: KUDU_KUDU_TSERVER_healthy_thresholds
Default Value: critical:50.0, warning:75.0
Unit(s): PERCENT
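
To see what a percentage threshold means for a cluster of a given size, the default value string can be converted into minimum role counts. The sketch below is an illustration only: the helper names `parse_thresholds` and `minimum_counts` are hypothetical, the cluster size of 7 is an assumed example, and only numeric threshold values are handled (anything else raises `ValueError`).

```python
import math
from fractions import Fraction


def parse_thresholds(value):
    """Parse a string such as 'critical:50.0, warning:75.0' into exact fractions.

    Hypothetical helper; only numeric percentages are supported in this sketch.
    """
    thresholds = {}
    for part in value.split(","):
        name, _, pct = part.strip().partition(":")
        thresholds[name.strip()] = Fraction(pct.strip())
    return thresholds


def minimum_counts(value, total):
    """Return the smallest role counts that keep each percentage at or above its threshold."""
    return {
        name: math.ceil(pct * total / 100)
        for name, pct in parse_thresholds(value).items()
    }


# Tablet Server defaults from this section, applied to an assumed 7-node cluster.
tserver_minimums = minimum_counts("critical:50.0, warning:75.0", 7)
print(tserver_minimums)  # {'critical': 4, 'warning': 6}
```

Under these assumptions, a 7-node cluster needs at least 6 Tablet Servers in "Good" health to avoid "Concerning", and at least 4 in "Good" or "Concerning" health to avoid "Bad". `Fraction` is used instead of floating point so that the rounding up is exact for values such as 80.1.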