Kudu Health Tests
Kudu Master Health
This is a Kudu service-level health test that checks that enough of the Masters in the cluster are healthy. The test returns "Concerning" health if the number of healthy Masters falls below a warning threshold, expressed as a percentage of the total number of Masters. The test returns "Bad" health if the number of healthy and "Concerning" Masters falls below a critical threshold, expressed as a percentage of the total number of Masters. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Masters, this test would return "Good" health if 95 or more Masters have good health. This test would return "Concerning" health if at least 90 Masters have either "Good" or "Concerning" health. If more than 10 Masters have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Masters. Check the status of the individual Masters for more information. This test can be configured using the Kudu Kudu service-wide monitoring setting.
Short Name: Master Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy Master Monitoring Thresholds | The health test thresholds of the overall Master health. The check returns "Concerning" health if the percentage of "Healthy" Masters falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Masters falls below the critical threshold. | KUDU_KUDU_MASTER_healthy_thresholds | critical:60.0, warning:80.1 | PERCENT |
Kudu Tablet Server Health
This is a Kudu service-level health test that checks that enough of the Tablet Servers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Tablet Servers falls below a warning threshold, expressed as a percentage of the total number of Tablet Servers. The test returns "Bad" health if the number of healthy and "Concerning" Tablet Servers falls below a critical threshold, expressed as a percentage of the total number of Tablet Servers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Tablet Servers, this test would return "Good" health if 95 or more Tablet Servers have good health. This test would return "Concerning" health if at least 90 Tablet Servers have either "Good" or "Concerning" health. If more than 10 Tablet Servers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Tablet Servers. Check the status of the individual Tablet Servers for more information. This test can be configured using the Kudu Kudu service-wide monitoring setting.
Short Name: Tablet Server Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy Tablet Server Monitoring Thresholds | The health test thresholds of the overall Tablet Server health. The check returns "Concerning" health if the percentage of "Healthy" Tablet Servers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Tablet Servers falls below the critical threshold. | KUDU_KUDU_TSERVER_healthy_thresholds | critical:50.0, warning:75.0 | PERCENT |