Flume Health Tests

Flume Agent Health

This is a Flume service-level health test that checks that enough of the Agents in the cluster are healthy. The test returns "Concerning" health if the number of healthy Agents falls below a warning threshold, expressed as a percentage of the total number of Agents. The test returns "Bad" health if the number of healthy and "Concerning" Agents falls below a critical threshold, expressed as a percentage of the total number of Agents. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Agents, this test would return "Good" health if 95 or more Agents have good health. This test would return "Concerning" health if at least 90 Agents have either "Good" or "Concerning" health. If more than 10 Agents have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Agents. Check the status of the individual Agents for more information. This test can be configured using the Flume Flume service-wide monitoring setting.

Short Name: Agent Health

Property Name Description Template Name Default Value Unit
Healthy Agent Monitoring Thresholds The health test thresholds of the overall Agent health. The check returns "Concerning" health if the percentage of "Healthy" Agents falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Agents falls below the critical threshold. flume_agents_healthy_thresholds critical:never, warning:95.0 PERCENT