ECS Health Tests
ECS Control Plane Health
Check Control Plane health status.
Short Name: Control Plane Health
ECS Ecs Agent Health
This is a ECS service-level health test that checks that enough of the Ecs Agents in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ecs Agents falls below a warning threshold, expressed as a percentage of the total number of Ecs Agents. The test returns "Bad" health if the number of healthy and "Concerning" Ecs Agents falls below a critical threshold, expressed as a percentage of the total number of Ecs Agents. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ecs Agents, this test would return "Good" health if 95 or more Ecs Agents have good health. This test would return "Concerning" health if at least 90 Ecs Agents have either "Good" or "Concerning" health. If more than 10 Ecs Agents have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ecs Agents. Check the status of the individual Ecs Agents for more information. This test can be configured using the ECS ECS service-wide monitoring setting.
Short Name: Ecs Agent Health
Healthy Ecs Agent Monitoring Thresholds
- Description
- The health test thresholds of the overall Ecs Agent health. The check returns "Concerning" health if the percentage of "Healthy" Ecs Agents falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ecs Agents falls below the critical threshold.
- Template Name
- ecs_ecs_agents_healthy_thresholds
- Default Value
- critical:90.0, warning:95.0
- Unit(s)
- PERCENT
ECS Ecs Server Health
This is a ECS service-level health test that checks that enough of the Ecs Servers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ecs Servers falls below a warning threshold, expressed as a percentage of the total number of Ecs Servers. The test returns "Bad" health if the number of healthy and "Concerning" Ecs Servers falls below a critical threshold, expressed as a percentage of the total number of Ecs Servers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ecs Servers, this test would return "Good" health if 95 or more Ecs Servers have good health. This test would return "Concerning" health if at least 90 Ecs Servers have either "Good" or "Concerning" health. If more than 10 Ecs Servers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ecs Servers. Check the status of the individual Ecs Servers for more information. This test can be configured using the ECS ECS service-wide monitoring setting.
Short Name: Ecs Server Health
Healthy Ecs Server Monitoring Thresholds
- Description
- The health test thresholds of the overall Ecs Server health. The check returns "Concerning" health if the percentage of "Healthy" Ecs Servers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ecs Servers falls below the critical threshold.
- Template Name
- ecs_ecs_servers_healthy_thresholds
- Default Value
- critical:90.0, warning:95.0
- Unit(s)
- PERCENT
ECS Infrastructure Prometheus Health
Check Infra Prometheus health status.
Short Name: Infrastructure Prometheus Health
ECS Kubernetes Health
Check Kubernetes Components for health status.
Short Name: Kubernetes Health
ECS Longhorn Health
Check Longhorn health status.
Short Name: Longhorn Health
ECS Vault Health
Check Vault health status.
Short Name: Vault Health