ECS Health Tests

You can use health tests to verify the health of various ECS components. If you are experiencing issues, these tests can help diagnose and solve the problem.

ECS Control Plane Health

Check Control Plane health status.

Short Name: Control Plane Health

ECS Ecs Agent Health

This is a ECS service-level health test that checks that enough of the Ecs Agents in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ecs Agents falls below a warning threshold, expressed as a percentage of the total number of Ecs Agents. The test returns "Bad" health if the number of healthy and "Concerning" Ecs Agents falls below a critical threshold, expressed as a percentage of the total number of Ecs Agents. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ecs Agents, this test would return "Good" health if 95 or more Ecs Agents have good health. This test would return "Concerning" health if at least 90 Ecs Agents have either "Good" or "Concerning" health. If more than 10 Ecs Agents have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ecs Agents. Check the status of the individual Ecs Agents for more information. This test can be configured using the ECS ECS service-wide monitoring setting.

Short Name: Ecs Agent Health

Healthy Ecs Agent Monitoring Thresholds

Description
The health test thresholds of the overall Ecs Agent health. The check returns "Concerning" health if the percentage of "Healthy" Ecs Agents falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ecs Agents falls below the critical threshold.
Template Name
ecs_ecs_agents_healthy_thresholds
Default Value
critical:90.0, warning:95.0
Unit(s)
PERCENT

ECS Ecs Server Health

This is a ECS service-level health test that checks that enough of the Ecs Servers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ecs Servers falls below a warning threshold, expressed as a percentage of the total number of Ecs Servers. The test returns "Bad" health if the number of healthy and "Concerning" Ecs Servers falls below a critical threshold, expressed as a percentage of the total number of Ecs Servers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ecs Servers, this test would return "Good" health if 95 or more Ecs Servers have good health. This test would return "Concerning" health if at least 90 Ecs Servers have either "Good" or "Concerning" health. If more than 10 Ecs Servers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ecs Servers. Check the status of the individual Ecs Servers for more information. This test can be configured using the ECS ECS service-wide monitoring setting.

Short Name: Ecs Server Health

Healthy Ecs Server Monitoring Thresholds

Description
The health test thresholds of the overall Ecs Server health. The check returns "Concerning" health if the percentage of "Healthy" Ecs Servers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ecs Servers falls below the critical threshold.
Template Name
ecs_ecs_servers_healthy_thresholds
Default Value
critical:90.0, warning:95.0
Unit(s)
PERCENT

ECS Infrastructure Prometheus Health

Check Infra Prometheus health status.

Short Name: Infrastructure Prometheus Health

ECS Kubernetes Health

Check Kubernetes Components for health status.

Short Name: Kubernetes Health

ECS Longhorn Health

Check Longhorn health status.

Short Name: Longhorn Health

ECS Vault Health

Check Vault health status.

Short Name: Vault Health