Ozone Health Tests

Ozone DataNode Health

This is a Ozone service-level health test that checks that enough of the Ozone DataNodes in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ozone DataNodes falls below a warning threshold, expressed as a percentage of the total number of Ozone DataNodes. The test returns "Bad" health if the number of healthy and "Concerning" Ozone DataNodes falls below a critical threshold, expressed as a percentage of the total number of Ozone DataNodes. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ozone DataNodes, this test would return "Good" health if 95 or more Ozone DataNodes have good health. This test would return "Concerning" health if at least 90 Ozone DataNodes have either "Good" or "Concerning" health. If more than 10 Ozone DataNodes have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ozone DataNodes. Check the status of the individual Ozone DataNodes for more information. This test can be configured using the Ozone Ozone service-wide monitoring setting.

Short Name: Ozone DataNode Health

Healthy Ozone DataNode Monitoring Thresholds

Description
The health test thresholds of the overall Ozone DataNode health. The check returns "Concerning" health if the percentage of "Healthy" Ozone DataNodes falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ozone DataNodes falls below the critical threshold.
Template Name
OZONE_OZONE_DATANODE_healthy_thresholds
Default Value
critical:90.0, warning:99.0
Unit(s)
PERCENT

Ozone Manager Health

This is a Ozone service-level health test that checks that enough of the Ozone Managers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Ozone Managers falls below a warning threshold, expressed as a percentage of the total number of Ozone Managers. The test returns "Bad" health if the number of healthy and "Concerning" Ozone Managers falls below a critical threshold, expressed as a percentage of the total number of Ozone Managers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Ozone Managers, this test would return "Good" health if 95 or more Ozone Managers have good health. This test would return "Concerning" health if at least 90 Ozone Managers have either "Good" or "Concerning" health. If more than 10 Ozone Managers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Ozone Managers. Check the status of the individual Ozone Managers for more information. This test can be configured using the Ozone Ozone service-wide monitoring setting.

Short Name: Ozone Manager Health

Healthy Ozone Manager Monitoring Thresholds

Description
The health test thresholds of the overall Ozone Manager health. The check returns "Concerning" health if the percentage of "Healthy" Ozone Managers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Ozone Managers falls below the critical threshold.
Template Name
OZONE_OZONE_MANAGER_healthy_thresholds
Default Value
critical:50.0, warning:75.0
Unit(s)
PERCENT

Ozone Storage Container Manager Health

This is a Ozone service-level health test that checks that enough of the Storage Container Managers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Storage Container Managers falls below a warning threshold, expressed as a percentage of the total number of Storage Container Managers. The test returns "Bad" health if the number of healthy and "Concerning" Storage Container Managers falls below a critical threshold, expressed as a percentage of the total number of Storage Container Managers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Storage Container Managers, this test would return "Good" health if 95 or more Storage Container Managers have good health. This test would return "Concerning" health if at least 90 Storage Container Managers have either "Good" or "Concerning" health. If more than 10 Storage Container Managers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy Storage Container Managers. Check the status of the individual Storage Container Managers for more information. This test can be configured using the Ozone Ozone service-wide monitoring setting.

Short Name: Storage Container Manager Health

Healthy Storage Container Manager Monitoring Thresholds

Description
The health test thresholds of the overall Storage Container Manager health. The check returns "Concerning" health if the percentage of "Healthy" Storage Container Managers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" Storage Container Managers falls below the critical threshold.
Template Name
OZONE_STORAGE_CONTAINER_MANAGER_healthy_thresholds
Default Value
critical:50.0, warning:75.0
Unit(s)
PERCENT