Streams Replication Manager Health Tests

Streams Replication Manager SRM Driver Health

This is a Streams Replication Manager service-level health test that checks that enough of the SRM Drivers in the cluster are healthy. The test returns "Concerning" health if the number of healthy SRM Drivers falls below a warning threshold, expressed as a percentage of the total number of SRM Drivers. The test returns "Bad" health if the number of healthy and "Concerning" SRM Drivers falls below a critical threshold, expressed as a percentage of the total number of SRM Drivers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 SRM Drivers, this test would return "Good" health if 95 or more SRM Drivers have good health. This test would return "Concerning" health if at least 90 SRM Drivers have either "Good" or "Concerning" health. If more than 10 SRM Drivers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy SRM Drivers. Check the status of the individual SRM Drivers for more information. This test can be configured using the Streams Replication Manager Streams Replication Manager service-wide monitoring setting.

Short Name: SRM Driver Health

Healthy SRM Driver Monitoring Thresholds

Description
The health test thresholds of the overall SRM Driver health. The check returns "Concerning" health if the percentage of "Healthy" SRM Drivers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" SRM Drivers falls below the critical threshold.
Template Name
STREAMS_REPLICATION_MANAGER_STREAMS_REPLICATION_MANAGER_DRIVER_healthy_thresholds
Default Value
critical:49.99, warning:94.99
Unit(s)
PERCENT

Streams Replication Manager SRM Service Health

This is a Streams Replication Manager service-level health test that checks that enough of the SRM Services in the cluster are healthy. The test returns "Concerning" health if the number of healthy SRM Services falls below a warning threshold, expressed as a percentage of the total number of SRM Services. The test returns "Bad" health if the number of healthy and "Concerning" SRM Services falls below a critical threshold, expressed as a percentage of the total number of SRM Services. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 SRM Services, this test would return "Good" health if 95 or more SRM Services have good health. This test would return "Concerning" health if at least 90 SRM Services have either "Good" or "Concerning" health. If more than 10 SRM Services have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy SRM Services. Check the status of the individual SRM Services for more information. This test can be configured using the Streams Replication Manager Streams Replication Manager service-wide monitoring setting.

Short Name: SRM Service Health

Healthy SRM Service Monitoring Thresholds

Description
The health test thresholds of the overall SRM Service health. The check returns "Concerning" health if the percentage of "Healthy" SRM Services falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" SRM Services falls below the critical threshold.
Template Name
STREAMS_REPLICATION_MANAGER_STREAMS_REPLICATION_MANAGER_SERVICE_healthy_thresholds
Default Value
critical:49.99, warning:94.99
Unit(s)
PERCENT