This is a YARN service-level health test that checks for an active, healthy ResourceManager. The test returns "Bad" health if the service is running and an active ResourceManager cannot be found. If an active ResourceManager is found, the test checks the health of that ResourceManager and of any configured standby ResourceManager. A "Good" health result is returned only if both the active and standby ResourceManagers are healthy.
A failure of this health test may indicate stopped or unhealthy ResourceManager roles, or a communication problem between the Cloudera Manager Service Monitor and the ResourceManagers. When this test fails, check the status of the YARN service's ResourceManager roles and look in the Cloudera Manager Service Monitor's log files for more information.
This test can be enabled or disabled using the Active ResourceManager Role Health Check YARN service-wide monitoring setting. The check for a healthy standby ResourceManager can be enabled or disabled with the Standby ResourceManager Health Check setting. In addition, the Active ResourceManager Detection Window adjusts the amount of time that the Cloudera Manager Service Monitor has to detect the active ResourceManager before this health test fails, and the ResourceManager Activation Startup Tolerance adjusts the amount of time around ResourceManager startup that the test allows for a ResourceManager to become active.
Short Name: ResourceManager Health
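The decision rules described above can be sketched as a small function. This is an illustrative model only, not the Service Monitor's implementation: the names `Health` and `resourcemanager_health` are hypothetical, and the description does not say what result is reported when a ResourceManager is found but unhealthy, or when the service is not running, so those cases are returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    GOOD = "Good"
    BAD = "Bad"
    UNSPECIFIED = "Unspecified"  # outcome the description does not name
    DISABLED = "Disabled"        # hypothetical label for a switched-off test


def resourcemanager_health(
    service_running: bool,
    active_healthy: Optional[bool],
    standby_healthy: Optional[bool],
    test_enabled: bool = True,
    standby_check_enabled: bool = True,
) -> Health:
    """Model of the documented rules.

    active_healthy is None when no active ResourceManager was found.
    standby_healthy is None when no standby ResourceManager is configured.
    """
    if not test_enabled:
        return Health.DISABLED
    if not service_running:
        # The description only covers a running service.
        return Health.UNSPECIFIED
    if active_healthy is None:
        # Service is running and no active ResourceManager was found.
        return Health.BAD
    if not active_healthy:
        return Health.UNSPECIFIED
    if standby_check_enabled and standby_healthy is False:
        return Health.UNSPECIFIED
    # Active is healthy, and any checked standby is healthy too.
    return Health.GOOD
```

For example, `resourcemanager_health(True, None, None)` models a running service with no detectable active ResourceManager, and `resourcemanager_health(True, True, False, standby_check_enabled=False)` models an unhealthy standby whose check has been disabled.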
Active ResourceManager Detection Window
- Description
- The tolerance window used in YARN service tests that depend on detection of the active ResourceManager.
- Template Name
- yarn_active_resourcemanager_detection_window
- Default Value
- CDH=[[CDH 5.0.0..CDH 8.0.0)=3]
- Unit(s)
- MINUTES
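The Default Value entries on this page use a version-range notation. Assuming it follows standard interval notation (`[` inclusive, `)` exclusive), `[CDH 5.0.0..CDH 8.0.0)=3` would mean the default is 3 for CDH 5.0.0 up to, but not including, CDH 8.0.0. The sketch below parses that notation under this assumption; the function name `default_for` is hypothetical and the parser handles only the form shown on this page.

```python
import re
from typing import Optional, Tuple

# Matches one range such as "[CDH 5.0.0..CDH 8.0.0)=3]" inside the outer "CDH=[...]".
_RANGE = re.compile(r"\[CDH (\d+(?:\.\d+)*)\.\.CDH (\d+(?:\.\d+)*)\)=([^\]]+)\]")


def _version(text: str) -> Tuple[int, ...]:
    """Turn "5.16.2" into (5, 16, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def default_for(spec: str, cdh_version: str) -> Optional[str]:
    """Return the default for cdh_version, or None if no range covers it."""
    target = _version(cdh_version)
    for low, high, value in _RANGE.findall(spec):
        # Assumed semantics: lower bound inclusive, upper bound exclusive.
        if _version(low) <= target < _version(high):
            return value
    return None


spec = "CDH=[[CDH 5.0.0..CDH 8.0.0)=3]"
print(default_for(spec, "5.16.2"))
print(default_for(spec, "8.0.0"))
```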
Active ResourceManager Role Health Check
- Description
- When computing the overall YARN service health, whether to consider the active ResourceManager's health.
- Template Name
- yarn_resourcemanagers_health_enabled
- Default Value
- CDH=[[CDH 5.0.0..CDH 8.0.0)=true]
- Unit(s)
- no unit
ResourceManager Activation Startup Tolerance
- Description
- The amount of time after ResourceManager(s) start that the lack of an active ResourceManager will be tolerated. This is an advanced option that does not often need to be changed.
- Template Name
- yarn_resourcemanager_activation_startup_tolerance
- Default Value
- CDH=[[CDH 5.0.0..CDH 8.0.0)=180]
- Unit(s)
- SECONDS
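Note that the two timing settings use different units: the startup tolerance is in seconds (default 180) and the detection window is in minutes (default 3). The sketch below shows one plausible reading of how they might relax the "no active ResourceManager" condition. How the Service Monitor actually combines them is not stated on this page, so the `or` logic, the function name, and both input parameters are assumptions made for illustration.

```python
def missing_active_is_tolerated(
    seconds_since_rm_start: float,
    seconds_since_active_last_seen: float,
    startup_tolerance_seconds: float = 180.0,  # documented default, in SECONDS
    detection_window_minutes: float = 3.0,     # documented default, in MINUTES
) -> bool:
    """Return True if a missing active ResourceManager should not yet fail the test.

    Assumed model: the absence is tolerated while the ResourceManagers are
    still within the startup tolerance, or while the detection window since
    the last sighting of an active ResourceManager has not yet elapsed.
    """
    in_startup_grace = seconds_since_rm_start < startup_tolerance_seconds
    # Convert the window to seconds before comparing.
    in_detection_window = seconds_since_active_last_seen < detection_window_minutes * 60
    return in_startup_grace or in_detection_window
```

With the defaults, both thresholds work out to the same 180 seconds; they differ only in what they are measured from.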
Standby ResourceManager Health Check
- Description
- When computing the overall YARN service health, whether to consider the health of the standby ResourceManager.
- Template Name
- yarn_standby_resourcemanager_health_enabled
- Default Value
- CDH=[[CDH 5.0.0..CDH 8.0.0)=true]
- Unit(s)
- no unit
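When these settings are managed through automation rather than the UI, it can help to validate names and normalize values before sending them anywhere. The sketch below builds a JSON body from the four template names listed on this page. It assumes the template names are accepted as configuration keys and that the receiving API expects a list of name/value items with string values, which is the general shape used by the Cloudera Manager REST API for service configuration; confirm both against the API documentation for your version. The helper name `build_config_payload` is hypothetical, and no request is sent.

```python
import json
from typing import Dict, Union

# Template names taken from this page.
KNOWN_SETTINGS = {
    "yarn_active_resourcemanager_detection_window",       # minutes
    "yarn_resourcemanagers_health_enabled",               # true/false
    "yarn_resourcemanager_activation_startup_tolerance",  # seconds
    "yarn_standby_resourcemanager_health_enabled",        # true/false
}


def build_config_payload(settings: Dict[str, Union[bool, int]]) -> str:
    """Return a JSON document of the form {"items": [{"name": ..., "value": ...}]}."""
    items = []
    for name, value in settings.items():
        if name not in KNOWN_SETTINGS:
            raise ValueError(f"unknown setting: {name}")
        # Check bool first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        items.append({"name": name, "value": text})
    return json.dumps({"items": items}, indent=2)


print(build_config_payload({
    "yarn_active_resourcemanager_detection_window": 5,
    "yarn_standby_resourcemanager_health_enabled": False,
}))
```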