Cloudera Data Science Workbench Health Tests
Cloudera Data Science Workbench Application Health
This Cloudera Data Science Workbench service-level health test checks for the presence of a running, healthy Application. The test returns "Bad" health if the service is running and the Application is not running. In all other cases it returns the health of the Application. A failure of this health test indicates a stopped or unhealthy Application. Check the status of the Application for more information. This test can be enabled or disabled using the Application Role Health Test service-wide monitoring setting.
Short Name: Application Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Application Role Health Test | When computing the overall CDSW health, consider the Application's health. | CDSW_CDSW_APPLICATION_health_enabled | true | no unit |
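The decision rule described for this test can be sketched in a few lines of Python. This is only an illustration of the logic stated above, not Cloudera Manager code: the function name `role_health_test`, its parameters, and the choice to return `None` when the test is disabled are all assumptions made for the example.

```python
from typing import Optional


def role_health_test(
    service_running: bool,
    role_running: bool,
    role_health: str,
    enabled: bool = True,
) -> Optional[str]:
    """Hypothetical sketch of a single-role health test.

    role_health is the role's own health ("Good", "Concerning" or "Bad").
    """
    # Assumption: a disabled test contributes nothing to the service health.
    if not enabled:
        return None
    # The service is running but the role is not: report "Bad".
    if service_running and not role_running:
        return "Bad"
    # In all other cases, report the health of the role itself.
    return role_health


print(role_health_test(True, False, "Good"))
print(role_health_test(True, True, "Concerning"))
```

The same rule applies to any single-role test of this kind; only the role being checked changes.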
Cloudera Data Science Workbench CDSW Status
This health test checks that Cloudera Data Science Workbench is ready to serve requests. If the test reports an unhealthy status, verify the service configuration and refer to Troubleshooting Cloudera Data Science Workbench.
Short Name: CDSW Status
Cloudera Data Science Workbench Docker Daemon Health
This Cloudera Data Science Workbench service-level health test checks that enough of the Docker Daemons in the cluster are healthy. The test returns "Concerning" health if the number of Docker Daemons with "Good" health falls below a warning threshold, expressed as a percentage of the total number of Docker Daemons. The test returns "Bad" health if the number of Docker Daemons with "Good" or "Concerning" health falls below a critical threshold, also expressed as a percentage of the total number of Docker Daemons. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Docker Daemons, it returns "Good" health if 95 or more Docker Daemons have "Good" health. It returns "Concerning" health if fewer than 95 Docker Daemons have "Good" health but at least 90 have either "Good" or "Concerning" health. If more than 10 Docker Daemons have "Bad" health, it returns "Bad" health. A failure of this health test indicates unhealthy Docker Daemons. Check the status of the individual Docker Daemons for more information. This test can be configured using the Healthy Docker Daemon Monitoring Thresholds Cloudera Data Science Workbench service-wide monitoring setting.
Short Name: Docker Daemon Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy Docker Daemon Monitoring Thresholds | The health test thresholds for the overall Docker Daemon health. The check returns "Concerning" health if the percentage of Docker Daemons with "Good" health falls below the warning threshold. The check returns "Bad" health if the combined percentage of Docker Daemons with "Good" or "Concerning" health falls below the critical threshold. | CDSW_CDSW_DOCKER_healthy_thresholds | critical:70.0, warning:95.0 | PERCENT |
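The threshold rule for the Docker Daemon test can be illustrated with a short Python sketch. It follows the description and the 95%/90% worked example above; the function name `aggregate_health`, its parameters, and the error raised for an empty cluster are assumptions made for illustration and do not reflect how Cloudera Manager implements the check.

```python
def aggregate_health(
    good: int,
    concerning: int,
    bad: int,
    warning_pct: float = 95.0,
    critical_pct: float = 70.0,
) -> str:
    """Hypothetical sketch of a percentage-threshold health test."""
    total = good + concerning + bad
    if total == 0:
        # Assumption: the source does not say what happens with no roles.
        raise ValueError("no roles to evaluate")

    good_pct = 100.0 * good / total
    good_or_concerning_pct = 100.0 * (good + concerning) / total

    # "Bad": too few roles are in "Good" or "Concerning" health.
    if good_or_concerning_pct < critical_pct:
        return "Bad"
    # "Concerning": too few roles are in "Good" health.
    if good_pct < warning_pct:
        return "Concerning"
    return "Good"


# The worked example from the text: 100 Docker Daemons, warning 95%, critical 90%.
print(aggregate_health(95, 0, 5, warning_pct=95.0, critical_pct=90.0))
print(aggregate_health(90, 0, 10, warning_pct=95.0, critical_pct=90.0))
print(aggregate_health(85, 4, 11, warning_pct=95.0, critical_pct=90.0))
```

The default arguments mirror the default values in the table (critical 70.0, warning 95.0), so calling the function without thresholds approximates an unmodified configuration.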
Cloudera Data Science Workbench Master Health
This Cloudera Data Science Workbench service-level health test checks for the presence of a running, healthy Master. The test returns "Bad" health if the service is running and the Master is not running. In all other cases it returns the health of the Master. A failure of this health test indicates a stopped or unhealthy Master. Check the status of the Master for more information. This test can be enabled or disabled using the Master Role Health Test service-wide monitoring setting.
Short Name: Master Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Master Role Health Test | When computing the overall CDSW health, consider the Master's health. | CDSW_CDSW_MASTER_health_enabled | true | no unit |
Cloudera Data Science Workbench Worker Health
This Cloudera Data Science Workbench service-level health test checks that enough of the Workers in the cluster are healthy. The test returns "Concerning" health if the number of Workers with "Good" health falls below a warning threshold, expressed as a percentage of the total number of Workers. The test returns "Bad" health if the number of Workers with "Good" or "Concerning" health falls below a critical threshold, also expressed as a percentage of the total number of Workers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Workers, it returns "Good" health if 95 or more Workers have "Good" health. It returns "Concerning" health if fewer than 95 Workers have "Good" health but at least 90 have either "Good" or "Concerning" health. If more than 10 Workers have "Bad" health, it returns "Bad" health. A failure of this health test indicates unhealthy Workers. Check the status of the individual Workers for more information. This test can be configured using the Healthy Worker Monitoring Thresholds Cloudera Data Science Workbench service-wide monitoring setting.
Short Name: Worker Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy Worker Monitoring Thresholds | The health test thresholds for the overall Worker health. The check returns "Concerning" health if the percentage of Workers with "Good" health falls below the warning threshold. The check returns "Bad" health if the combined percentage of Workers with "Good" or "Concerning" health falls below the critical threshold. | CDSW_CDSW_WORKER_healthy_thresholds | critical:70.0, warning:95.0 | PERCENT |
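When planning capacity, it can help to translate the percentage thresholds into Worker counts. The Python sketch below parses a threshold string written the way the Default Value column displays it and computes the minimum counts needed for a given cluster size. The helper names `parse_thresholds` and `minimum_counts` are hypothetical, and treating the displayed default as a parseable string is an assumption; the source does not state how Cloudera Manager stores or accepts this value.

```python
import math


def parse_thresholds(text: str) -> dict:
    """Parse a string such as 'critical:70.0, warning:95.0' (assumed format)."""
    thresholds = {}
    for part in text.split(","):
        name, _, value = part.strip().partition(":")
        thresholds[name.strip()] = float(value)
    return thresholds


def minimum_counts(total_workers: int, thresholds: dict) -> dict:
    """Smallest Worker counts that keep each percentage at or above its threshold."""
    return {
        # Workers that must have "Good" health for the test to return "Good".
        "good_for_good_health": math.ceil(
            thresholds["warning"] * total_workers / 100
        ),
        # Workers that must be "Good" or "Concerning" to avoid "Bad".
        "good_or_concerning_to_avoid_bad": math.ceil(
            thresholds["critical"] * total_workers / 100
        ),
    }


defaults = parse_thresholds("critical:70.0, warning:95.0")
print(minimum_counts(20, defaults))
```

With the default thresholds and 20 Workers, a single unhealthy Worker still leaves 95% in "Good" health, while a second one drops the test to "Concerning".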