Health Tests

Activity Monitor - Unsupported Since 7.0.0 Health

This service-level health test checks for the presence of a running, healthy Activity Monitor - Unsupported Since 7.0.0. The test returns "Bad" health if the service is running and the Activity Monitor - Unsupported Since 7.0.0 is not running. In all other cases it returns the health of the Activity Monitor - Unsupported Since 7.0.0. A failure of this health test indicates a stopped or unhealthy Activity Monitor - Unsupported Since 7.0.0. Check the status of the Activity Monitor - Unsupported Since 7.0.0 for more information. This test can be enabled or disabled using the Activity Monitor - Unsupported Since 7.0.0 Role Health Test Activity Monitor - Unsupported Since 7.0.0 service-wide monitoring setting.

Short Name: Activity Monitor - Unsupported Since 7.0.0 Health

Activity Monitor - Unsupported Since 7.0.0 Role Health Test

Description
When computing the overall MGMT health, consider Activity Monitor - Unsupported Since 7.0.0's health
Template Name
mgmt_activitymonitor_health_enabled
Default Value
true
Unit(s)
no unit

Alert Publisher Health

This service-level health test checks for the presence of a running, healthy Alert Publisher. The test returns "Bad" health if the service is running and the Alert Publisher is not running. In all other cases it returns the health of the Alert Publisher. A failure of this health test indicates a stopped or unhealthy Alert Publisher. Check the status of the Alert Publisher for more information. This test can be enabled or disabled using the Alert Publisher Role Health Test Alert Publisher service-wide monitoring setting.

Short Name: Alert Publisher Health

Alert Publisher Role Health Test

Description
When computing the overall MGMT health, consider Alert Publisher's health
Template Name
mgmt_alertpublisher_health_enabled
Default Value
true
Unit(s)
no unit

Certificate Expiration

This health test checks the expiry time of the TLS certificate for the Server. This test can be configured using the TLS Certificate Expiry Thresholds Service monitoring setting. This test alerts when the certificate is close to expiry.

Short Name: Certificate Expiration

TLS Certificate Expiry Thresholds

Description
The health test thresholds for monitoring the certificate of Server.
Template Name
mgmt_certificate_expiry_thresholds
Default Value
critical:7.0, warning:60.0
Unit(s)
DAYS

Server Clock Offset

This is a health test that checks if the clock offset between Server and Service Monitor is too large. This test can be configured using the Server Clock Offset Thresholds Service monitoring setting. This test calculates clock offset using the timestamp int messages that Server sends to Service Monitor and the timestamp when Service Monitor gets the messages. Check the NTP configuration on both Server and Service Monitor machines if this test fails.

Short Name: Server Clock Offset

Server Clock Offset Thresholds

Description
The health test thresholds for monitoring the clock offset between the Server and the Service Monitor.
Template Name
mgmt_clock_offset_with_smon_thresholds
Default Value
critical:60000.0, warning:30000.0
Unit(s)
MILLISECONDS

Server Cluster Availability

This health test checks the Server clusters availability. This test alerts when Server cluster does not have enough healthy nodes.

Short Name: Server Cluster Availability

Server Cluster Availability Threshold Percents

Description
The health test thresholds for the Server Cluster Availability. Specify the minimum required percent of healthy and running CM cluster nodes.
Template Name
mgmt_cm_ha_availability_thresholds
Default Value
critical:never, warning:never
Unit(s)
PERCENT

Server Heap Size

This Server health test checks that the Server heap size is adequate for the it. Consider increasing the heap when it is in critical state A failure of this health test may indicate that the Server is not getting the optimal heap size. Inspect the Server logs for any pause monitor output, check garbage collection metrics exposed by the Server. This test can be configured using the settings.

Short Name: Server Heap Size

Server Heap Size Thresholds

Description
The health test thresholds for the Server heap usage.
Template Name
mgmt_heap_size_thresholds
Default Value
critical:95.0, warning:90.0
Unit(s)
PERCENT

Server Pause Duration

This health test checks that the Server threads are not experiencing long scheduling pauses. The test uses a pause monitoring thread in the Server that tracks scheduling delay by noting if it is run on its requested schedule. If the thread is not run on its requested schedule, the delay is noted and considered pause time. The health test checks that no more than some percentage of recent time is spent paused. A failure of this health test may indicate that the Server is not getting enough CPU resources, or that it is spending too much time doing garbage collection. Inspect the Server logs for any pause monitor output and check garbage collection metrics exposed by the Service Monitor. This test can be configured using the Server Pause Duration Thresholds and Server Pause Duration Monitoring Period Server settings.

Short Name: Server Pause Duration

Server Pause Duration Monitoring Period

Description
The period to review when computing the moving average of extra time the pause monitor spent paused.
Template Name
mgmt_pause_duration_window
Default Value
5
Unit(s)
MINUTES

Server Pause Duration Thresholds

Description
The health test thresholds for the Server pause duration time.
Template Name
mgmt_pause_duration_thresholds
Default Value
critical:5.0, warning:2.0
Unit(s)
PERCENT

Command Storage Directory Free Space

This Command Storage Directory Free Space Monitoring Thresholds health test checks that the filesystem containing the directory used to store the Server's command data has sufficient space. This test can be configured using the Command Storage Directory Free Space Monitoring Thresholds Management Services monitoring settings. This directory can be changed by setting the Server Local Data Storage Directory path in settings.

Short Name: Command Storage Directory Free Space

Server Local Data Storage Directory

Description
Local path used by for storing data, including command result files. Note that changes to this configuration will only apply to commands started after the change. It is highly recommended that existing data be migrated over to the new location for the data to be accessible via and managed by .
Template Name
command_storage_path
Default Value
/var/lib/cloudera-scm-server
Unit(s)
no unit

Command Storage Directory Free Space Monitoring Thresholds

Description
The health test thresholds for monitoring the free space on the filesystem that contains the Server command storage directory.
Template Name
mgmt_command_storage_directory_free_space_absolute_thresholds
Default Value
critical:1.073741824E9, warning:2.147483648E9
Unit(s)
BYTES

Embedded Database Free Space

The embedded PostgreSQL database running on the Server host is using a partition with a small amount of free space. Typically, the data for this embedded postgreSQL server is in /var/lib/cloudera-scm-server-db. Increase the available disk space or adjust the retention period of activity monitoring or auditing data. This test can be configured using the Embedded Database Free Space Monitoring Thresholds Management Services monitoring settings.

Short Name: Embedded Database Free Space

Embedded Database Free Space Monitoring Thresholds

Description
The health test thresholds for monitoring the free space on the volume for the embedded PostgreSQL database optionally running on the Server. If the embedded database is not in use, this has no effect.
Template Name
mgmt_embedded_database_free_space_absolute_thresholds
Default Value
critical:1.073741824E9, warning:2.147483648E9
Unit(s)
BYTES

Event Server Health

This service-level health test checks for the presence of a running, healthy Event Server. The test returns "Bad" health if the service is running and the Event Server is not running. In all other cases it returns the health of the Event Server. A failure of this health test indicates a stopped or unhealthy Event Server. Check the status of the Event Server for more information. This test can be enabled or disabled using the Event Server Role Health Test Event Server service-wide monitoring setting.

Short Name: Event Server Health

Event Server Role Health Test

Description
When computing the overall MGMT health, consider Event Server's health
Template Name
mgmt_eventserver_health_enabled
Default Value
true
Unit(s)
no unit

Host Monitor Health

This service-level health test checks for the presence of a running, healthy Host Monitor. The test returns "Bad" health if the service is running and the Host Monitor is not running. In all other cases it returns the health of the Host Monitor. A failure of this health test indicates a stopped or unhealthy Host Monitor. Check the status of the Host Monitor for more information. This test can be enabled or disabled using the Host Monitor Role Health Test Host Monitor service-wide monitoring setting.

Short Name: Host Monitor Health

Host Monitor Role Health Test

Description
When computing the overall MGMT health, consider Host Monitor's health
Template Name
mgmt_hostmonitor_health_enabled
Default Value
true
Unit(s)
no unit

KDC Server Connection Health

This health test checks the KDC Service Connection's Health. This test can be configured using the KDC Connection Health Check Enabled Service monitoring setting. This test alerts when connection to the KDC Server is slow or not available.

Short Name: KDC Server Connection Health

KDC Server Connection Health Thresholds

Description
The health test thresholds for monitoring the KDC Server connection health by login time.
Template Name
kdc_availability_thresholds
Default Value
critical:2000.0, warning:1500.0
Unit(s)
MILLISECONDS

KDC Connection Health Check Enabled

Description
Enable or disable KDC Server Connection Health Check execution. Restart Server to apply changes.
Template Name
kdc_monitoring_enabled
Default Value
true
Unit(s)
no unit

LDAP Server Connection Health

This health test checks the LDAP Service Connection's Health. This test can be configured using the LDAP Connection Health Check Enabled Service monitoring setting. This test alerts when connection to the LDAP Server is slow or not available.

Short Name: LDAP Server Connection Health

LDAP Monitoring Period

Description
The Period of the 's LDAP Monitoring functionality.
Template Name
ldap_monitoring_period
Default Value
60000
Unit(s)
no unit

LDAP Server Connection Health Thresholds

Description
The health test thresholds for monitoring the LDAP Server connection health by login time.
Template Name
ldap_availability_thresholds
Default Value
critical:2000.0, warning:1500.0
Unit(s)
MILLISECONDS

LDAP Connection Health Check Enabled

Description
Enable or disable LDAP Server Connection Health Check execution. Restart Server to apply changes.
Template Name
ldap_monitoring_enabled
Default Value
true
Unit(s)
no unit

Navigator Audit Server Health

This service-level health test checks for the presence of a running, healthy Navigator Audit Server. The test returns "Bad" health if the service is running and the Navigator Audit Server is not running. In all other cases it returns the health of the Navigator Audit Server. A failure of this health test indicates a stopped or unhealthy Navigator Audit Server. Check the status of the Navigator Audit Server for more information. This test can be enabled or disabled using the Navigator Audit Server Role Health Test Navigator Audit Server service-wide monitoring setting.

Short Name: Navigator Audit Server Health

Navigator Audit Server Role Health Test

Description
When computing the overall MGMT health, consider Navigator Audit Server's health
Template Name
mgmt_navigator_health_enabled
Default Value
true
Unit(s)
no unit

Navigator Metadata Server Health

This service-level health test checks for the presence of a running, healthy Navigator Metadata Server. The test returns "Bad" health if the service is running and the Navigator Metadata Server is not running. In all other cases it returns the health of the Navigator Metadata Server. A failure of this health test indicates a stopped or unhealthy Navigator Metadata Server. Check the status of the Navigator Metadata Server for more information. This test can be enabled or disabled using the Navigator Metadata Server Role Health Test Navigator Metadata Server service-wide monitoring setting.

Short Name: Navigator Metadata Server Health

Navigator Metadata Server Role Health Test

Description
When computing the overall MGMT health, consider Navigator Metadata Server's health
Template Name
mgmt_navigatormetaserver_health_enabled
Default Value
true
Unit(s)
no unit

Reports Manager Health

This service-level health test checks for the presence of a running, healthy Reports Manager. The test returns "Bad" health if the service is running and the Reports Manager is not running. In all other cases it returns the health of the Reports Manager. A failure of this health test indicates a stopped or unhealthy Reports Manager. Check the status of the Reports Manager for more information. This test can be enabled or disabled using the Reports Manager Role Health Test Reports Manager service-wide monitoring setting.

Short Name: Reports Manager Health

Reports Manager Role Health Test

Description
When computing the overall MGMT health, consider Reports Manager's health
Template Name
mgmt_reportsmanager_health_enabled
Default Value
true
Unit(s)
no unit

Service Monitor Health

This service-level health test checks for the presence of a running, healthy Service Monitor. The test returns "Bad" health if the service is running and the Service Monitor is not running. In all other cases it returns the health of the Service Monitor. A failure of this health test indicates a stopped or unhealthy Service Monitor. Check the status of the Service Monitor for more information. This test can be enabled or disabled using the Service Monitor Role Health Test Service Monitor service-wide monitoring setting.

Short Name: Service Monitor Health

Service Monitor Role Health Test

Description
When computing the overall MGMT health, consider Service Monitor's health
Template Name
mgmt_servicemonitor_health_enabled
Default Value
true
Unit(s)
no unit

Telemetry Publisher Health

This service-level health test checks for the presence of a running, healthy Telemetry Publisher. The test returns "Bad" health if the service is running and the Telemetry Publisher is not running. In all other cases it returns the health of the Telemetry Publisher. A failure of this health test indicates a stopped or unhealthy Telemetry Publisher. Check the status of the Telemetry Publisher for more information. This test can be enabled or disabled using the Telemetry Publisher Role Health Test Telemetry Publisher service-wide monitoring setting.

Short Name: Telemetry Publisher Health

Telemetry Publisher Role Health Test

Description
When computing the overall MGMT health, consider Telemetry Publisher's health
Template Name
mgmt_telemetrypublisher_health_enabled
Default Value
true
Unit(s)
no unit