Host Health Tests
Host Agent Log Directory
This is a host health test that checks that the filesystem containing the Cloudera Manager Agent's log directory has sufficient free space. This test can be configured using the Cloudera Manager Agent Log Directory Free Space Monitoring Absolute Thresholds and Cloudera Manager Agent Log Directory Free Space Monitoring Percentage Thresholds host monitoring settings.
Short Name: Agent Log Directory
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Cloudera Manager Agent Log Directory Free Space Monitoring Absolute Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's log directory. | host_agent_log_directory_free_space_absolute_thresholds | critical:1.073741824E9, warning:2.147483648E9 | BYTES |
Cloudera Manager Agent Log Directory Free Space Monitoring Percentage Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Cloudera Manager Agent Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | host_agent_log_directory_free_space_percentage_thresholds | critical:never, warning:never | PERCENT |
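
As a rough illustration of how the absolute and percentage settings interact, the Python sketch below classifies a filesystem's free space. It is not Cloudera Manager's implementation: the function names, the dictionary representation of thresholds (with `None` standing in for `never`), the strict less-than comparison, and the "Concerning" and "Good" labels are assumptions made for this example. The absolute values are the defaults from the table above.

```python
import shutil

# Default absolute thresholds from the table, in bytes (1 GiB and 2 GiB).
# None stands in for "never" (an assumption of this sketch).
DEFAULT_ABSOLUTE = {"critical": 1.073741824e9, "warning": 2.147483648e9}


def classify_free_space(free_bytes, total_bytes, absolute=None, percentage=None):
    """Return "Bad", "Concerning", or "Good" for one filesystem.

    Percentage thresholds are consulted only when no absolute thresholds
    are configured, mirroring the rule in the property description.
    """
    if absolute is not None:
        value, thresholds = free_bytes, absolute
    elif percentage is not None:
        value, thresholds = 100.0 * free_bytes / total_bytes, percentage
    else:
        return "Good"

    critical = thresholds.get("critical")
    warning = thresholds.get("warning")
    # Assumed rule: health degrades when free space drops below a threshold.
    if critical is not None and value < critical:
        return "Bad"
    if warning is not None and value < warning:
        return "Concerning"
    return "Good"


def check_directory(path, absolute=DEFAULT_ABSOLUTE, percentage=None):
    """Classify the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_free_space(usage.free, usage.total, absolute, percentage)
```

The same sketch applies to the parcel and process directory tests; only the default threshold values differ.
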
Host Agent Parcel Directory
This is a host health test that checks whether the filesystem containing the Cloudera Manager Agent's parcel directory has sufficient free space. This test can be configured using the Cloudera Manager Agent Parcel Directory Free Space Monitoring Absolute Thresholds and Cloudera Manager Agent Parcel Directory Free Space Monitoring Percentage Thresholds host monitoring settings.
Short Name: Agent Parcel Directory
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Cloudera Manager Agent Parcel Directory Free Space Monitoring Absolute Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's parcel directory. | host_agent_parcel_directory_free_space_absolute_thresholds | critical:5.36870912E9, warning:1.073741824E10 | BYTES |
Cloudera Manager Agent Parcel Directory Free Space Monitoring Percentage Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's parcel directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Cloudera Manager Agent Parcel Directory Free Space Monitoring Absolute Thresholds setting is configured. | host_agent_parcel_directory_free_space_percentage_thresholds | critical:never, warning:never | PERCENT |
Host Agent Process Directory
This is a host health test that checks that the filesystem containing the Cloudera Manager Agent's process directory has sufficient free space. The process directory contains the configuration files for the processes which the Cloudera Manager Agent starts. This test can be configured using the Cloudera Manager Agent Process Directory Free Space Monitoring Absolute Thresholds and Cloudera Manager Agent Process Directory Free Space Monitoring Percentage Thresholds host monitoring settings.
Short Name: Agent Process Directory
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Cloudera Manager Agent Process Directory Free Space Monitoring Absolute Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's process directory. | host_agent_process_directory_free_space_absolute_thresholds | critical:1.048576E8, warning:2.097152E8 | BYTES |
Cloudera Manager Agent Process Directory Free Space Monitoring Percentage Thresholds | The health check thresholds for monitoring of free space on the filesystem that contains the Cloudera Manager Agent's process directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Cloudera Manager Agent Process Directory Free Space Monitoring Absolute Thresholds setting is configured. | host_agent_process_directory_free_space_percentage_thresholds | critical:never, warning:never | PERCENT |
Host Agent Status
This is a host health test that checks that the host's Cloudera Manager Agent is heartbeating correctly and has the correct software version. A failure of this health test may indicate a lack of connectivity with the host's Cloudera Manager Agent, a problem with the Cloudera Manager Agent, or that the Cloudera Manager Agent or Host Monitor software is out of date. Check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the host, or look in the host's Cloudera Manager Agent logs for more details. If this test reports a software version mismatch between the Cloudera Manager Agent and the Host Monitor, check the version of each component by consulting the appropriate logs or the appropriate status web pages. This test can be enabled or disabled using the Host Process Health Test host configuration setting.
Short Name: Agent Status
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host Process Health Test | Enables the health test that checks that the host's process state is consistent with the role configuration. | host_scm_health_enabled | true | no unit |
Host Clock Offset
This is a host health test that checks whether the host's system clock appears to be out of sync with its NTP server. The test checks that the absolute value of the host's clock offset, as reported by the 'ntpdc -c loopinfo' command, is not too large. If the command fails, or the host's NTP daemon is not running, the test will return "Bad" health. If NTP is not in use on the host, this check should be disabled for the host using the configuration option described below. Cloudera recommends using NTP for time synchronization of Hadoop clusters. A failure of this health test may indicate a problem with the host's NTP service or configuration. This test can be configured using the Host Clock Offset Thresholds host configuration setting.
Short Name: Clock Offset
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host Clock Offset Thresholds | The thresholds for the host clock offset health test. The test will compare this threshold against the absolute value of the clock offset reported by the host's NTP service from the 'ntpdc -c loopinfo' command. Setting this to disabled will turn off collection of the clock offset by the Cloudera Manager Agent. If NTP is not in use, this should be set to disabled. Cloudera recommends using NTP for time synchronization of Hadoop clusters. | host_clock_offset_thresholds | critical:10000.0, warning:3000.0 | MILLISECONDS |
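
The comparison described above can be sketched in Python as follows. This is an illustrative assumption rather than the agent's actual code: the function name, the use of `None` to represent a failed command or a stopped NTP daemon, the strict greater-than comparison, and the "Concerning" and "Good" labels are invented for the example. The threshold values are the defaults from the table.

```python
def classify_clock_offset(offset_ms, critical_ms=10000.0, warning_ms=3000.0):
    """Classify a clock offset given in milliseconds.

    Pass None when no offset could be obtained (the command failed or the
    NTP daemon is not running); the test description says this is "Bad".
    """
    if offset_ms is None:
        return "Bad"
    magnitude = abs(offset_ms)  # only the size of the offset matters
    # Assumed rule: a value strictly greater than a threshold trips it.
    if magnitude > critical_ms:
        return "Bad"
    if magnitude > warning_ms:
        return "Concerning"
    return "Good"
```

Because the absolute value is used, a clock that runs 12 seconds behind is treated the same as one that runs 12 seconds ahead.
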
Host DNS Resolution
This is a host health test that checks that the host's hostname and canonical name are consistent when checked from a Java process. A failure of this health test may indicate that the host's DNS configuration is not correct. Check the Cloudera Manager Agent log for the names that were detected by this test. The hostname and canonical name are considered to be consistent if the hostname or the hostname plus a domain name is the same as the canonical name. This health test uses domain names from the domain and search lines in /etc/resolv.conf. This health test does not consult /etc/nsswitch.conf and may give incorrect results if /etc/resolv.conf is not used by the host. There may be a delay of up to 5 minutes before this health test picks up changes to /etc/resolv.conf. This test can be configured using the Hostname and Canonical Name Health Check host configuration setting.
Short Name: DNS Resolution
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Hostname and Canonical Name Health Check | Whether the hostname and canonical names for this host are consistent when checked from a Java process. | host_dns_resolution_enabled | true | no unit |
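
The consistency rule described above (the hostname, or the hostname plus a domain from /etc/resolv.conf, must equal the canonical name) can be sketched in Python. The function names are hypothetical, the case-insensitive comparison is an assumption, and the actual test runs in a Java process, so treat this only as a model of the rule.

```python
def resolv_conf_domains(text):
    """Collect domain names from the `domain` and `search` lines of resolv.conf text."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("domain", "search"):
            domains.extend(fields[1:])
    return domains


def names_consistent(hostname, canonical_name, domains):
    """True if the hostname, alone or with one of the domains appended,
    matches the canonical name (compared case-insensitively, an assumption)."""
    candidates = [hostname] + [f"{hostname}.{domain}" for domain in domains]
    return canonical_name.lower() in (c.lower() for c in candidates)
```

For example, with `search example.com` in resolv.conf, the hostname `node1` and the canonical name `node1.example.com` would be considered consistent under this model, while `node1` and `node2.example.com` would not.
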
Host DNS Resolution Duration
This is a host health test that checks that the host's DNS resolution completes in a timely manner. The DNS resolution duration is calculated by measuring the time that a call to getLocalHost in a Java process takes on this host. Note that DNS information may be cached on the host, and this caching may affect the reported resolution duration. A failure of this health test may indicate that the host's DNS configuration is set incorrectly or that the host's DNS server is responding slowly. This test can be configured using the Host DNS Resolution Duration Thresholds host configuration setting.
Short Name: DNS Resolution Duration
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host DNS Resolution Duration Thresholds | The health check thresholds for the host DNS resolution duration. | host_dns_resolution_duration_thresholds | critical:never, warning:1000.0 | MILLISECONDS |
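
To get a feel for the measurement, the Python sketch below times a local host lookup and compares the result with the default warning threshold. It swaps in Python's `socket` module for the Java getLocalHost call that the test actually uses, so the numbers are not directly comparable. The function names, the strict greater-than comparison, and the "Concerning" and "Good" labels are assumptions.

```python
import socket
import time


def time_lookup_ms(lookup):
    """Run `lookup()` once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    lookup()
    return (time.perf_counter() - start) * 1000.0


def local_host_lookup():
    """Resolve this host's own name (a Python stand-in for Java's getLocalHost)."""
    return socket.gethostbyname(socket.gethostname())


def classify_duration(duration_ms, warning_ms=1000.0, critical_ms=None):
    """Classify a duration; None means "never", as in the default critical threshold."""
    if critical_ms is not None and duration_ms > critical_ms:
        return "Bad"
    if warning_ms is not None and duration_ms > warning_ms:
        return "Concerning"
    return "Good"
```

A call such as `classify_duration(time_lookup_ms(local_host_lookup))` would then classify the current host. As with the real test, a cached answer can make the lookup appear faster than the DNS server really is.
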
Host Frame Errors
This is a host health test that checks for network frame errors across all network interfaces. A failure of this health test may indicate a problem with network hardware (e.g. switches) and can potentially cause other service or role-level performance problems. Check the host and network hardware logs for more details. This test can be configured using the Host Network Frame Error Percentage Thresholds, Host Network Frame Error Check Window, and Host Network Frame Error Test Minimum Required Packets host configuration settings.
Short Name: Frame Errors
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host Network Frame Error Check Window | The amount of time over which the host frame error test checks for frame errors. | host_network_frame_errors_window | 15 | MINUTES |
Host Network Frame Error Percentage Thresholds | The health check thresholds for the percentage of received packets that are frame errors. | host_network_frame_errors_thresholds | critical:0.5, warning:any | PERCENT |
Host Network Frame Error Test Minimum Required Packets | The minimum number of packets that must be received within the test window for this test to return "Bad" health. If fewer than this number of packets are received during the test window, the health check will never return "Bad" health. | host_network_frame_errors_floor | 0 | no unit |
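
The interplay between the percentage thresholds and the minimum packet count can be sketched in Python. This is a model under stated assumptions, not the product's code: the function name is hypothetical, `warning:any` is interpreted as "any frame error at all", the comparison is strictly greater-than, and the "Concerning" and "Good" labels are assumed. The counts are those observed within the check window.

```python
def classify_frame_errors(frame_errors, received_packets,
                          critical_pct=0.5, min_packets=0):
    """Classify frame errors seen within the check window.

    frame_errors and received_packets are totals for the window, summed
    across all network interfaces.
    """
    if received_packets <= 0:
        return "Good"  # nothing received, so no percentage to compute
    error_pct = 100.0 * frame_errors / received_packets
    # "Bad" is only possible once the minimum packet count has been reached.
    if received_packets >= min_packets and error_pct > critical_pct:
        return "Bad"
    # Assumed reading of warning:any: any frame error raises a warning.
    if frame_errors > 0:
        return "Concerning"
    return "Good"
```

With the defaults, 10 frame errors in 1000 received packets (1%) would be classified as "Bad", but raising the minimum required packets to 5000 would cap the same window at "Concerning".
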
Host Network Interface Speed
This is a host health test that checks for network interfaces that appear to be operating at less than full speed. A failure of this health test may indicate that network interface(s) may be configured incorrectly and may be causing performance problems. Use the ethtool command to check and configure the host's network interfaces to use the fastest available link speed and duplex mode. This test can be configured using the Host's Network Interfaces Slow Link Modes Thresholds, Network Interface Expected Link Speed and Network Interface Expected Duplex Mode host configuration settings.
Short Name: Network Interface Speed
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host's Network Interfaces Slow Link Modes Thresholds | The thresholds for the health check of the number of network interfaces that appear to be operating at less than full speed. | host_network_interfaces_slow_mode_thresholds | critical:never, warning:any | no unit |
Network Interface Expected Duplex Mode | The expected duplex mode for network interfaces. | host_nic_expected_duplex_mode | Full | no unit |
Network Interface Expected Link Speed | The expected network interface link speed. | host_nic_expected_speed | 1000 | no unit |
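
As an illustration of how the three settings fit together, the Python sketch below counts interfaces that fall short of the expected link speed or duplex mode. How the agent actually gathers link information is not described here, so the input mapping, the function names, and the "Concerning" and "Good" labels are assumptions. Speeds are taken to be in the same unit as the Network Interface Expected Link Speed setting.

```python
def slow_interfaces(interfaces, expected_speed=1000, expected_duplex="Full"):
    """Return the sorted names of interfaces below the expected speed or duplex.

    `interfaces` maps an interface name to a (speed, duplex) pair
    (a hypothetical input format chosen for this sketch).
    """
    slow = []
    for name, (speed, duplex) in interfaces.items():
        if speed < expected_speed or duplex.lower() != expected_duplex.lower():
            slow.append(name)
    return sorted(slow)


def classify_slow_interfaces(slow_count):
    """Defaults are critical:never, warning:any, so any slow interface warns."""
    return "Concerning" if slow_count > 0 else "Good"
```

For instance, a host with one interface at speed 100 and another running half duplex would report two slow interfaces and a "Concerning" result under the default thresholds.
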
Host Swapping
This is a host health test that checks that the host has not swapped out more than a certain number of pages over the last fifteen minutes. A failure of this health test may indicate misconfiguration of the host operating system, or too many processes running on the host. Try reducing vm.swappiness, or add more memory to the host. This test can be configured using the Host Memory Swapping Thresholds and Host Memory Swapping Check Window host configuration settings.
Short Name: Swapping
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Host Memory Swapping Check Window | The amount of time over which the memory swapping test checks for pages swapped. | host_memswap_window | 15 | MINUTES |
Host Memory Swapping Thresholds | The health check thresholds for the number of pages swapped out on the host in the last 15 minutes. | host_memswap_thresholds | critical:never, warning:any | PAGES |
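
One way to picture the check window is as a difference between cumulative swap-out counters sampled over time. The Python sketch below assumes such samples are available, for example from the `pswpout` counter in /proc/vmstat on Linux; that source, the function names, the interpretation of `warning:any` as "any page swapped out", and the "Concerning" and "Good" labels are assumptions, not a description of the agent.

```python
def pages_swapped_out(samples, now, window_minutes=15):
    """Pages swapped out within the window that ends at `now`.

    `samples` is a list of (timestamp_seconds, cumulative_pages_swapped_out)
    pairs in ascending time order.
    """
    window_start = now - window_minutes * 60
    in_window = [count for ts, count in samples if window_start <= ts <= now]
    if len(in_window) < 2:
        return 0  # not enough data points to compute a difference
    # Clamp at zero in case the counter was reset, for example by a reboot.
    return max(0, in_window[-1] - in_window[0])


def classify_swapping(pages):
    """Defaults are critical:never, warning:any, so any swapped page warns."""
    return "Concerning" if pages > 0 else "Good"
```

Given samples of 100, 150, and 180 cumulative pages taken at 0, 600, and 1200 seconds, a 15-minute window ending at 1200 seconds covers only the last two samples and yields 30 pages.
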