Host Metrics
In addition to the base metrics listed in the table below, many aggregate metrics are available.
If an entity type has parents defined, you can formulate all possible aggregate metrics using the formula `base_metric_across_parents`.
In addition, metrics for aggregate totals can be formed by adding the prefix `total_` to the front of the metric name.
Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name, in case the plural form does not end in "s".
For example, the following metric names may be valid for Host:
- agent_cert_expiry_across_clusters
- total_agent_cert_expiry_across_clusters
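The naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the helper `aggregate_metric_names` and its `plurals` override are hypothetical and not part of Cloudera Manager, and the default rule of appending "s" to the parent name is an assumption that does not hold for every parent. Treat the results as candidate names and confirm them with the type-ahead feature in the chart browser.

```python
def aggregate_metric_names(base_metric, parents, plurals=None):
    """Build candidate aggregate metric names for one base metric.

    Hypothetical helper for illustration only; not a Cloudera Manager API.
    `plurals` optionally maps a parent name to its plural form.
    """
    plurals = plurals or {}
    names = []
    for parent in parents:
        # Assumption: the plural is the parent name plus "s" unless overridden.
        plural = plurals.get(parent, parent + "s")
        across = f"{base_metric}_across_{plural}"
        names.append(across)             # base_metric_across_parents
        names.append(f"total_{across}")  # aggregate total uses the total_ prefix
    return names


# The Host metric agent_cert_expiry lists "cluster, rack" as its parents.
names = aggregate_metric_names("agent_cert_expiry", ["cluster", "rack"])
for name in names:
    print(name)
```

For the two parents given, the sketch produces the two names shown in the list above plus the corresponding `_across_racks` candidates.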
Some metrics, such as `alerts_rate`, apply to nearly every metric context. Others apply only to a certain service or role.
Metric Name | Description | Unit | Parents | Version |
---|---|---|---|---|
agent_cert_expiry | Remaining days until the expiry of the certificate of Cloudera Manager Agent | seconds | cluster, rack | n/a |
agent_cpu_system_rate | Cloudera Manager Agent System CPU Time | seconds per second | cluster, rack | n/a |
agent_cpu_user_rate | Cloudera Manager Agent User CPU Time | seconds per second | cluster, rack | n/a |
agent_fd_max | Cloudera Manager Agent File Descriptor Max | file descriptors | cluster, rack | n/a |
agent_fd_open | Cloudera Manager Agent File Descriptors | file descriptors | cluster, rack | n/a |
agent_hb_latency_millis | Heartbeat latency observed by Cloudera Manager Agent communicating to Cloudera Manager Server | ms | cluster, rack | n/a |
agent_physical_memory_used | Agent physical memory used | bytes | cluster, rack | n/a |
agent_virtual_memory_used | Agent virtual memory used | bytes | cluster, rack | n/a |
alerts_rate | The number of alerts. | events per second | cluster, rack | n/a |
available_entropy | The entropy that is available on the host | entropy | n/a | |
clock_offset | Clock offset as reported by the host's NTP service from 'ntpdc -np' or 'chronyc sources'. If NTP is not in use, this metric is not collected. | ms | cluster, rack | n/a |
cores | Logical CPU Cores | cores | cluster, rack | n/a |
cpu_guest_nice_rate | Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). Requires Linux 2.6.33. CPU guest nice time is included in CPU nice time. | seconds per second | cluster, rack | n/a |
cpu_guest_rate | Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. Requires Linux 2.6.24. CPU guest time is included in CPU user time. | seconds per second | cluster, rack | n/a |
cpu_idle_rate | Total CPU idle time | seconds per second | cluster, rack | n/a |
cpu_iowait_rate | Total CPU iowait time | seconds per second | cluster, rack | n/a |
cpu_irq_rate | Total CPU IRQ time | seconds per second | cluster, rack | n/a |
cpu_nice_rate | Total CPU nice time | seconds per second | cluster, rack | n/a |
cpu_percent | Total CPU usage of the host (averaged since last report) | percent | cluster, rack | n/a |
cpu_soft_irq_rate | Total CPU soft IRQ time | seconds per second | cluster, rack | n/a |
cpu_steal_rate | Stolen time, which is the time spent in other operating systems when running in a virtualized environment. Requires Linux 2.6.11. | seconds per second | cluster, rack | n/a |
cpu_system_rate | Total System CPU | seconds per second | cluster, rack | n/a |
cpu_user_rate | Total CPU user time | seconds per second | cluster, rack | n/a |
dns_name_resolution_duration | The duration of a call to InetAddress.getLocalHost() in a helper Java process run by the Cloudera Manager Agent. | ms | cluster, rack | n/a |
events_critical_rate | The number of critical events. | events per second | cluster, rack | n/a |
events_important_rate | The number of important events. | events per second | cluster, rack | n/a |
events_informational_rate | The number of informational events. | events per second | cluster, rack | n/a |
fd_max | Maximum number of file descriptors | file descriptors | cluster, rack | n/a |
fd_open | Open file descriptors. | file descriptors | cluster, rack | n/a |
health_bad_rate | Percentage of Time with Bad Health | seconds per second | cluster, rack | n/a |
health_concerning_rate | Percentage of Time with Concerning Health | seconds per second | cluster, rack | n/a |
health_disabled_rate | Percentage of Time with Disabled Health | seconds per second | cluster, rack | n/a |
health_good_rate | Percentage of Time with Good Health | seconds per second | cluster, rack | n/a |
health_unknown_rate | Percentage of Time with Unknown Health | seconds per second | cluster, rack | n/a |
hmon_message_bytes_sent_rate | Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Host Monitor | bytes per second | cluster, rack | n/a |
hmon_message_transmit_duration | The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Host Monitor | ms | cluster, rack | n/a |
hmon_message_transmit_failed_rate | Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Host Monitor | messages per second | cluster, rack | n/a |
hmon_message_transmit_succeeded_rate | Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Host Monitor | messages per second | cluster, rack | n/a |
load_1 | Load Average over 1 minute | load average | cluster, rack | n/a |
load_15 | Load Average over 15 minutes | load average | cluster, rack | n/a |
load_5 | Load Average over 5 minutes | load average | cluster, rack | n/a |
overcommit_ratio | Percentage of physical RAM that the committed address space cannot exceed. Retrieved from /proc/sys/vm/overcommit_ratio. | percent | cluster, rack | n/a |
physical_memory_buffers | The amount of physical memory devoted to temporary storage for raw disk blocks. This is the 'Buffers' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_cached | The amount of physical memory used for files read from the disk. This is commonly referred to as the pagecache. This is the 'Cached' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_commit_limit | Total amount of memory currently available to be allocated on the system. This is the 'CommitLimit' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_dirty | The total amount of memory waiting to be written back to the disk. This is the 'Dirty' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_dirty_ratio | Maximum percentage of physical memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to perform more writes. This is read from /proc/sys/vm/dirty_ratio. | percent | cluster, rack | n/a |
physical_memory_mapped | The total amount of memory which has been used to map devices, files, or libraries using the mmap command. This is the 'Mapped' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_memfree | The amount of physical memory left unused by the system. This is the 'MemFree' field from /proc/meminfo. | bytes | cluster, rack | n/a |
physical_memory_total | The total physical memory available. | bytes | cluster, rack | n/a |
physical_memory_used | The total amount of memory being used, excluding buffers and cache. | bytes | cluster, rack | n/a |
physical_memory_writeback | The total amount of memory actively being written back to the disk. This is the 'Writeback' field from /proc/meminfo. | bytes | cluster, rack | n/a |
smon_message_bytes_sent_rate | Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Service Monitor | bytes per second | cluster, rack | n/a |
smon_message_transmit_duration | The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Service Monitor | ms | cluster, rack | n/a |
smon_message_transmit_failed_rate | Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Service Monitor | messages per second | cluster, rack | n/a |
smon_message_transmit_succeeded_rate | Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Service Monitor | messages per second | cluster, rack | n/a |
supervisord_cpu_system_rate | Supervisord System CPU Time | seconds per second | cluster, rack | n/a |
supervisord_cpu_user_rate | Supervisord User CPU Time | seconds per second | cluster, rack | n/a |
supervisord_failures_rate | The number of failures contacting supervisord seen by the Cloudera Manager Agent | failures per second | cluster, rack | n/a |
supervisord_fd_max | Supervisord File Descriptor Max | file descriptors | cluster, rack | n/a |
supervisord_fd_open | Supervisord File Descriptors | file descriptors | cluster, rack | n/a |
supervisord_latency | The average latency contacting supervisord seen by the Cloudera Manager Agent | seconds | cluster, rack | n/a |
supervisord_physical_memory_used | Supervisord physical memory used | bytes | cluster, rack | n/a |
supervisord_virtual_memory_used | Supervisord virtual memory used | bytes | cluster, rack | n/a |
swap_free | Swap free | bytes | cluster, rack | n/a |
swap_out_rate | Memory swapped out to disk | pages per second | cluster, rack | n/a |
swap_total | Swap capacity | bytes | cluster, rack | n/a |
swap_used | Swap used | bytes | cluster, rack | n/a |
tcp_connection_count_close | The number of TCP connections in state CLOSE | connections | cluster, rack | n/a |
tcp_connection_count_close_wait | The number of TCP connections in state CLOSE_WAIT | connections | cluster, rack | n/a |
tcp_connection_count_closing | The number of TCP connections in state CLOSING | connections | cluster, rack | n/a |
tcp_connection_count_established | The number of TCP connections in state ESTABLISHED | connections | cluster, rack | n/a |
tcp_connection_count_fin_wait1 | The number of TCP connections in state FIN_WAIT1 | connections | cluster, rack | n/a |
tcp_connection_count_fin_wait2 | The number of TCP connections in state FIN_WAIT2 | connections | cluster, rack | n/a |
tcp_connection_count_last_ack | The number of TCP connections in state LAST_ACK | connections | cluster, rack | n/a |
tcp_connection_count_listen | The number of TCP connections in state LISTEN | connections | cluster, rack | n/a |
tcp_connection_count_syn_recv | The number of TCP connections in state SYN_RECV | connections | cluster, rack | n/a |
tcp_connection_count_syn_sent | The number of TCP connections in state SYN_SENT | connections | cluster, rack | n/a |
tcp_connection_count_time_wait | The number of TCP connections in state TIME_WAIT | connections | cluster, rack | n/a |
uptime | For a host, the amount of time since the host was booted. For a role, the uptime of the backing process. | seconds | cluster, rack | n/a |