Host Metrics

In addition to the base metrics listed in the table below, many aggregate metrics are available. If an entity type has parents defined, you can form every possible aggregate metric name with the pattern base_metric_across_parents, where parents is the plural form of the parent type.

You can form metrics for aggregate totals by adding the prefix total_ to the metric name.

Because the plural form of a parent does not always end in "s", use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name.

For example, the following metric names may be valid for Host:

  • agent_cpu_system_rate_across_clusters
  • total_agent_cpu_system_rate_across_clusters
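
The naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the helper aggregate_metric_names is hypothetical, and the plural forms "clusters" and "racks" are passed in by hand because the plural of a parent type is not guaranteed to end in "s". Confirm any generated name with the type-ahead feature before relying on it.

```python
def aggregate_metric_names(base_metric, parent_plurals):
    """Build candidate aggregate metric names for one base metric.

    parent_plurals holds the plural form of each parent type. These are
    supplied by the caller (an assumption of this sketch) rather than
    derived, since plural forms do not always end in "s".
    """
    names = []
    for plural in parent_plurals:
        across = f"{base_metric}_across_{plural}"
        names.append(across)             # aggregate across the parent
        names.append(f"total_{across}")  # aggregate total
    return names


# Host metrics list "cluster, rack" as parents; the plurals are assumed.
for name in aggregate_metric_names("agent_cpu_system_rate", ["clusters", "racks"]):
    print(name)
```

The names produced are candidates only; a generated name may not exist for every metric and parent combination.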

Some metrics, such as alerts_rate, apply to nearly every metric context. Others apply only to a specific service or role.

For more information about metrics, see Cloudera Manager Metrics and Metric Aggregation.

| Metric Name | Description | Unit | Parents | CDH Version |
|---|---|---|---|---|
| agent_cpu_system_rate | Cloudera Manager Agent System CPU Time | seconds per second | cluster, rack | n/a |
| agent_cpu_user_rate | Cloudera Manager Agent User CPU Time | seconds per second | cluster, rack | n/a |
| agent_fd_max | Cloudera Manager Agent File Descriptor Max | file descriptors | cluster, rack | n/a |
| agent_fd_open | Cloudera Manager Agent File Descriptors | file descriptors | cluster, rack | n/a |
| agent_hb_latency_millis | Heartbeat latency observed by Cloudera Manager Agent communicating to Cloudera Manager Server | ms | cluster, rack | n/a |
| agent_physical_memory_used | Agent physical memory used | bytes | cluster, rack | n/a |
| agent_virtual_memory_used | Agent virtual memory used | bytes | cluster, rack | n/a |
| alerts_rate | The number of alerts. | events per second | cluster, rack | n/a |
| clock_offset | Clock offset as reported by the host's NTP service from 'ntpdc -np' or 'chronyc sources'. If NTP is not in use, this metric is not collected. | ms | cluster, rack | n/a |
| cores | Logical CPU Cores | cores | cluster, rack | n/a |
| cpu_guest_nice_rate | Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). Requires Linux 2.6.33. CPU guest nice time is included in CPU nice time. | seconds per second | cluster, rack | n/a |
| cpu_guest_rate | Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. Requires Linux 2.6.24. CPU guest time is included in CPU user time. | seconds per second | cluster, rack | n/a |
| cpu_idle_rate | Total CPU idle time | seconds per second | cluster, rack | n/a |
| cpu_iowait_rate | Total CPU iowait time | seconds per second | cluster, rack | n/a |
| cpu_irq_rate | Total CPU IRQ time | seconds per second | cluster, rack | n/a |
| cpu_nice_rate | Total CPU nice time | seconds per second | cluster, rack | n/a |
| cpu_percent | Total CPU usage of the host (averaged since last report) | percent | cluster, rack | n/a |
| cpu_soft_irq_rate | Total CPU soft IRQ time | seconds per second | cluster, rack | n/a |
| cpu_steal_rate | Stolen time, which is the time spent in other operating systems when running in a virtualized environment. Requires Linux 2.6.11. | seconds per second | cluster, rack | n/a |
| cpu_system_rate | Total System CPU | seconds per second | cluster, rack | n/a |
| cpu_user_rate | Total CPU user time | seconds per second | cluster, rack | n/a |
| dns_name_resolution_duration | The duration of a call to InetAddress.getLocalHost() in a helper java process run by the Cloudera Manager Agent. | ms | cluster, rack | n/a |
| events_critical_rate | The number of critical events. | events per second | cluster, rack | n/a |
| events_important_rate | The number of important events. | events per second | cluster, rack | n/a |
| events_informational_rate | The number of informational events. | events per second | cluster, rack | n/a |
| fd_max | Maximum number of file descriptors | file descriptors | cluster, rack | n/a |
| fd_open | Open file descriptors. | file descriptors | cluster, rack | n/a |
| health_bad_rate | Percentage of Time with Bad Health | seconds per second | cluster, rack | n/a |
| health_concerning_rate | Percentage of Time with Concerning Health | seconds per second | cluster, rack | n/a |
| health_disabled_rate | Percentage of Time with Disabled Health | seconds per second | cluster, rack | n/a |
| health_good_rate | Percentage of Time with Good Health | seconds per second | cluster, rack | n/a |
| health_unknown_rate | Percentage of Time with Unknown Health | seconds per second | cluster, rack | n/a |
| hmon_message_bytes_sent_rate | Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Host Monitor | bytes per second | cluster, rack | n/a |
| hmon_message_transmit_duration | The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Host Monitor | ms | cluster, rack | n/a |
| hmon_message_transmit_failed_rate | Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Host Monitor | messages per second | cluster, rack | n/a |
| hmon_message_transmit_succeeded_rate | Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Host Monitor | messages per second | cluster, rack | n/a |
| load_1 | Load Average over 1 minute | load average | cluster, rack | n/a |
| load_15 | Load Average over 15 minutes | load average | cluster, rack | n/a |
| load_5 | Load Average over 5 minutes | load average | cluster, rack | n/a |
| overcommit_ratio | Percentage of physical RAM that the committed address space cannot exceed. Retrieved from /proc/sys/vm/overcommit_ratio. | percent | cluster, rack | n/a |
| physical_memory_buffers | The amount of physical memory devoted to temporary storage for raw disk blocks. This is the 'Buffers' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_cached | The amount of physical memory used for files read from the disk. This is commonly referred to as the pagecache. This is the 'Cached' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_commit_limit | Total amount of memory currently available to be allocated on the system. This is the 'CommitLimit' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_dirty | The total amount of memory waiting to be written back to the disk. This is the 'Dirty' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_dirty_ratio | Maximum percentage of physical memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to perform more writes. This is read from /proc/sys/vm/dirty_ratio. | percent | cluster, rack | n/a |
| physical_memory_mapped | The total amount of memory which has been used to map devices, files, or libraries using the mmap command. This is the 'Mapped' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_memfree | The amount of physical memory left unused by the system. This is the 'MemFree' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| physical_memory_total | The total physical memory available. | bytes | cluster, rack | n/a |
| physical_memory_used | The total amount of memory being used, excluding buffers and cache. | bytes | cluster, rack | n/a |
| physical_memory_writeback | The total amount of memory actively being written back to the disk. This is the 'Writeback' field from /proc/meminfo. | bytes | cluster, rack | n/a |
| smon_message_bytes_sent_rate | Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Service Monitor | bytes per second | cluster, rack | n/a |
| smon_message_transmit_duration | The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Service Monitor | ms | cluster, rack | n/a |
| smon_message_transmit_failed_rate | Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Service Monitor | messages per second | cluster, rack | n/a |
| smon_message_transmit_succeeded_rate | Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Service Monitor | messages per second | cluster, rack | n/a |
| supervisord_cpu_system_rate | Supervisord System CPU Time | seconds per second | cluster, rack | n/a |
| supervisord_cpu_user_rate | Supervisord User CPU Time | seconds per second | cluster, rack | n/a |
| supervisord_failures_rate | The number of failures contacting supervisord seen by the Cloudera Manager Agent | failures per second | cluster, rack | n/a |
| supervisord_fd_max | Supervisord File Descriptor Max | file descriptors | cluster, rack | n/a |
| supervisord_fd_open | Supervisord File Descriptors | file descriptors | cluster, rack | n/a |
| supervisord_latency | The average latency contacting supervisord seen by the Cloudera Manager Agent | seconds | cluster, rack | n/a |
| supervisord_physical_memory_used | Supervisord physical memory used | bytes | cluster, rack | n/a |
| supervisord_virtual_memory_used | Supervisord virtual memory used | bytes | cluster, rack | n/a |
| swap_free | Swap free | bytes | cluster, rack | n/a |
| swap_out_rate | Memory swapped out to disk | pages per second | cluster, rack | n/a |
| swap_total | Swap capacity | bytes | cluster, rack | n/a |
| swap_used | Swap used | bytes | cluster, rack | n/a |
| tcp_connection_count_close | The number of TCP connections in state CLOSE | connections | cluster, rack | n/a |
| tcp_connection_count_close_wait | The number of TCP connections in state CLOSE_WAIT | connections | cluster, rack | n/a |
| tcp_connection_count_closing | The number of TCP connections in state CLOSING | connections | cluster, rack | n/a |
| tcp_connection_count_established | The number of TCP connections in state ESTABLISHED | connections | cluster, rack | n/a |
| tcp_connection_count_fin_wait1 | The number of TCP connections in state FIN_WAIT1 | connections | cluster, rack | n/a |
| tcp_connection_count_fin_wait2 | The number of TCP connections in state FIN_WAIT2 | connections | cluster, rack | n/a |
| tcp_connection_count_last_ack | The number of TCP connections in state LAST_ACK | connections | cluster, rack | n/a |
| tcp_connection_count_listen | The number of TCP connections in state LISTEN | connections | cluster, rack | n/a |
| tcp_connection_count_syn_recv | The number of TCP connections in state SYN_RECV | connections | cluster, rack | n/a |
| tcp_connection_count_syn_sent | The number of TCP connections in state SYN_SENT | connections | cluster, rack | n/a |
| tcp_connection_count_time_wait | The number of TCP connections in state TIME_WAIT | connections | cluster, rack | n/a |
| uptime | For a host, the amount of time since the host was booted. For a role, the uptime of the backing process. | seconds | cluster, rack | n/a |
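
Several metrics in the table above report CPU time in seconds per second and memory in bytes. As a rough illustration of reading those units, the Python sketch below turns a CPU rate into a percentage of host capacity and a pair of memory readings into a percentage used. The helper names and the sample values are hypothetical, not measured data, and dividing by the cores metric assumes the CPU rate is summed across all logical cores on the host, which the table does not state.

```python
def cpu_rate_to_percent(rate_seconds_per_second, cores):
    """Express a CPU rate (for example cpu_user_rate) as a percentage of
    total host capacity.

    Assumption: the rate is summed over all logical cores, so a host with
    N cores can accumulate at most N seconds of CPU time per second.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 100.0 * rate_seconds_per_second / cores


def memory_used_percent(used_bytes, total_bytes):
    """Express physical_memory_used as a percentage of physical_memory_total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Illustrative values only: 2 CPU-seconds per second on an 8-core host,
# and 4 GiB used out of 16 GiB.
print(cpu_rate_to_percent(2.0, 8))                        # 25.0
print(memory_used_percent(4 * 1024**3, 16 * 1024**3))     # 25.0
```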