Host Metrics

Reference information for Host Metrics

In addition to these base metrics, many aggregate metrics are available. If an entity type has parents defined, you can formulate all possible aggregate metrics using the formula base_metric_across_parents.

Metrics for aggregate totals can also be formed by adding the prefix total_ to the aggregate metric name.

Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name, in case the plural form does not end in "s".

For example, the following metric names may be valid for Host:

  • agent_cert_expiry_across_clusters
  • total_agent_cert_expiry_across_clusters
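
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function aggregate_metric_names and its plural_overrides parameter are hypothetical names, not part of Cloudera Manager, and the sketch assumes the plural of a parent is formed by appending "s". As noted above, that assumption does not always hold, so confirm the real name with the chart browser type-ahead.

```python
def aggregate_metric_names(base_metric, parents, plural_overrides=None):
    """Return candidate aggregate metric names for one base metric.

    plural_overrides maps a parent type to its plural form for cases
    where simply appending "s" would be wrong (hypothetical helper).
    """
    plural_overrides = plural_overrides or {}
    names = []
    for parent in parents:
        # Naive pluralization: append "s" unless an override is given.
        plural = plural_overrides.get(parent, parent + "s")
        across = f"{base_metric}_across_{plural}"
        names.append(across)
        # The total_ prefix is added in front of the aggregate name.
        names.append(f"total_{across}")
    return names


if __name__ == "__main__":
    for name in aggregate_metric_names("agent_cert_expiry", ["cluster", "rack"]):
        print(name)
```

The generated names are candidates only; whether a given name is valid depends on the parents defined for the entity type.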

Some metrics, such as alerts_rate, apply to nearly every metric context. Others only apply to a certain service or role.

agent_cert_expiry

Description
Time remaining until the Cloudera Manager Agent certificate expires
Unit
seconds
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_cpu_system_rate

Description
Cloudera Manager Agent System CPU Time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_cpu_user_rate

Description
Cloudera Manager Agent User CPU Time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_fd_max

Description
Cloudera Manager Agent File Descriptor Max
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_fd_open

Description
Cloudera Manager Agent File Descriptors
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_hb_latency_millis

Description
Heartbeat latency observed by Cloudera Manager Agent communicating to Cloudera Manager Server
Unit
ms
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_physical_memory_used

Description
Agent physical memory used
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

agent_virtual_memory_used

Description
Agent virtual memory used
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

alerts_rate

Description
The number of alerts.
Unit
events per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

available_entropy

Description
The entropy that is available on the host
Unit
entropy
Parents
CDH Version
[CM -1.0.0..CM -1.0.0]

clock_offset

Description
Clock offset as reported by the host's NTP service from 'ntpdc -np' or 'chronyc sources'. If NTP is not in use, this metric is not collected.
Unit
ms
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cores

Description
Logical CPU Cores
Unit
cores
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_guest_nice_rate

Description
Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). Requires Linux 2.6.33. CPU guest nice time is included in CPU nice time.
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_guest_rate

Description
Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. Requires Linux 2.6.24. CPU guest time is included in CPU user time.
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_idle_rate

Description
Total CPU idle time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_iowait_rate

Description
Total CPU iowait time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_irq_rate

Description
Total CPU IRQ time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_nice_rate

Description
Total CPU nice time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_percent

Description
Total CPU usage of the host (averaged since last report)
Unit
percent
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_soft_irq_rate

Description
Total CPU soft IRQ time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_steal_rate

Description
Stolen time, which is the time spent in other operating systems when running in a virtualized environment. Requires Linux 2.6.11.
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_system_rate

Description
Total CPU system time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

cpu_user_rate

Description
Total CPU user time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]
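
The "seconds per second" unit of the CPU rate metrics can be read as CPU-seconds consumed per second of wall-clock time. A minimal sketch of turning such a rate into a share of host capacity follows. It assumes the rate is summed over all logical cores, which this reference does not state, and cpu_share_percent is a hypothetical helper name.

```python
def cpu_share_percent(rate_seconds_per_second, cores):
    """Convert a CPU rate (CPU-seconds per wall-clock second) to a
    percentage of total host capacity.

    Assumption: the rate is summed across all logical cores, so a host
    with N cores can consume at most N CPU-seconds per second.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 100.0 * rate_seconds_per_second / cores


if __name__ == "__main__":
    # Illustrative values: cpu_user_rate of 2.0 on a host with 8 cores.
    print(cpu_share_percent(2.0, 8))
```

If the rate turns out to be normalized per core already, the division by cores should be dropped.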

dns_name_resolution_duration

Description
The duration of a call to InetAddress.getLocalHost() in a helper Java process run by the Cloudera Manager Agent.
Unit
ms
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

events_critical_rate

Description
The number of critical events.
Unit
events per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

events_important_rate

Description
The number of important events.
Unit
events per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

events_informational_rate

Description
The number of informational events.
Unit
events per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

fd_max

Description
Maximum number of file descriptors
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

fd_open

Description
Open file descriptors.
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

health_bad_rate

Description
Percentage of Time with Bad Health
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

health_concerning_rate

Description
Percentage of Time with Concerning Health
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

health_disabled_rate

Description
Percentage of Time with Disabled Health
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

health_good_rate

Description
Percentage of Time with Good Health
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

health_unknown_rate

Description
Percentage of Time with Unknown Health
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

hmon_message_bytes_sent_rate

Description
Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Host Monitor
Unit
bytes per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

hmon_message_transmit_duration

Description
The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Host Monitor
Unit
ms
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

hmon_message_transmit_failed_rate

Description
Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Host Monitor
Unit
messages per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

hmon_message_transmit_succeeded_rate

Description
Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Host Monitor
Unit
messages per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

load_1

Description
Load Average over 1 minute
Unit
load average
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

load_15

Description
Load Average over 15 minutes
Unit
load average
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

load_5

Description
Load Average over 5 minutes
Unit
load average
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

overcommit_ratio

Description
Percentage of physical RAM that the committed address space cannot exceed. Retrieved from /proc/sys/vm/overcommit_ratio.
Unit
percent
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_buffers

Description
The amount of physical memory devoted to temporary storage for raw disk blocks. This is the 'Buffers' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_cached

Description
The amount of physical memory used for files read from the disk. This is commonly referred to as the pagecache. This is the 'Cached' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_commit_limit

Description
Total amount of memory currently available to be allocated on the system. This is the 'CommitLimit' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_dirty

Description
The total amount of memory waiting to be written back to the disk. This is the 'Dirty' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_dirty_ratio

Description
Maximum percentage of physical memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to perform more writes. This is read from /proc/sys/vm/dirty_ratio.
Unit
percent
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_mapped

Description
The total amount of memory that has been used to map devices, files, or libraries using the mmap system call. This is the 'Mapped' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_memfree

Description
The amount of physical memory left unused by the system. This is the 'MemFree' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_total

Description
The total physical memory available.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_used

Description
The total amount of memory being used, excluding buffers and cache.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

physical_memory_writeback

Description
The total amount of memory actively being written back to the disk. This is the 'Writeback' field from /proc/meminfo.
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]
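
Several of the physical_memory_* metrics above are described as fields of /proc/meminfo. The sketch below shows one way those fields could be read and converted to bytes. It is not the Cloudera Manager Agent's implementation; the mapping table, the function parse_meminfo, and the sample values are made up for illustration. The kernel reports these fields in units labeled "kB", which are multiples of 1024 bytes.

```python
# Illustrative sample in /proc/meminfo format; the values are made up.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:         4096000 kB
Buffers:          204800 kB
Cached:          2048000 kB
Dirty:               128 kB
Writeback:             0 kB
Mapped:           512000 kB
CommitLimit:    12288000 kB
"""

# /proc/meminfo field -> metric name, following the descriptions above.
FIELD_TO_METRIC = {
    "Buffers": "physical_memory_buffers",
    "Cached": "physical_memory_cached",
    "CommitLimit": "physical_memory_commit_limit",
    "Dirty": "physical_memory_dirty",
    "Mapped": "physical_memory_mapped",
    "MemFree": "physical_memory_memfree",
    "Writeback": "physical_memory_writeback",
}


def parse_meminfo(text):
    """Return a dict of metric name -> value in bytes."""
    metrics = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        metric = FIELD_TO_METRIC.get(name.strip())
        if metric is None:
            continue  # field is not one of the metrics mapped above
        parts = rest.split()
        if not parts:
            continue  # malformed line with no value
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # "kB" in /proc/meminfo means 1024 bytes
        metrics[metric] = value
    return metrics


if __name__ == "__main__":
    for metric, value in sorted(parse_meminfo(SAMPLE_MEMINFO).items()):
        print(metric, value)
```

On a Linux host, the same function can be applied to the contents of /proc/meminfo instead of the sample string.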

smon_message_bytes_sent_rate

Description
Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Service Monitor
Unit
bytes per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

smon_message_transmit_duration

Description
The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Service Monitor
Unit
ms
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

smon_message_transmit_failed_rate

Description
Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Service Monitor
Unit
messages per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

smon_message_transmit_succeeded_rate

Description
Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Service Monitor
Unit
messages per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_cpu_system_rate

Description
Supervisord System CPU Time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_cpu_user_rate

Description
Supervisord User CPU Time
Unit
seconds per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_failures_rate

Description
The number of failures contacting supervisord seen by the Cloudera Manager Agent
Unit
failures per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_fd_max

Description
Supervisord File Descriptor Max
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_fd_open

Description
Supervisord File Descriptors
Unit
file descriptors
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_latency

Description
The average latency contacting supervisord seen by the Cloudera Manager Agent
Unit
seconds
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_physical_memory_used

Description
Supervisord physical memory used
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

supervisord_virtual_memory_used

Description
Supervisord virtual memory used
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

swap_free

Description
Swap free
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

swap_out_rate

Description
Memory swapped out to disk
Unit
pages per second
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

swap_total

Description
Swap capacity
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

swap_used

Description
Swap used
Unit
bytes
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_close

Description
The number of TCP connections in state CLOSE
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_close_wait

Description
The number of TCP connections in state CLOSE_WAIT
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_closing

Description
The number of TCP connections in state CLOSING
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_established

Description
The number of TCP connections in state ESTABLISHED
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_fin_wait1

Description
The number of TCP connections in state FIN_WAIT1
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_fin_wait2

Description
The number of TCP connections in state FIN_WAIT2
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_last_ack

Description
The number of TCP connections in state LAST_ACK
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_listen

Description
The number of TCP connections in state LISTEN
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_syn_recv

Description
The number of TCP connections in state SYN_RECV
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_syn_sent

Description
The number of TCP connections in state SYN_SENT
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

tcp_connection_count_time_wait

Description
The number of TCP connections in state TIME_WAIT
Unit
connections
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]

uptime

Description
For a host, the amount of time since the host was booted. For a role, the uptime of the backing process.
Unit
seconds
Parents
cluster, rack
CDH Version
[CM -1.0.0..CM -1.0.0]