Host Metrics

In addition to the base metrics listed in the table below, many aggregate metrics are available. If an entity type has parents defined, you can formulate all possible aggregate metrics using the formula base_metric_across_parents, where parents is the plural form of a parent entity type.

Metrics for aggregate totals can also be formed by adding the prefix total_ to the front of the metric name.

The plural form of a parent does not always end in "s". Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name.

For example, the following metric names may be valid for Host:

agent_cpu_system_rate_across_clusters
total_agent_cpu_system_rate_across_clusters
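
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the base_metric_across_parents pattern and the total_ prefix; the helper names (naive_plural, aggregate_metric_names) are hypothetical and are not part of Cloudera Manager, and the pluralization simply appends "s", which is an assumption that does not hold for every parent type. Treat the output as candidate names to confirm in the chart browser.

```python
def naive_plural(parent: str) -> str:
    """Assumption: the plural is formed by appending 's'.

    Some parent types may pluralize differently, so confirm the
    generated names with the chart browser's type-ahead feature.
    """
    return parent + "s"


def aggregate_metric_names(base_metric: str, parents: list[str]) -> list[str]:
    """Build candidate aggregate metric names for one base metric.

    For each parent, emit the '<base>_across_<parents>' form followed by
    the same name with the 'total_' prefix.
    """
    names = []
    for parent in parents:
        across = f"{base_metric}_across_{naive_plural(parent)}"
        names.append(across)
        names.append(f"total_{across}")
    return names


# Parents for Host metrics, taken from the Parents column of the table.
candidates = aggregate_metric_names("agent_cpu_system_rate", ["cluster", "rack"])
for name in candidates:
    print(name)
```

For the cluster parent, the sketch produces the two example names shown above; the rack variants follow the same pattern and should be verified before use.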

Some metrics, such as alerts_rate, apply to nearly every metric context. Others only apply to a certain service or role.

For more information about metrics, see Cloudera Manager Metrics and Metric Aggregation.

Metric Name	Description	Unit	Parents	CDH Version
agent_cpu_system_rate	Cloudera Manager Agent System CPU Time	seconds per second	cluster, rack	n/a
agent_cpu_user_rate	Cloudera Manager Agent User CPU Time	seconds per second	cluster, rack	n/a
agent_fd_max	Cloudera Manager Agent File Descriptor Max	file descriptors	cluster, rack	n/a
agent_fd_open	Cloudera Manager Agent File Descriptors	file descriptors	cluster, rack	n/a
agent_hb_latency_millis	Heartbeat latency observed by Cloudera Manager Agent communicating to Cloudera Manager Server	ms	cluster, rack	n/a
agent_physical_memory_used	Agent physical memory used	bytes	cluster, rack	n/a
agent_virtual_memory_used	Agent virtual memory used	bytes	cluster, rack	n/a
alerts_rate	The number of alerts.	events per second	cluster, rack	n/a
clock_offset	Clock offset as reported by the host's NTP service from 'ntpdc -np' or 'chronyc sources'. If NTP is not in use, this metric is not collected.	ms	cluster, rack	n/a
cores	Logical CPU Cores	cores	cluster, rack	n/a
cpu_guest_nice_rate	Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). Requires Linux 2.6.33. CPU guest nice time is included in CPU nice time.	seconds per second	cluster, rack	n/a
cpu_guest_rate	Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel. Requires Linux 2.6.24. CPU guest time is included in CPU user time.	seconds per second	cluster, rack	n/a
cpu_idle_rate	Total CPU idle time	seconds per second	cluster, rack	n/a
cpu_iowait_rate	Total CPU iowait time	seconds per second	cluster, rack	n/a
cpu_irq_rate	Total CPU IRQ time	seconds per second	cluster, rack	n/a
cpu_nice_rate	Total CPU nice time	seconds per second	cluster, rack	n/a
cpu_percent	Total CPU usage of the host (averaged since last report)	percent	cluster, rack	n/a
cpu_soft_irq_rate	Total CPU soft IRQ time	seconds per second	cluster, rack	n/a
cpu_steal_rate	Stolen time, which is the time spent in other operating systems when running in a virtualized environment. Requires Linux 2.6.11.	seconds per second	cluster, rack	n/a
cpu_system_rate	Total System CPU	seconds per second	cluster, rack	n/a
cpu_user_rate	Total CPU user time	seconds per second	cluster, rack	n/a
dns_name_resolution_duration	The duration of a call to InetAddress.getLocalHost() in a helper Java process run by the Cloudera Manager Agent.	ms	cluster, rack	n/a
events_critical_rate	The number of critical events.	events per second	cluster, rack	n/a
events_important_rate	The number of important events.	events per second	cluster, rack	n/a
events_informational_rate	The number of informational events.	events per second	cluster, rack	n/a
fd_max	Maximum number of file descriptors	file descriptors	cluster, rack	n/a
fd_open	Open file descriptors.	file descriptors	cluster, rack	n/a
health_bad_rate	Percentage of Time with Bad Health	seconds per second	cluster, rack	n/a
health_concerning_rate	Percentage of Time with Concerning Health	seconds per second	cluster, rack	n/a
health_disabled_rate	Percentage of Time with Disabled Health	seconds per second	cluster, rack	n/a
health_good_rate	Percentage of Time with Good Health	seconds per second	cluster, rack	n/a
health_unknown_rate	Percentage of Time with Unknown Health	seconds per second	cluster, rack	n/a
hmon_message_bytes_sent_rate	Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Host Monitor	bytes per second	cluster, rack	n/a
hmon_message_transmit_duration	The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Host Monitor	ms	cluster, rack	n/a
hmon_message_transmit_failed_rate	Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Host Monitor	messages per second	cluster, rack	n/a
hmon_message_transmit_succeeded_rate	Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Host Monitor	messages per second	cluster, rack	n/a
load_1	Load Average over 1 minute	load average	cluster, rack	n/a
load_15	Load Average over 15 minutes	load average	cluster, rack	n/a
load_5	Load Average over 5 minutes	load average	cluster, rack	n/a
overcommit_ratio	Percentage of physical RAM that the committed address space cannot exceed. Retrieved from /proc/sys/vm/overcommit_ratio.	percent	cluster, rack	n/a
physical_memory_buffers	The amount of physical memory devoted to temporary storage for raw disk blocks. This is the 'Buffers' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_cached	The amount of physical memory used for files read from the disk. This is commonly referred to as the pagecache. This is the 'Cached' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_commit_limit	Total amount of memory currently available to be allocated on the system. This is the 'CommitLimit' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_dirty	The total amount of memory waiting to be written back to the disk. This is the 'Dirty' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_dirty_ratio	Maximum percentage of physical memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to perform more writes. This is read from /proc/sys/vm/dirty_ratio.	percent	cluster, rack	n/a
physical_memory_mapped	The total amount of memory which has been used to map devices, files, or libraries using the mmap system call. This is the 'Mapped' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_memfree	The amount of physical memory left unused by the system. This is the 'MemFree' field from /proc/meminfo.	bytes	cluster, rack	n/a
physical_memory_total	The total physical memory available.	bytes	cluster, rack	n/a
physical_memory_used	The total amount of memory being used, excluding buffers and cache.	bytes	cluster, rack	n/a
physical_memory_writeback	The total amount of memory actively being written back to the disk. This is the 'Writeback' field from /proc/meminfo.	bytes	cluster, rack	n/a
smon_message_bytes_sent_rate	Number of bytes sent in messages from the Cloudera Manager Agent to the Cloudera Service Monitor	bytes per second	cluster, rack	n/a
smon_message_transmit_duration	The wall-clock time it took to transmit the most recent Cloudera Manager Agent message to the Cloudera Service Monitor	ms	cluster, rack	n/a
smon_message_transmit_failed_rate	Number of failures to send messages from the Cloudera Manager Agent to the Cloudera Service Monitor	messages per second	cluster, rack	n/a
smon_message_transmit_succeeded_rate	Number of messages successfully sent from the Cloudera Manager Agent to the Cloudera Service Monitor	messages per second	cluster, rack	n/a
supervisord_cpu_system_rate	Supervisord System CPU Time	seconds per second	cluster, rack	n/a
supervisord_cpu_user_rate	Supervisord User CPU Time	seconds per second	cluster, rack	n/a
supervisord_failures_rate	The number of failures contacting supervisord seen by the Cloudera Manager Agent	failures per second	cluster, rack	n/a
supervisord_fd_max	Supervisord File Descriptor Max	file descriptors	cluster, rack	n/a
supervisord_fd_open	Supervisord File Descriptors	file descriptors	cluster, rack	n/a
supervisord_latency	The average latency contacting supervisord seen by the Cloudera Manager Agent	seconds	cluster, rack	n/a
supervisord_physical_memory_used	Supervisord physical memory used	bytes	cluster, rack	n/a
supervisord_virtual_memory_used	Supervisord virtual memory used	bytes	cluster, rack	n/a
swap_free	Swap free	bytes	cluster, rack	n/a
swap_out_rate	Memory swapped out to disk	pages per second	cluster, rack	n/a
swap_total	Swap capacity	bytes	cluster, rack	n/a
swap_used	Swap used	bytes	cluster, rack	n/a
tcp_connection_count_close	The number of TCP connections in state CLOSE	connections	cluster, rack	n/a
tcp_connection_count_close_wait	The number of TCP connections in state CLOSE_WAIT	connections	cluster, rack	n/a
tcp_connection_count_closing	The number of TCP connections in state CLOSING	connections	cluster, rack	n/a
tcp_connection_count_established	The number of TCP connections in state ESTABLISHED	connections	cluster, rack	n/a
tcp_connection_count_fin_wait1	The number of TCP connections in state FIN_WAIT1	connections	cluster, rack	n/a
tcp_connection_count_fin_wait2	The number of TCP connections in state FIN_WAIT2	connections	cluster, rack	n/a
tcp_connection_count_last_ack	The number of TCP connections in state LAST_ACK	connections	cluster, rack	n/a
tcp_connection_count_listen	The number of TCP connections in state LISTEN	connections	cluster, rack	n/a
tcp_connection_count_syn_recv	The number of TCP connections in state SYN_RECV	connections	cluster, rack	n/a
tcp_connection_count_syn_sent	The number of TCP connections in state SYN_SENT	connections	cluster, rack	n/a
tcp_connection_count_time_wait	The number of TCP connections in state TIME_WAIT	connections	cluster, rack	n/a
uptime	For a host, the amount of time since the host was booted. For a role, the uptime of the backing process.	seconds	cluster, rack	n/a
