DataNode Metrics

| Metric Name | Description | Unit | CDH Version |
|---|---|---|---|
| alerts_rate | The number of alerts. | events per second | CDH 5 |
| block_checksum_op_avg_time | Block Checksum Average Time | ms | CDH 5 |
| block_checksum_op_rate | Block Checksum Operations | operations per second | CDH 5 |
| block_reports_avg_time | Block Reports Average Time | ms | CDH 5 |
| block_reports_rate | Block Reports Operations | operations per second | CDH 5 |
| block_verification_failures_rate | Block Verification Failures | failures per second | CDH 5 |
| blocks_get_local_path_info_rate | Blocks Get Local Path Info | operations per second | CDH 5 |
| blocks_read_rate | Blocks Read | blocks per second | CDH 5 |
| blocks_removed_rate | Blocks Removed | blocks per second | CDH 5 |
| blocks_replicated_rate | Blocks Replicated | blocks per second | CDH 5 |
| blocks_total | Total Blocks | blocks | CDH 5 |
| blocks_verified_rate | Blocks Verified | blocks per second | CDH 5 |
| blocks_written_rate | Blocks Written | blocks per second | CDH 5 |
| bytes_read_rate | Bytes Read | bytes per second | CDH 5 |
| bytes_written_rate | Bytes Written | bytes per second | CDH 5 |
| cgroup_cpu_system_rate | System CPU usage of the role's cgroup | seconds per second | CDH 5 |
| cgroup_cpu_user_rate | User space CPU usage of the role's cgroup | seconds per second | CDH 5 |
| cgroup_mem_page_cache | Page cache usage of the role's cgroup | bytes | CDH 5 |
| cgroup_mem_rss | Resident memory of the role's cgroup | bytes | CDH 5 |
| cgroup_mem_swap | Swap usage of the role's cgroup | bytes | CDH 5 |
| cgroup_read_bytes_rate | Bytes read from all disks by the role's cgroup | bytes per second | CDH 5 |
| cgroup_read_ios_rate | Number of read I/O operations from all disks by the role's cgroup | ios per second | CDH 5 |
| cgroup_write_bytes_rate | Bytes written to all disks by the role's cgroup | bytes per second | CDH 5 |
| cgroup_write_ios_rate | Number of write I/O operations to all disks by the role's cgroup | ios per second | CDH 5 |
| copy_block_op_avg_time | Copy Block Average Time | ms | CDH 5 |
| copy_block_op_rate | Copy Block Operations | operations per second | CDH 5 |
| cpu_system_rate | Total System CPU | seconds per second | CDH 5 |
| cpu_user_rate | Total CPU user time | seconds per second | CDH 5 |
| datanode_namenode_connections_bad | NameNode connections in a bad state | connections | CDH 5 |
| datanode_namenode_connections_good | NameNode connections in a good state | connections | CDH 5 |
| datanode_namenode_connections_unknown | NameNode connections in an unknown state | connections | CDH 5 |
| delete_block_pool_avg_time | Delete Block Pool Average Time | ms | CDH 5 |
| delete_block_pool_rate | Delete Block Pool Operations | operations per second | CDH 5 |
| dfs_capacity | Total configured HDFS storage capacity | bytes | CDH 5 |
| dfs_capacity_used | Storage space used by HDFS files | bytes | CDH 5 |
| dfs_capacity_used_non_hdfs | Storage space used by non-HDFS files | bytes | CDH 5 |
| events_critical_rate | The number of critical events. | events per second | CDH 5 |
| events_important_rate | The number of important events. | events per second | CDH 5 |
| events_informational_rate | The number of informational events. | events per second | CDH 5 |
| fd_max | Maximum number of file descriptors | file descriptors | CDH 5 |
| fd_open | Open file descriptors | file descriptors | CDH 5 |
| flush_nanos_avg_time | Average Disk Flush Time | ns | CDH 5 |
| flush_nanos_rate | Disk Flushes | operations per second | CDH 5 |
| fsync_nanos_avg_time | Average Disk Fsync Time | ns | CDH 5 |
| fsync_nanos_rate | Disk Fsyncs | operations per second | CDH 5 |
| fsync_rate | Fsync Operations | operations per second | CDH 5 |
| get_block_local_path_info_avg_time | Get Block Local Path Info Average Time | ms | CDH 5 |
| get_block_local_path_info_rate | Get Block Local Path Info Operations | operations per second | CDH 5 |
| get_hadoop_groups_avg_time | Average time to get the Hadoop groups for a user | ms | CDH 5 |
| get_hadoop_groups_rate | Get Hadoop Groups Operations | operations per second | CDH 5 |
| get_hdfs_blocks_metadata_avg_time | Get HDFS Blocks Metadata Average Time | ms | CDH 5 |
| get_hdfs_blocks_metadata_rate | Get HDFS Blocks Metadata Operations | operations per second | CDH 5 |
| get_replica_visible_length_avg_time | Get Replica Visible Length Average Time | ms | CDH 5 |
| get_replica_visible_length_rate | Get Replica Visible Length Operations | operations per second | CDH 5 |
| health_bad_rate | Percentage of Time with Bad Health | seconds per second | CDH 5 |
| health_concerning_rate | Percentage of Time with Concerning Health | seconds per second | CDH 5 |
| health_disabled_rate | Percentage of Time with Disabled Health | seconds per second | CDH 5 |
| health_good_rate | Percentage of Time with Good Health | seconds per second | CDH 5 |
| health_unknown_rate | Percentage of Time with Unknown Health | seconds per second | CDH 5 |
| heartbeats_avg_time | Heartbeat Average Time | ms | CDH 5 |
| heartbeats_rate | Heartbeats | operations per second | CDH 5 |
| init_replica_recovery_avg_time | Init Replica Recovery Average Time | ms | CDH 5 |
| init_replica_recovery_rate | Init Replica Recovery Operations | operations per second | CDH 5 |
| jvm_blocked_threads | Blocked threads | threads | CDH 5 |
| jvm_gc_rate | Number of garbage collections | garbage collections per second | CDH 5 |
| jvm_gc_time_ms_rate | Total time spent garbage collecting. | ms per second | CDH 5 |
| jvm_heap_committed_mb | Total amount of committed heap memory. | MB | CDH 5 |
| jvm_heap_used_mb | Total amount of used heap memory. | MB | CDH 5 |
| jvm_max_memory_mb | Maximum allowed memory. | MB | CDH 5 |
| jvm_new_threads | New threads | threads | CDH 5 |
| jvm_non_heap_committed_mb | Total amount of committed non-heap memory. | MB | CDH 5 |
| jvm_non_heap_used_mb | Total amount of used non-heap memory. | MB | CDH 5 |
| jvm_runnable_threads | Runnable threads | threads | CDH 5 |
| jvm_terminated_threads | Terminated threads | threads | CDH 5 |
| jvm_timed_waiting_threads | Timed waiting threads | threads | CDH 5 |
| jvm_total_threads | Total threads | threads | CDH 5 |
| jvm_waiting_threads | Waiting threads | threads | CDH 5 |
| log_error_rate | Logged Errors | messages per second | CDH 5 |
| log_fatal_rate | Logged Fatals | messages per second | CDH 5 |
| log_info_rate | Logged Infos | messages per second | CDH 5 |
| log_warn_rate | Logged Warnings | messages per second | CDH 5 |
| login_failure_avg_time | Average Failed Login Time | ms | CDH 5 |
| login_failure_rate | Login Failures | operations per second | CDH 5 |
| login_success_avg_time | Average Successful Login Time | ms | CDH 5 |
| login_success_rate | Login Successes | operations per second | CDH 5 |
| mem_rss | Resident memory used | bytes | CDH 5 |
| mem_swap | Amount of swap memory used by this role's process. | bytes | CDH 5 |
| mem_virtual | Virtual memory used | bytes | CDH 5 |
| metrics_dropped_pub_all | Dropped Metrics Updates By All Sinks | updates | CDH 5 |
| metrics_num_active_sinks | Active Metrics Sinks Count | sinks | CDH 5 |
| metrics_num_active_sources | Active Metrics Sources Count | sources | CDH 5 |
| metrics_num_all_sinks | All Metrics Sinks Count | sinks | CDH 5 |
| metrics_num_all_sources | All Metrics Sources Count | sources | CDH 5 |
| metrics_publish_avg_time | Metrics Publish Average Time | ms | CDH 5 |
| metrics_publish_rate | Metrics Publish Operations | operations per second | CDH 5 |
| metrics_snapshot_avg_time | Metrics Snapshot Average Time | ms | CDH 5 |
| metrics_snapshot_rate | Metrics Snapshot Operations | operations per second | CDH 5 |
| oom_exits_rate | The number of times the role's backing process was killed due to an OutOfMemory error. This counter is only incremented if the Cloudera Manager "Kill When Out of Memory" option is enabled. | exits per second | CDH 5 |
| packet_ack_round_trip_time_nanos_avg_time | Packet Ack Round Trip Average Time | ns | CDH 5 |
| packet_ack_round_trip_time_nanos_rate | Packet Ack Round Trip Operations | operations per second | CDH 5 |
| read_block_op_avg_time | Read Block Average Time | ms | CDH 5 |
| read_block_op_rate | Read Block Operations | operations per second | CDH 5 |
| read_bytes_rate | The number of bytes read from the device | bytes per second | CDH 5 |
| reads_from_local_client_rate | Reads From Local Clients | operations per second | CDH 5 |
| reads_from_remote_client_rate | Reads From Remote Clients | operations per second | CDH 5 |
| refresh_namenodes_avg_time | Refresh NameNodes Average Time | ms | CDH 5 |
| refresh_namenodes_rate | Refresh NameNodes Operations | operations per second | CDH 5 |
| replace_block_op_avg_time | Replace Block Average Time | ms | CDH 5 |
| replace_block_op_rate | Replace Block Operations | operations per second | CDH 5 |
| rpc_authentication_failures_rate | RPC Authentication Failures | operations per second | CDH 5 |
| rpc_authentication_successes_rate | RPC Authentication Successes | operations per second | CDH 5 |
| rpc_authorization_failures_rate | RPC Authorization Failures | operations per second | CDH 5 |
| rpc_authorization_successes_rate | RPC Authorization Successes | operations per second | CDH 5 |
| rpc_call_queue_length | RPC Call Queue Length | items | CDH 5 |
| rpc_num_open_connections | Open RPC Connections | connections | CDH 5 |
| rpc_processing_time_avg_time | Average RPC Processing Time | ms | CDH 5 |
| rpc_processing_time_rate | RPCs Processed | operations per second | CDH 5 |
| rpc_queue_time_avg_time | Average RPC Queue Time | ms | CDH 5 |
| rpc_queue_time_rate | RPCs Queued | operations per second | CDH 5 |
| rpc_received_bytes_rate | RPC Received Bytes | bytes per second | CDH 5 |
| rpc_sent_bytes_rate | RPC Sent Bytes | bytes per second | CDH 5 |
| send_data_packet_blocked_on_network_nanos_avg_time | Send Data Packet Blocked On Network Average Time | ns | CDH 5 |
| send_data_packet_blocked_on_network_nanos_rate | Send Data Packet Blocked On Network Operations | operations per second | CDH 5 |
| send_data_packet_transfer_nanos_avg_time | Send Data Packet Transfer Average Time | ns | CDH 5 |
| send_data_packet_transfer_nanos_rate | Send Data Packet Transfer Operations | operations per second | CDH 5 |
| unexpected_exits_rate | The number of times the role's backing process exited unexpectedly. | exits per second | CDH 5 |
| update_replica_under_recovery_avg_time | Update Replica Under Recovery Average Time | ms | CDH 5 |
| update_replica_under_recovery_rate | Update Replica Under Recovery Operations | operations per second | CDH 5 |
| uptime | For a host, the amount of time since the host was booted. For a role, the uptime of the backing process. | seconds | CDH 5 |
| volume_failures | Volume failures | volumes | CDH 5 |
| web_metrics_collection_duration | Web Server Responsiveness | ms | CDH 5 |
| write_block_op_avg_time | Write Block Average Time | ms | CDH 5 |
| write_block_op_rate | Write Block Operations | operations per second | CDH 5 |
| write_bytes_rate | The number of bytes written to the device | bytes per second | CDH 5 |
| writes_from_local_client_rate | Writes From Local Clients | operations per second | CDH 5 |
| writes_from_remote_client_rate | Writes From Remote Clients | operations per second | CDH 5 |
| xceivers | Transceivers | transceivers | CDH 5 |
| blocks_cached_rate | The total number of HDFS blocks cached over the lifetime of the process. | blocks per second | CDH 5 |
| blocks_uncached_rate | The total number of HDFS blocks uncached over the lifetime of the process. | blocks per second | CDH 5 |
| cache_capacity | The capacity of the HDFS cache on this DataNode. | bytes | CDH 5 |
| cache_reports_avg_time | The average time to generate cache reports on the DataNode. | ms | CDH 5 |
| cache_reports_rate | The number of cache report generation operations on the DataNode. | operations per second | CDH 5 |
| cache_used | The amount of HDFS cache used on this DataNode. | bytes | CDH 5 |
| gc_count_concurrent_mark_sweep_rate | The number of garbage collections by the Concurrent Mark Sweep collector. | garbage collections per second | CDH 5 |
| gc_count_par_new_rate | The number of garbage collections by the ParNew (parallel young-generation) collector. | garbage collections per second | CDH 5 |
| gc_time_ms_concurrent_mark_sweep_rate | The total time spent in garbage collections by the Concurrent Mark Sweep collector. | ms per second | CDH 5 |
| gc_time_ms_par_new_rate | The total time spent in garbage collections by the ParNew (parallel young-generation) collector. | ms per second | CDH 5 |
| jvm_pauses_info_threshold_exceeded_rate | Number of pauses detected over the info threshold. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 1 second, it treats it as one pause above the info threshold. | pauses per second | CDH 5 |
| jvm_pauses_warn_threshold_exceeded_rate | Number of pauses detected over the warn threshold. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 10 seconds, it treats it as one pause above the warn threshold. | pauses per second | CDH 5 |
| num_blocks_failed_to_cache_rate | The total number of blocks the DataNode failed to cache. | blocks per second | CDH 5 |
| num_blocks_failed_to_uncache_rate | The total number of blocks the DataNode failed to uncache. | blocks per second | CDH 5 |
| pause_time_rate | Total time spent paused. This is the total extra time the pause monitor thread spent sleeping on top of the requested 500 ms. | ms per second | CDH 5 |
| pauses_rate | Number of pauses detected. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 1 second, it treats it as one pause. | pauses per second | CDH 5 |
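The four pause-related metrics (jvm_pauses_info_threshold_exceeded_rate, jvm_pauses_warn_threshold_exceeded_rate, pause_time_rate, and pauses_rate) all come from the same pause monitor loop. The Python sketch below models that loop as the descriptions state it: sleep for 500 ms, treat any overshoot as pause time, and count a pause when the overshoot exceeds the 1-second info threshold or the 10-second warn threshold. It is an illustration, not the Hadoop or Cloudera Manager implementation. The names `PauseCounters`, `record`, and `monitor` are hypothetical. The descriptions do not say whether a pause longer than 10 seconds is also counted against the info threshold, so this sketch assumes it is (a literal reading of the wording). The counters here are cumulative; the table reports them as per-second rates.

```python
import time
from dataclasses import dataclass

# Values taken from the metric descriptions in the table.
SLEEP_MS = 500
INFO_THRESHOLD_MS = 1_000
WARN_THRESHOLD_MS = 10_000


@dataclass
class PauseCounters:
    """Cumulative counters; the table's *_rate metrics are per-second rates of these."""
    pauses: int = 0
    info_threshold_exceeded: int = 0
    warn_threshold_exceeded: int = 0
    pause_time_ms: int = 0

    def record(self, elapsed_ms: int) -> int:
        """Account for one sleep that was requested for SLEEP_MS but took elapsed_ms."""
        extra_ms = max(0, elapsed_ms - SLEEP_MS)
        # Pause time accumulates every extra millisecond, not only long pauses.
        self.pause_time_ms += extra_ms
        if extra_ms > INFO_THRESHOLD_MS:
            self.pauses += 1
            self.info_threshold_exceeded += 1
        # Assumption: a pause above the warn threshold also counts against the
        # info threshold (literal reading of the table, not verified).
        if extra_ms > WARN_THRESHOLD_MS:
            self.warn_threshold_exceeded += 1
        return extra_ms


def monitor(counters: PauseCounters, iterations: int) -> None:
    """Sleep SLEEP_MS repeatedly and attribute any overshoot to a pause."""
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(SLEEP_MS / 1000)
        elapsed_ms = int((time.monotonic() - start) * 1000)
        counters.record(elapsed_ms)


if __name__ == "__main__":
    # Simulated wall-clock durations for four 500 ms sleeps (no real sleeping).
    c = PauseCounters()
    for elapsed in (500, 620, 1_700, 10_600):
        c.record(elapsed)
    print(c)
    # PauseCounters(pauses=2, info_threshold_exceeded=2, warn_threshold_exceeded=1, pause_time_ms=11420)
```

In the simulated run, the 620 ms sleep adds 120 ms of pause time without counting as a pause. This is why pause_time_rate can be non-zero while pauses_rate is zero.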