DataNode Metrics
In addition to these base metrics, many aggregate metrics are available. If an entity type has parents defined, you can formulate all possible aggregate metrics using the formula base_metric_across_parents.
Metrics for aggregate totals can also be formed by adding the prefix total_ to the aggregate metric name.
Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name, in case the plural form does not end in "s".
For example, the following metric names may be valid for DataNode:
- alerts_rate_across_clusters
- total_alerts_rate_across_clusters
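
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the base_metric_across_parents formula and the total_ prefix. The helper name aggregate_metric_names is hypothetical, and the plural form of each parent is passed in explicitly because it cannot always be derived by appending "s".

```python
def aggregate_metric_names(base_metric, parent_plurals):
    """Build candidate aggregate metric names for one base metric.

    Hypothetical helper: parent_plurals must already hold the plural
    form of each parent entity type (for example "clusters"), as
    confirmed in the Cloudera Manager chart browser.
    """
    names = []
    for plural in parent_plurals:
        across = f"{base_metric}_across_{plural}"
        names.append(across)             # aggregate across the parent
        names.append(f"total_{across}")  # aggregate total variant
    return names


# Reproduces the two example names listed above.
print(aggregate_metric_names("alerts_rate", ["clusters"]))
# → ['alerts_rate_across_clusters', 'total_alerts_rate_across_clusters']
```

The generated names are candidates only. Whether a given name is valid for DataNode should still be checked with the type-ahead feature.
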
Some metrics, such as alerts_rate, apply to nearly every metric context. Others only apply to a certain service or role.
For more information about metrics, see Cloudera Manager Metrics and Metric Aggregation.
Metric Name | Description | Unit | Parents | CDH Version |
---|---|---|---|---|
alerts_rate | The number of alerts. | events per second | cluster, hdfs, rack | CDH 5, CDH 6 |
block_checksum_op_avg_time | Block Checksum Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
block_checksum_op_rate | Block Checksum Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
block_reports_avg_time | Block Reports Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
block_reports_rate | Block Reports Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
block_verification_failures_rate | Block Verification Failures | failures per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_cached_rate | The total number of HDFS blocks cached over the lifetime of the process. | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_get_local_path_info_rate | Blocks Get Local Path Info | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_read_rate | Blocks Read | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_removed_rate | Blocks Removed | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_replicated_rate | Blocks Replicated | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_total | Blocks total | blocks | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_uncached_rate | The total number of HDFS blocks uncached over the lifetime of the process. | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_verified_rate | Blocks Verified | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
blocks_written_rate | Blocks Written | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
bytes_read_rate | Number of bytes read | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
bytes_written_rate | Bytes Written | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cache_capacity | The capacity of the HDFS cache on this DataNode. | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
cache_reports_avg_time | The average time to generate cache reports on the DataNode. | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
cache_reports_rate | The total number of generate cache reports operations on the DataNode. | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cache_used | The total cache used. | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_cpu_system_rate | CPU usage of the role's cgroup | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_cpu_user_rate | User Space CPU usage of the role's cgroup | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_mem_page_cache | Page cache usage of the role's cgroup | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_mem_rss | Resident memory of the role's cgroup | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_mem_swap | Swap usage of the role's cgroup | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_read_bytes_rate | Bytes read from all disks by the role's cgroup | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_read_ios_rate | Number of read I/O operations from all disks by the role's cgroup | ios per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_write_bytes_rate | Bytes written to all disks by the role's cgroup | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cgroup_write_ios_rate | Number of write I/O operations to all disks by the role's cgroup | ios per second | cluster, hdfs, rack | CDH 5, CDH 6 |
copy_block_op_avg_time | Copy Block Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
copy_block_op_rate | Copy Block Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cpu_system_rate | Total System CPU | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
cpu_user_rate | Total CPU user time | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
datanode_namenode_connections_bad | NameNode connections in a bad state | connections | cluster, hdfs, rack | CDH 5, CDH 6 |
datanode_namenode_connections_good | NameNode connections in a good state | connections | cluster, hdfs, rack | CDH 5, CDH 6 |
datanode_namenode_connections_unknown | NameNode connections in an unknown state | connections | cluster, hdfs, rack | CDH 5, CDH 6 |
delete_block_pool_avg_time | Delete Block Pool Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
delete_block_pool_rate | Delete Block Pool Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
dfs_capacity | Total configured HDFS storage capacity | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
dfs_capacity_used | Storage space used by HDFS files | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
dfs_capacity_used_non_hdfs | Storage space used by non-HDFS files | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
events_critical_rate | The number of critical events. | events per second | cluster, hdfs, rack | CDH 5, CDH 6 |
events_important_rate | The number of important events. | events per second | cluster, hdfs, rack | CDH 5, CDH 6 |
events_informational_rate | The number of informational events. | events per second | cluster, hdfs, rack | CDH 5, CDH 6 |
fd_max | Maximum number of file descriptors | file descriptors | cluster, hdfs, rack | CDH 5, CDH 6 |
fd_open | Open file descriptors. | file descriptors | cluster, hdfs, rack | CDH 5, CDH 6 |
flush_nanos_avg_time | Average Disk Flush Time | nanos | cluster, hdfs, rack | CDH 5, CDH 6 |
flush_nanos_rate | Disk Flushes | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
fsync_nanos_avg_time | Average Disk Fsync Time | nanos | cluster, hdfs, rack | CDH 5, CDH 6 |
fsync_nanos_rate | Disk Fsyncs | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
fsync_rate | Fsync Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
gc_count_concurrent_mark_sweep_rate | The number of garbage collections by the Concurrent Mark Sweep Collector. | garbage collections per second | cluster, hdfs, rack | CDH 5, CDH 6 |
gc_count_par_new_rate | The number of garbage collections by the Parallel Collector. | garbage collections per second | cluster, hdfs, rack | CDH 5, CDH 6 |
gc_time_ms_concurrent_mark_sweep_rate | The total time spent in garbage collections by the Concurrent Mark Sweep Collector. | ms per second | cluster, hdfs, rack | CDH 5, CDH 6 |
gc_time_ms_par_new_rate | The total time spent in garbage collections by the Parallel Collector. | ms per second | cluster, hdfs, rack | CDH 5, CDH 6 |
get_block_local_path_info_avg_time | Get Block Local Path Info Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
get_block_local_path_info_rate | Get Block Local Path Info Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
get_hadoop_groups_avg_time | Average Time to get Hadoop group for the user | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
get_hadoop_groups_rate | Get Hadoop User Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
get_hdfs_blocks_metadata_avg_time | Get HDFS Blocks Metadata Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
get_hdfs_blocks_metadata_rate | Get HDFS Blocks Metadata Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
get_replica_visible_length_avg_time | Get Replica Visible Length Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
get_replica_visible_length_rate | Get Replica Visible Length Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
health_bad_rate | Percentage of Time with Bad Health | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
health_concerning_rate | Percentage of Time with Concerning Health | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
health_disabled_rate | Percentage of Time with Disabled Health | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
health_good_rate | Percentage of Time with Good Health | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
health_unknown_rate | Percentage of Time with Unknown Health | seconds per second | cluster, hdfs, rack | CDH 5, CDH 6 |
heartbeats_avg_time | Heartbeat Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
heartbeats_rate | Heartbeats | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
init_replica_recovery_avg_time | Init Replica Recovery Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
init_replica_recovery_rate | Init Replica Recovery Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_blocked_threads | Blocked threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_gc_rate | Number of garbage collections | garbage collections per second | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_gc_time_ms_rate | Total time spent garbage collecting. | ms per second | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_heap_committed_mb | Total amount of committed heap memory. | MB | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_heap_used_mb | Total amount of used heap memory. | MB | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_max_memory_mb | Maximum allowed memory. | MB | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_new_threads | New threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_non_heap_committed_mb | Total amount of committed non-heap memory. | MB | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_non_heap_used_mb | Total amount of used non-heap memory. | MB | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_pauses_info_threshold_exceeded_rate | Number of pauses detected over the info threshold. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 1 second, it treats it as one pause above the info threshold. | pauses per second | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_pauses_warn_threshold_exceeded_rate | Number of pauses detected over the warn threshold. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 10 seconds, it treats it as one pause above the warn threshold. | pauses per second | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_runnable_threads | Runnable threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_terminated_threads | Terminated threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_timed_waiting_threads | Timed waiting threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_total_threads | Total threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
jvm_waiting_threads | Waiting threads | threads | cluster, hdfs, rack | CDH 5, CDH 6 |
log_error_rate | Logged Errors | messages per second | cluster, hdfs, rack | CDH 5, CDH 6 |
log_fatal_rate | Logged Fatals | messages per second | cluster, hdfs, rack | CDH 5, CDH 6 |
log_info_rate | Logged Infos | messages per second | cluster, hdfs, rack | CDH 5, CDH 6 |
log_warn_rate | Logged Warnings | messages per second | cluster, hdfs, rack | CDH 5, CDH 6 |
login_failure_avg_time | Average Failed Login Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
login_failure_rate | Login Failures | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
login_success_avg_time | Average Successful Login Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
login_success_rate | Login Successes | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
mem_rss | Resident memory used | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
mem_swap | Amount of swap memory used by this role's process. | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
mem_virtual | Virtual memory used | bytes | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_dropped_pub_all | Dropped Metrics Updates By All Sinks | updates | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_num_active_sinks | Active Metrics Sinks Count | sinks | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_num_active_sources | Active Metrics Sources Count | sources | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_num_all_sinks | All Metrics Sinks Count | sinks | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_num_all_sources | All Metrics Sources Count | sources | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_publish_avg_time | Metrics Publish Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_publish_rate | Metrics Publish Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_snapshot_avg_time | Metrics Snapshot Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
metrics_snapshot_rate | Metrics Snapshot Average Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
num_blocks_failed_to_cache_rate | The total number of blocks the DataNode failed to cache. | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
num_blocks_failed_to_uncache_rate | The total number of blocks the DataNode failed to uncache. | blocks per second | cluster, hdfs, rack | CDH 5, CDH 6 |
oom_exits_rate | The number of times the role's backing process was killed due to an OutOfMemory error. This counter is only incremented if the Cloudera Manager "Kill When Out of Memory" option is enabled. | exits per second | cluster, hdfs, rack | CDH 5, CDH 6 |
packet_ack_round_trip_time_nanos_avg_time | Packet Ack Round Trip Average Time | nanos | cluster, hdfs, rack | CDH 5, CDH 6 |
packet_ack_round_trip_time_nanos_rate | Packet Ack Round Trip Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
pause_time_rate | Total time spent paused. This is the total extra time the pause monitor thread spent sleeping on top of the requested 500 ms. | ms per second | cluster, hdfs, rack | CDH 5, CDH 6 |
pauses_rate | Number of pauses detected. The pause monitor thread sleeps for 500 ms and calculates the extra time it spent paused on top of the sleep time. If the extra sleep time exceeds 1 second, it treats it as one pause. | pauses per second | cluster, hdfs, rack | CDH 5, CDH 6 |
read_block_op_avg_time | Read Block Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
read_block_op_rate | Read Block Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
read_bytes_rate | The number of bytes read from the device | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
reads_from_local_client_rate | Reads From Local Clients | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
reads_from_remote_client_rate | Reads From Remote Clients | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
refresh_namenodes_avg_time | Refresh NameNodes Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
refresh_namenodes_rate | Refresh NameNodes Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
replace_block_op_avg_time | Replace Block Operation Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
replace_block_op_rate | Replace Block Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_authentication_failures_rate | RPC Authentication Failures | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_authentication_successes_rate | RPC Authentication Successes | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_authorization_failures_rate | RPC Authorization Failures | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_authorization_successes_rate | RPC Authorization Successes | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_call_queue_length | RPC Call Queue Length | items | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_num_open_connections | Open RPC Connections | connections | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_processing_time_avg_time | Average RPC Processing Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_processing_time_rate | RPCs Processed | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_queue_time_avg_time | Average RPC Queue Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_queue_time_rate | RPCs Queued | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_received_bytes_rate | RPC Received Bytes | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
rpc_sent_bytes_rate | RPC Sent Bytes | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
send_data_packet_blocked_on_network_nanos_avg_time | Send Data Packet Blocked On Network Average Time | nanos | cluster, hdfs, rack | CDH 5, CDH 6 |
send_data_packet_blocked_on_network_nanos_rate | Send Data Packet Blocked On Network Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
send_data_packet_transfer_nanos_avg_time | Send Data Packet Transfer Average Time | nanos | cluster, hdfs, rack | CDH 5, CDH 6 |
send_data_packet_transfer_nanos_rate | Send Data Packet Transfer Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
unexpected_exits_rate | The number of times the role's backing process exited unexpectedly. | exits per second | cluster, hdfs, rack | CDH 5, CDH 6 |
update_replica_under_recovery_avg_time | Update Replica Under Recovery Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
update_replica_under_recovery_rate | Update Replica Under Recovery Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
uptime | For a host, the amount of time since the host was booted. For a role, the uptime of the backing process. | seconds | cluster, hdfs, rack | CDH 5, CDH 6 |
volume_failures | Volume failures | volumes | cluster, hdfs, rack | CDH 5, CDH 6 |
web_metrics_collection_duration | Web Server Responsiveness | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
write_block_op_avg_time | Write Block Average Time | ms | cluster, hdfs, rack | CDH 5, CDH 6 |
write_block_op_rate | Write Block Operations | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
write_bytes_rate | The number of bytes written to the device | bytes per second | cluster, hdfs, rack | CDH 5, CDH 6 |
writes_from_local_client_rate | Writes From Local Clients | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
writes_from_remote_client_rate | Writes From Remote Clients | operations per second | cluster, hdfs, rack | CDH 5, CDH 6 |
xceivers | Transceivers | transceivers | cluster, hdfs, rack | CDH 5, CDH 6 |
ec_decoding_time_rate | Time spent in decoding during erasure coding | nanos per second | cluster, hdfs, rack | CDH 6 |
ec_failed_reconstruction_tasks_rate | Number of failed reconstruction tasks during erasure coding | tasks per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_bytes_read_rate | Number of bytes read during erasure coding reconstruction | bytes per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_bytes_written_rate | Number of bytes written during erasure coding reconstruction | bytes per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_decoding_time_rate | Decoding time spent during erasure coding reconstruction | ms per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_read_time_rate | Read time spent during erasure coding reconstruction | ms per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_remote_bytes_read_rate | Number of remote bytes read during erasure coding reconstruction | bytes per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_tasks_rate | Number of reconstruction tasks during erasure coding | tasks per second | cluster, hdfs, rack | CDH 6 |
ec_reconstruction_write_time_rate | Write time spent during erasure coding reconstruction | ms per second | cluster, hdfs, rack | CDH 6 |
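
The descriptions of jvm_pauses_info_threshold_exceeded_rate, jvm_pauses_warn_threshold_exceeded_rate, and pause_time_rate in the table describe a pause monitor that sleeps for 500 ms and compares the extra time against 1 second and 10 second thresholds. The following Python sketch illustrates that arithmetic only. The function and constant names are hypothetical, and the choice to report a pause over the warn threshold as "warn" rather than as both levels is an assumption, since the table does not state it.

```python
SLEEP_MS = 500               # requested sleep time of the pause monitor
INFO_THRESHOLD_MS = 1_000    # extra time above this counts as an info-level pause
WARN_THRESHOLD_MS = 10_000   # extra time above this counts as a warn-level pause


def classify_pause(elapsed_ms):
    """Return (extra_ms, level) for one monitor sleep cycle.

    elapsed_ms is the measured wall-clock duration of the sleep.
    level is "warn", "info", or None. Assumption: only the highest
    exceeded threshold is reported.
    """
    extra_ms = max(0, elapsed_ms - SLEEP_MS)
    if extra_ms > WARN_THRESHOLD_MS:
        return extra_ms, "warn"
    if extra_ms > INFO_THRESHOLD_MS:
        return extra_ms, "info"
    return extra_ms, None


print(classify_pause(600))     # → (100, None)
print(classify_pause(2_000))   # → (1500, 'info')
print(classify_pause(11_000))  # → (10500, 'warn')
```

In this sketch, the extra_ms values are what pause_time_rate would accumulate, and the returned level indicates which threshold counter would be incremented.
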