SRM Service Metrics
Reference information for SRM Service Metrics
In addition to these base metrics, many aggregate metrics are available.
If an entity type has parents defined, you can formulate all possible
aggregate metrics using the formula base_metric_across_parents.
In addition, metrics for aggregate totals can be formed by adding the prefix total_ to the front of the metric name.
Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name, in case the plural form does not end in "s".
For example, the following metric names may be valid for SRM Service:
- alerts_rate_across_clusters
- total_alerts_rate_across_clusters
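The naming formula can be sketched in a few lines of Python. This is only an illustration of how the names are composed: the helper name aggregate_metric_names is hypothetical, and the plural form of the parent is passed in explicitly because it cannot always be derived by appending "s". Whether a composed name actually exists should still be checked with the type-ahead feature in the chart browser.

```python
def aggregate_metric_names(base_metric, parent_plural):
    """Compose the aggregate and total-aggregate metric names.

    base_metric:   a base metric name, for example "alerts_rate"
    parent_plural: plural form of a parent entity type, for example "clusters"
                   (supplied by the caller; not derived automatically)
    """
    aggregate = f"{base_metric}_across_{parent_plural}"
    total = f"total_{aggregate}"
    return aggregate, total


if __name__ == "__main__":
    for name in aggregate_metric_names("alerts_rate", "clusters"):
        print(name)
```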
Some metrics, such as alerts_rate, apply to nearly every metric context. Others only apply to a
certain service or role.
alerts_rate
- Description
- The number of alerts.
- Unit
- events per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_cpu_system_rate
- Description
- CPU usage of the role's cgroup
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_cpu_user_rate
- Description
- User Space CPU usage of the role's cgroup
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_mem_page_cache
- Description
- Page cache usage of the role's cgroup
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_mem_rss
- Description
- Resident memory of the role's cgroup
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_mem_swap
- Description
- Swap usage of the role's cgroup
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_read_bytes_rate
- Description
- Bytes read from all disks by the role's cgroup
- Unit
- bytes per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_read_ios_rate
- Description
- Number of read I/O operations from all disks by the role's cgroup
- Unit
- ios per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_write_bytes_rate
- Description
- Bytes written to all disks by the role's cgroup
- Unit
- bytes per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cgroup_write_ios_rate
- Description
- Number of write I/O operations to all disks by the role's cgroup
- Unit
- ios per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cpu_system_rate
- Description
- Total System CPU
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
cpu_user_rate
- Description
- Total CPU user time
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
events_critical_rate
- Description
- The number of critical events.
- Unit
- events per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
events_important_rate
- Description
- The number of important events.
- Unit
- events per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
events_informational_rate
- Description
- The number of informational events.
- Unit
- events per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
fd_max
- Description
- Maximum number of file descriptors
- Unit
- file descriptors
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
fd_open
- Description
- Open file descriptors.
- Unit
- file descriptors
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
health_bad_rate
- Description
- Percentage of Time with Bad Health
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
health_concerning_rate
- Description
- Percentage of Time with Concerning Health
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
health_disabled_rate
- Description
- Percentage of Time with Disabled Health
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
health_good_rate
- Description
- Percentage of Time with Good Health
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
health_unknown_rate
- Description
- Percentage of Time with Unknown Health
- Unit
- seconds per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
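The health_*_rate metrics are described as percentages of time but reported in seconds per second. A minimal sketch of the conversion follows, assuming the reported value is the fraction of each second spent in the given health state; that reading is an assumption, and the helper name rate_to_percent is hypothetical.

```python
def rate_to_percent(seconds_per_second):
    """Convert a seconds-per-second rate into a percentage of time.

    Assumes the input is a fraction of time (0.0 to 1.0), which is an
    interpretation of the unit and not stated by the metric definition.
    """
    return seconds_per_second * 100.0


if __name__ == "__main__":
    # A health_good_rate sample of 0.25 would read as 25% of the time.
    print(rate_to_percent(0.25))
```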
mem_rss
- Description
- Resident memory used
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
mem_swap
- Description
- Amount of swap memory used by this role's process.
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
mem_virtual
- Description
- Virtual memory used
- Unit
- bytes
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
oom_exits_rate
- Description
- The number of times the role's backing process was killed due to an OutOfMemory error. This counter is only incremented if the Cloudera Manager "Kill When Out of Memory" option is enabled.
- Unit
- exits per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
read_bytes_rate
- Description
- The number of bytes read from the device
- Unit
- bytes per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
unexpected_exits_rate
- Description
- The number of times the role's backing process exited unexpectedly.
- Unit
- exits per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
uptime
- Description
- For a host, the amount of time since the host was booted. For a role, the uptime of the backing process.
- Unit
- seconds
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
write_bytes_rate
- Description
- The number of bytes written to the device
- Unit
- bytes per second
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 6.0.0), [CDH 6.0.0..CDH 7.0.0), [CDH 7.0.0..CDH 8.0.0), [CM -1.0.0..CM -1.0.0]
streams_replication_manager_metrics_processor_status_code
- Description
- The status code of the SRM service metrics processor. 0: HEALTHY, 1: INITIALIZING_METRICS_PROCESSOR, 2: RESTARTING_METRICS_PROCESSOR
- Unit
- message.units.status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_remote_service_discovery_endpoint_group_aggregated_status_code
- Description
- The aggregated status codes of the remote SRM Service discovery endpoint groups. These endpoint groups represent the discovered remote SRM Service clusters. For an endpoint group to be available, it needs to have at least 1 active member, and all members should advertise the same protocol. 0: all endpoint groups are available, non-zero: one or more endpoint groups are not available
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_remote_service_discovery_endpoint_group_health_check_aggregated_status_code
- Description
- The aggregated status codes of the remote SRM Service discovery endpoint groups validated with health checks. These endpoint groups represent the discovered remote SRM Service clusters, with their members being health-checked. For an endpoint group to be available, it needs to have at least 1 healthy member. 0: all endpoint groups are available, non-zero: one or more endpoint groups are not available
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_remote_service_discovery_topic_consumer_aggregated_status_code
- Description
- The aggregated status codes of the remote SRM Service discovery topic consumers. These consumers connect to the remote target Kafka clusters listed in the 'Streams Replication Manager Service Remote Target Clusters' configuration of SRM Service. 0: all consumers are connected to their corresponding remote target Kafka cluster, non-zero: one or more consumers are not connected
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_target_metrics_processor_aggregated_status_code
- Description
- The aggregated status code of the SRM service metrics processors. These metrics processors connect to the target Kafka clusters listed in the 'Streams Replication Manager Service Target Clusters' configuration of SRM Service. 0: all metrics processors are connected to a target Kafka cluster and working as expected, non-zero: one or more metrics processors are either restarting or initializing
- Unit
- message.units.status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_target_metrics_streams_application_kafka_connection_aggregated_status_code
- Description
- The aggregated status code of the SRM service metrics Streams Applications. These Streams Applications connect to the target Kafka clusters listed in the 'Streams Replication Manager Service Target Clusters' configuration of SRM Service. 0: all metrics Streams Applications are connected to a target Kafka cluster, non-zero: one or more metrics Streams Applications are not connected
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_service_target_service_discovery_heartbeat_producer_aggregated_status_code
- Description
- The aggregated status codes of the service discovery heartbeat producers. These producers connect to the target Kafka clusters listed in the 'Streams Replication Manager Service Target Clusters' configuration of SRM Service. 0: all producers are connected to their corresponding target Kafka cluster, non-zero: one or more producers are not connected
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
streams_replication_manager_streams_kafka_connection_status_code
- Description
- The status code of the Streams App Kafka Connection. 0: CONNECTED, 1: DISCONNECTED
- Unit
- message.units.connection_status_code
- Parents
- cluster, rack, streams_replication_manager
- CDH Version
- [CDH 5.0.0..CDH 8.0.0)
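The status code values listed in the descriptions above can be mapped to readable names when processing metric samples. The following is a minimal sketch; the class and function names are hypothetical, and only the code-to-name mappings come from the metric descriptions. For the aggregated status code metrics, the descriptions define only 0 versus non-zero, so the sketch does not interpret specific non-zero values.

```python
from enum import IntEnum


class MetricsProcessorStatus(IntEnum):
    """Values of streams_replication_manager_metrics_processor_status_code."""
    HEALTHY = 0
    INITIALIZING_METRICS_PROCESSOR = 1
    RESTARTING_METRICS_PROCESSOR = 2


class KafkaConnectionStatus(IntEnum):
    """Values of streams_replication_manager_streams_kafka_connection_status_code."""
    CONNECTED = 0
    DISCONNECTED = 1


def aggregated_ok(code):
    """Return True when an aggregated status code reports no problem (0)."""
    return code == 0


if __name__ == "__main__":
    print(MetricsProcessorStatus(2).name)
    print(KafkaConnectionStatus(0).name)
    print(aggregated_ok(3))
```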