Cloudera Manager collects a large number of performance metrics for the Kafka services
running on your clusters. Certain metrics should be monitored in any Kafka deployment because
they can help you improve its stability and performance.
The following tables list the Kafka broker metrics that Cloudera recommends monitoring
in any Kafka deployment. For more information on metrics, including a full list of Kafka
metrics, see Cloudera Manager metrics in the Cloudera Manager
Reference documentation.
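The metrics in this section can also be read programmatically. The following is a minimal sketch, not a definitive client: it only builds a request URL for the Cloudera Manager time-series endpoint with a tsquery statement. The host name, port, and API version in `CM_BASE_URL` are placeholders, and the helper name `build_timeseries_url` is hypothetical; adjust all of them to your environment.

```python
from urllib.parse import urlencode

# Placeholder values (assumptions): replace host, port, and API version
# with the ones used by your Cloudera Manager instance.
CM_BASE_URL = "https://cm-host.example.com:7183/api/v41"


def build_timeseries_url(metric: str, base_url: str = CM_BASE_URL) -> str:
    """Build a time-series query URL for one Kafka broker metric.

    The tsquery statement selects the metric for all Kafka broker roles.
    """
    tsquery = f"SELECT {metric} WHERE roleType = KAFKA_BROKER"
    return f"{base_url}/timeseries?{urlencode({'query': tsquery})}"


url = build_timeseries_url("kafka_active_controller")
print(url)
```

The resulting URL can be requested with any HTTP client using your usual Cloudera Manager credentials.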
Table 1. ZooKeeper connectivity metrics

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
| --- | --- | --- | --- | --- | --- |
| kafka_zookeeper_expires_rate | Rate of ZooKeeper session expirations. | Expires per second | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_zookeeper_request_latency_avg | Request latency between the broker and ZooKeeper. | ms | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

Table 2. Active controller metrics

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
| --- | --- | --- | --- | --- | --- |
| kafka_active_controller | Shows the number of active controllers at a given time. Ideally it should be 1. | Number of controllers | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

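As an illustration of the "ideally 1" rule for kafka_active_controller, the sketch below sums per-broker readings and reports whether the cluster has exactly one active controller. It assumes each broker reports 1 when it is the controller and 0 otherwise; the function name and the broker names are hypothetical.

```python
def check_active_controllers(per_broker: dict[str, int]) -> str:
    """Classify the cluster-wide sum of kafka_active_controller readings.

    Assumption: every broker reports 1 if it is the active controller
    and 0 otherwise, so a healthy cluster sums to exactly 1.
    """
    total = sum(per_broker.values())
    if total == 1:
        return "ok"
    if total == 0:
        return "no active controller"
    return f"{total} active controllers (expected 1)"


# Hypothetical readings for a three-broker cluster.
status = check_active_controllers({"broker-1": 0, "broker-2": 1, "broker-3": 0})
print(status)
```

A sum of 0 or a sum above 1 that persists beyond a brief controller failover is worth investigating.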
Table 3. Network metrics

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
| --- | --- | --- | --- | --- | --- |
| kafka_network_processor_avg_idle | The average free capacity of the network processors. Should be greater than 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_request_queue_size | Size of the request queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_response_queue_size | Size of the response queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_messages_received_rate | Number of messages written to topics on this broker. | Messages per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

Table 4. Disk utilization metrics

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
| --- | --- | --- | --- | --- | --- |
| kafka_produce_local_time_rate | Local time spent responding to Produce requests. | Requests per second | Low | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_log_flush_rate | Rate of flushing Kafka logs to disk. | Flushes per second | Low | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_request_handler_avg_idle_rate | The average free capacity of the request handler. Should be greater than 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

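Both idle-capacity metrics (kafka_network_processor_avg_idle and kafka_request_handler_avg_idle_rate) share the same guideline: the value should stay above 0.3. The sketch below shows one way to flag brokers that fall at or below that threshold. It assumes the samples have already been fetched and are expressed on the same scale as the 0.3 guideline; the function name and the sample data are hypothetical.

```python
IDLE_THRESHOLD = 0.3  # guideline from the tables: idle capacity should be > 0.3


def low_idle_brokers(samples: dict[str, float],
                     threshold: float = IDLE_THRESHOLD) -> list[str]:
    """Return the brokers whose idle capacity is not above the threshold.

    `samples` maps a broker name to its latest idle-capacity reading.
    """
    return sorted(name for name, idle in samples.items() if idle <= threshold)


# Hypothetical readings for three brokers.
flagged = low_idle_brokers({"broker-1": 0.82, "broker-2": 0.25, "broker-3": 0.30})
print(flagged)  # ['broker-2', 'broker-3']
```

A reading of exactly 0.3 is flagged here because the guideline asks for values strictly greater than 0.3.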
Table 5. Kafka metrics

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
| --- | --- | --- | --- | --- | --- |
| kafka_broker_state | The state the broker is in. 0 = NotRunning, 1 = Starting, 2 = RecoveringFromUncleanShutdown, 3 = RunningAsBroker, 4 = RunningAsController, 6 = PendingControlledShutdown, 7 = BrokerShuttingDown | Discrete states | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_jvm_gc_runs_rate | Number of garbage collector runs performed on this broker. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_isr_expands_rate | ISR expands per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_isr_shrinks_rate | ISR shrinks per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_max_replication_lag | Maximum replication lag on the broker, across all fetchers, topics, and partitions. | Messages | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_offline_partitions | Number of offline partitions. | Partition count | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
kafka_under_min_isr_partition_count
Count of partitions with fewer than the configured minimum number of in-sync
replicas available.