Essential metrics to monitor

Cloudera Manager collects a high number of performance metrics for the Kafka services running on your clusters. Certain metrics should be monitored in any Kafka deployment as they can help you to improve the stability and performance of your Kafka deployment.

The following tables collect the Kafka broker metrics that Cloudera recommends you to monitor in any Kafka deployment. For more information on metrics, including a full list of Kafka metrics, see Cloudera Manager Metrics.

Table 1. ZooKeeper connectivity metrics
Metric Name Description Unit Importance Parents Version Availability
kafka_zookeeper_expires_rate Measures the session expires per second. Expires per second High cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_zookeeper_request_latency_avg Request latency between the broker and Zookeeper. ms High cluster, kafka, rack CDH 5, CDH 6, CDP 7
Table 2. Active controller metrics
Metric Name Description Unit Importance Parents Version Availability
kafka_active_controller Shows the number of active controllers at a given time. Ideally it should be 1. Number of controllers High cluster, kafka, rack CDH 5, CDH 6, CDP 7
Table 3. Network metrics
Metric Name Description Unit Importance Parents Version Availability
kafka_network_processor_avg_idle The average free capacity of the network processors. Should be > 0.3. Percentage of free capacity High cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_request_queue_size Size of the request queue in Kafka. Message count Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_response_queue_size Size of the response queue in Kafka. Message count Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_messages_received_rate Number of messages written to topic on this broker. Messages per second Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
Table 4. Disk utilization metrics
Metric Name Description Unit Importance Parents Version Availability
kafka_produce_local_time_rate Local Time spent in responding to Produce requests. Requests per second Low cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_log_flush_rate Rate of flushing Kafka logs to disk. Flushes per second Low cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_request_handler_avg_idle_rate The average free capacity of the request handler. Should be > 0.3. Percentage of free capacity High cluster, kafka, rack CDH 5, CDH 6, CDP 7
Table 5. Kafka metrics
Metric Name Description Unit Importance Parents Version Availability
kafka_broker_state The state the broker is in. 0 = NotRunning, 1 = Starting, 2 = RecoveringFromUncleanShutdown, 3 = RunningAsBroker, 4 = RunningAsController, 6 = PendingControlledShutdown, 7 = BrokerShuttingDown Discrete states Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_jvm_gc_runs_rate Number of garbage collector runs performed on this broker. Events per second Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_isr_expands_rate ISR expands per second. Events per second Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_isr_shrinks_rate ISR shrinks per second. Events per second Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_max_replication_lag Maximum replication lag on the broker, across all fetchers, topics, and partitions. Messages Medium cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_offline_partitions Number of offline partitions. Partition count High cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_under_min_isr_partition_count Count of partitions with less than the configured minimum in-sync replicas available. Partition count High cluster, kafka, rack CDH 5, CDH 6, CDP 7
kafka_under_replicated_partitions Count of partitions with unavailable replicas. Partition count Low cluster, kafka, rack CDH 5, CDH 6, CDP 7