Essential metrics to monitor
Cloudera Manager collects a high number of performance metrics for the Kafka services running on your clusters. Certain metrics should be monitored in any Kafka deployment as they can help you to improve the stability and performance of your Kafka deployment.
The following tables collect the Kafka broker metrics that Cloudera recommends you to monitor in any Kafka deployment. For more information on metrics, including a full list of Kafka metrics, see Cloudera Manager Metrics.
Metric Name | Description | Unit | Importance | Parents | Version Availability |
---|---|---|---|---|---|
kafka_zookeeper_expires_rate |
Measures the session expires per second. | Expires per second | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_zookeeper_request_latency_avg |
Request latency between the broker and Zookeeper. | ms | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
Metric Name | Description | Unit | Importance | Parents | Version Availability |
---|---|---|---|---|---|
kafka_active_controller |
Shows the number of active controllers at a given time. Ideally it should be 1. | Number of controllers | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
Metric Name | Description | Unit | Importance | Parents | Version Availability |
---|---|---|---|---|---|
kafka_network_processor_avg_idle |
The average free capacity of the network processors. Should be > 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_request_queue_size |
Size of the request queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_response_queue_size |
Size of the response queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_messages_received_rate |
Number of messages written to topic on this broker. | Messages per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
Metric Name | Description | Unit | Importance | Parents | Version Availability |
---|---|---|---|---|---|
kafka_produce_local_time_rate |
Local Time spent in responding to Produce requests. | Requests per second | Low | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_log_flush_rate |
Rate of flushing Kafka logs to disk. | Flushes per second | Low | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_request_handler_avg_idle_rate |
The average free capacity of the request handler. Should be > 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
Metric Name | Description | Unit | Importance | Parents | Version Availability |
---|---|---|---|---|---|
kafka_broker_state |
The state the broker is in. 0 = NotRunning, 1 = Starting, 2 = RecoveringFromUncleanShutdown, 3 = RunningAsBroker, 4 = RunningAsController, 6 = PendingControlledShutdown, 7 = BrokerShuttingDown | Discrete states | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_jvm_gc_runs_rate |
Number of garbage collector runs performed on this broker. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_isr_expands_rate |
ISR expands per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_isr_shrinks_rate |
ISR shrinks per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_max_replication_lag |
Maximum replication lag on the broker, across all fetchers, topics, and partitions. | Messages | Medium | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_offline_partitions |
Number of offline partitions. | Partition count | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_under_min_isr_partition_count |
Count of partitions with less than the configured minimum in-sync replicas available. | Partition count | High | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |
kafka_under_replicated_partitions |
Count of partitions with unavailable replicas. | Partition count | Low | cluster, kafka, rack | CDH 5, CDH 6, Cloudera Data Platform 7 |