Essential metrics to monitor
Cloudera Manager collects a large number of performance metrics for the Kafka services running on your clusters. Certain metrics should be monitored in any Kafka deployment because they can help you improve its stability and performance.
The following tables list the Kafka broker metrics that Cloudera recommends monitoring in any Kafka deployment. For more information on metrics, including a full list of Kafka metrics, see Cloudera Manager Metrics.

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
|---|---|---|---|---|---|
| kafka_zookeeper_expires_rate | Rate of ZooKeeper session expirations. | Expires per second | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_zookeeper_request_latency_avg | Average request latency between the broker and ZooKeeper. | ms | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
|---|---|---|---|---|---|
| kafka_active_controller | Shows the number of active controllers at a given time. Ideally it should be 1. | Number of controllers | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
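
The "ideally 1" condition for `kafka_active_controller` can be sketched as a small check. This is a minimal illustration, assuming each broker reports 0 or 1 for the metric and that the values have already been retrieved into a dictionary; the function name `check_active_controller` and the broker names are hypothetical, not part of Cloudera Manager.

```python
def check_active_controller(per_broker_values):
    """Sum kafka_active_controller across brokers and compare it to 1.

    per_broker_values maps a broker name to its latest reported value
    (assumed to be 0 or 1 per broker). Returns (ok, total).
    """
    total = sum(per_broker_values.values())
    return total == 1, total


# Hypothetical sample values, for illustration only.
samples = {"broker-1": 0, "broker-2": 1, "broker-3": 0}
ok, total = check_active_controller(samples)
print("active controllers:", total, "healthy:", ok)
```

A total of 0 suggests that no controller is currently elected, and a total above 1 suggests that more than one broker believes it is the controller; either case is worth investigating.
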

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
|---|---|---|---|---|---|
| kafka_network_processor_avg_idle | The average free capacity of the network processors. Should be > 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_request_queue_size | Size of the request queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_response_queue_size | Size of the response queue in Kafka. | Message count | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_messages_received_rate | Number of messages written to topic on this broker. | Messages per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
|---|---|---|---|---|---|
| kafka_produce_local_time_rate | Local time spent responding to Produce requests. | Requests per second | Low | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_log_flush_rate | Rate of flushing Kafka logs to disk. | Flushes per second | Low | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_request_handler_avg_idle_rate | The average free capacity of the request handler. Should be > 0.3. | Percentage of free capacity | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
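
The "Should be > 0.3" guidance for the two idle metrics can be expressed as a threshold check. The sketch below assumes the values are reported as fractions between 0 and 1 and that a sample is available as a dictionary keyed by metric name; `low_idle_metrics` and the sample values are hypothetical.

```python
# Threshold taken from the metric descriptions: "Should be > 0.3".
IDLE_THRESHOLD = 0.3

IDLE_METRICS = (
    "kafka_network_processor_avg_idle",
    "kafka_request_handler_avg_idle_rate",
)


def low_idle_metrics(sample, threshold=IDLE_THRESHOLD):
    """Return the idle metrics in sample whose value is at or below threshold.

    Metrics missing from the sample are skipped rather than flagged.
    """
    return [
        name
        for name in IDLE_METRICS
        if name in sample and sample[name] <= threshold
    ]


# Hypothetical sample for one broker, for illustration only.
sample = {
    "kafka_network_processor_avg_idle": 0.25,
    "kafka_request_handler_avg_idle_rate": 0.80,
}
print(low_idle_metrics(sample))
```

An empty result means both idle ratios are above the recommended threshold for that sample.
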

| Metric Name | Description | Unit | Importance | Parents | Version Availability |
|---|---|---|---|---|---|
| kafka_broker_state | The state the broker is in. 0 = NotRunning, 1 = Starting, 2 = RecoveringFromUncleanShutdown, 3 = RunningAsBroker, 4 = RunningAsController, 6 = PendingControlledShutdown, 7 = BrokerShuttingDown | Discrete states | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_jvm_gc_runs_rate | Number of garbage collector runs performed on this broker. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_isr_expands_rate | ISR expands per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_isr_shrinks_rate | ISR shrinks per second. | Events per second | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_max_replication_lag | Maximum replication lag on the broker, across all fetchers, topics, and partitions. | Messages | Medium | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_offline_partitions | Number of offline partitions. | Partition count | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_under_min_isr_partition_count | Count of partitions with fewer than the configured minimum in-sync replicas available. | Partition count | High | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
| kafka_under_replicated_partitions | Count of partitions with unavailable replicas. | Partition count | Low | cluster, kafka, rack | CDH 5, CDH 6, CDP 7 |
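
Because `kafka_broker_state` is reported as a numeric code, a lookup table makes dashboards and alerts easier to read. The sketch below uses only the codes listed in the metric description; the helper name `describe_broker_state` is hypothetical, and any code not in the list is reported as unknown rather than guessed.

```python
# State codes as listed in the kafka_broker_state description.
BROKER_STATES = {
    0: "NotRunning",
    1: "Starting",
    2: "RecoveringFromUncleanShutdown",
    3: "RunningAsBroker",
    4: "RunningAsController",
    6: "PendingControlledShutdown",
    7: "BrokerShuttingDown",
}


def describe_broker_state(code):
    """Translate a numeric kafka_broker_state value into its state name."""
    code = int(code)
    return BROKER_STATES.get(code, f"Unknown({code})")


print(describe_broker_state(3))
print(describe_broker_state(5))
```
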
