Component types and metrics for alert policies

You create an alert policy for a component type. The component type drives the list of metrics to select for creating a threshold. Learn the different component types and supported metrics for each component type.

The following table lists the component types and metrics for an alert policy:

Table 1. Component Types and Metrics
Metric	Description	Suggested Alert
Topic
UNDER REPLICATED PARTITIONS COUNT	Total number of partitions that are under replicated for a topic.	Value > 0.
BYTES IN PER SEC	Bytes per second coming in to a topic.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the topic becomes idle. Alert-2: Value > max_bytes_in_expected, raises an alert when the topic input load is higher than usual.
BYTES OUT PER SEC	Bytes per second going out from a topic. It does not count the internal replication traffic.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the topic becomes idle. Alert-2: Value > max_bytes_out_expected, raises an alert when the topic output load is higher than usual.
OUT OF SYNC REPLICA COUNT	Total number of replicas that are not in sync with the leader for a topic.	Value > 0, raises an alert if there are out of sync replicas for the topic.
TOPIC PARTITION CONSUMPTION PERCENTAGE	Percentage of bytes consumed per topic partition compared according to the configured parameter `retention.bytes`. If `retention.bytes` is not configured, any condition involving this metric would be false.	Value > max_expected_value, raises an alert if the topic partition reaches a certain consumption percentage.
TOPIC PARTITION BYTES IN PER SEC	Bytes per second coming in to a topic partition.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the topic partition becomes idle. Alert-2: Value > max_bytes_in_expected, raises an alert when the topic partition input load is higher than usual.
TOPIC PARTITION BYTES OUT PER SEC	Bytes per second coming out of a topic partition.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the topic partition becomes idle. Alert-2: Value > max_bytes_out_expected, raises an alert when the topic partition output load is higher than usual.
Producer
IS PRODUCER ACTIVE	Checks whether a producer is active.	Value is False.
MILLISECONDS LAPSED SINCE PRODUCER WAS ACTIVE	Milliseconds passed since the producer was last active.	Value > max_producer_idle_time, raises an alert if the producer did not produce for max_producer_idle_time ms.
Cluster
ACTIVE CONTROLLER COUNT	Number of brokers in the cluster reporting as the active controller in the last interval.	Value != 1.
ONLINE BROKER COUNT	Number of brokers that are currently online.	Depends on the application. For example, you can raise an alert if the number of brokers falls below the `min.insync.replicas` configured for the producer. Alert-1: Value < min.insync.replicas, raises an alert when producer could not send any messages. Alert-2: Value = min.insync.replicas, raises an alert that denotes if any one of the remaining brokers goes down, then producer would not be able to send messages.
UNCLEAN LEADER ELECTION COUNT	Number of unclean partition leader elections in the cluster reported in the last interval.	Value > 0.
UNDER REPLICATED PARTITIONS COUNT	Total number of topic partitions in the cluster that are under replicated.	Value > 0.
LEADER ELECTION PER SEC	Rate of partition leader elections.	Depends on the number of partitions in the application.
OFFLINE PARTITIONS COUNT	Total number of topic partitions, in the cluster, that are offline.	Value > 0.
NETWORK PROCESSOR AVG IDLE PERCENT	Average fraction of time the network processor threads are idle across the cluster.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the network processor threads are busy. Alert-2: Value > network_processor_idle_percentage, raises an alert when the network_processor_idle_percentage is higher than usual. Note: Value range is 0 to 1.
REQUEST HANDLER POOL AVG IDLE PERCENT	Average fraction of time the request handler threads are idle across the cluster.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the request handler threads are busy. Alert-2: Value > request_handler_idle_percentage, raises an alert when the request_handler_idle_percentage is higher than usual. Note: Value range is 0 to 1.
BROKER BYTES IN DEVIATION PERCENTAGE	Percentage by which a broker bytes in per second has deviated from the average bytes in per second of all the alive brokers.	Value > max_byte_in_deviation_percentage, raises an alert if a broker is seeing more than max_byte_in_deviation_percentage incoming traffic compared to average incoming traffic seen by all the brokers.
BROKER BYTES OUT DEVIATION PERCENTAGE	Percentage by which a broker bytes out per second has deviated from the average bytes out per second of all the alive brokers.	Value > max_byte_out_deviation_percentage, raises an alert if a broker is seeing more than max_byte_out_deviation_percentage outgoing traffic compared to average outgoing traffic seen by all the brokers.
ZOOKEEPER SESSION EXPIRATION PER SEC	Average rate at which brokers are experiencing zookeeper session expiration per second.	If this value is high, it can lead to controller fail over and leader changes. Raises an alert if value > 0.
Consumer
CONSUMER GROUP LAG	How far consumer groups are behind the producers.	Depends on the application.
IS CONSUMER ACTIVE	Checks whether a consumer is active.	Value is False.
MILLISECONDS LAPSED SINCE CONSUMER WAS ACTIVE	Milliseconds passed since the consumer was last active.	Value > max_consumer_idle_time, raises an alert if the consumer did not consume for max_consumer_idle_time ms.
Broker
BYTES IN PER SEC	Number of bytes per second produced to a broker.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the broker becomes idle. Alert-2: Value > max_bytes_in_expected_per_broker, raises an alert when the broker input load is higher than usual.
ZOOKEEPER SESSION EXPIRATION PER SEC	Rate at which brokers are experiencing Zookeeper session expirations per second.	If this value is high, it can lead to controller fail over and leader changes. Raises an alert if value > 0.
TOTAL PRODUCE REQUESTS PER SEC	Total number of produce requests to a broker per second.	Depends on the application. Two kinds of alert can be configured. Alert-1: Value =0, raises an alert when there are no produce requests for the last 15 minutes. Alert-2: Value > usual_num_producer_requests_expected to detect the spike in the number of requests.
PARTITION IMBALANCE PERCENTAGE	The partition imbalance for a broker. It is calculated as: `(abs(average_no_of_partitions_per_broker - actual_no_of_partitions_per_broker) / average_no_of_partitions_per_broker) * 100`	Value > 10 %
BYTES OUT PER SEC	Number of bytes per second fetched from a broker. It does not count the internal replication traffic.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the broker becomes idle. Alert-2: Value > max_bytes_out_expected_per_broker, raises an alert when the broker output load is higher than usual.
IS BROKER DOWN	Checks whether a broker is down.	Value is True.
TOTAL PRODUCE REQUEST LATENCY	Latency of produce requests to this broker at the 99th percentile (in ms).	Value > max_expected_latency_ms.
ISR SHRINKS PER SEC	Rate at which brokers are experiencing InSync Replica Shrinks (number of shrinks per second).	Value > 0.
TOTAL FETCH CONSUMER REQUEST LATENCY	Latency of fetch consumer requests to this broker at 99th percentile (in ms).	Value > max_expected_latency_ms.
REQUEST HANDLER POOL AVG IDLE PERCENT	Average fraction of time the request handler threads are idle.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the request handler threads are busy. Alert-2: Value > request_handler_idle_percentage, raises an alert when the request_handler_idle_percentage is higher than usual. Note: Value range is 0 to 1.
NETWORK PROCESSOR AVG IDLE PERCENT	Average fraction of time the network processor threads are idle.	Two kinds of alert can be configured. Alert-1: Value = 0, raises an alert when the network processor threads are busy. Alert-2: Value > network_processor_idle_percentage, raises an alert when the network_processor_idle_percentage is higher than usual. Note: Value range is 0 to 1.
Cluster Replication
REPLICATION LATENCY	15 minutes average replication latency in milliseconds.	Value > max_expected_replication_latency, raises an alert if the replication latency is greater than max_expected_replication_latency.
REPLICATION THROUGHPUT	15 minutes average replication throughput in bytes per second.	Value < min_expected_throughput, raises an alert if throughput during replication is low. This could happen because of network issues.
CHECKPOINT LATENCY	15 minutes average checkpoint latency in milliseconds.	Value > max_expected_checkpoint_latency, raises an alert if the checkpoint latency is greater than max_expected_replication_latency.
REPLICATION STATUS	Replication status of a replication pipeline.	Value != ACTIVE, raises an alert if the replication is not active.
Latency
END TO END LATENCY	15 minutes average of end to end latency in ms.	Value > max_expected_latency, raises an alert if the end to end latency is greater than max_expected_latency.