Enumerator split state monitoring for Kudu replication
Learn how to monitor the state of scan splits during a discovery cycle by using Prometheus queries to identify stalled readers.
Understanding split state metrics
The primary goal of monitoring the enumerator split state is to track the progress of scan splits through a discovery cycle. In a steady state between cycles, all three split metrics (unassignedCount, pendingCount, and pendingRemovalCount) must be zero.
A non-zero value for the unassignedCount or pendingCount metric that does not drain between cycles indicates a stalled reader that requires troubleshooting.
During an active discovery cycle, the following behaviors are expected:
- The unassignedCount metric spikes to the number of tablets and drains to zero as the job assigns splits to readers.
- The pendingRemovalCount metric rises briefly and drops to zero after the next Flink checkpoint completes.
Prometheus query for split states
To visualize these states in a single Grafana panel, use the label_replace function to add a common split_state label to each metric, and set the Grafana legend to {{split_state}} so that each series is named after its state.
Run the following query to monitor the split states:
label_replace(flink_jobmanager_job_operator_coordinator_enumerator_pendingCount,
"split_state", "pending", "", "")
or
label_replace(flink_jobmanager_job_operator_coordinator_enumerator_pendingRemovalCount,
"split_state", "pendingRemoval", "", "")
or
label_replace(flink_jobmanager_job_operator_coordinator_enumerator_unassignedCount,
"split_state", "unassigned", "", "")

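The stalled-reader condition described earlier (a non-zero unassignedCount or pendingCount that does not drain between cycles) can also be expressed as a query. The following is a minimal sketch, not a tested alert rule: it uses the standard PromQL min_over_time function, and the 30m window is an assumption that you should replace with a value longer than one full discovery cycle in your deployment.

```promql
# Returns a series only if the metric never dropped to zero during the window.
# The 30m window is an assumed value; size it to exceed one discovery cycle.
label_replace(
  min_over_time(flink_jobmanager_job_operator_coordinator_enumerator_unassignedCount[30m]) > 0,
  "split_state", "unassigned", "", "")
or
label_replace(
  min_over_time(flink_jobmanager_job_operator_coordinator_enumerator_pendingCount[30m]) > 0,
  "split_state", "pending", "", "")
```

An empty result means both metrics reached zero at least once within the window. Any returned series identifies, through its split_state label, which count has stayed above zero and may point to a stalled reader.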