Write activity monitoring for source and sink clusters

Learn how to monitor the rate of write operations on source and sink clusters by using Prometheus queries to verify data flow and pipeline drainage.

Understanding Write Activity Metrics

Monitoring the rate of each write operation type independently for the source and sink tables helps you verify replication health.
  • Source cluster: Use these metrics to verify that data is flowing into the system and to detect unexpected operation mixes.
  • Sink cluster: Healthy replication primarily produces upsert operations. The presence of other operation types on the sink cluster is unexpected and requires investigation.

You also use these panels during the pre-stop checklist procedure to confirm that the replication pipeline has fully drained before you stop the job.

The cluster label, added during the Prometheus scrape configuration, separates source metrics from sink metrics without requiring separate metric names.

Prometheus Queries for Write Activity

To create a clean legend in your monitoring tool, use one query per operation type. The following examples show queries for the source cluster. To build the equivalent sink panel, replace cluster="source" with cluster="sink".
  • Source - insert rate (rows per minute):
    sum(rate(kudu_tablet_rows_inserted{cluster="source"}[1m])
      * on(tablet_id) group_left(table_name)
      (max by (tablet_id, table_name, table_id) (kudu_tablet_info{table_name="<table-name>"}))) * 60
    
  • Source - upsert rate (rows per minute):
    sum(rate(kudu_tablet_rows_upserted{cluster="source"}[1m])
      * on(tablet_id) group_left(table_name)
      (max by (tablet_id, table_name, table_id) (kudu_tablet_info{table_name="<table-name>"}))) * 60
    
  • Source - update rate (rows per minute):
    sum(rate(kudu_tablet_rows_updated{cluster="source"}[1m])
      * on(tablet_id) group_left(table_name)
      (max by (tablet_id, table_name, table_id) (kudu_tablet_info{table_name="<table-name>"}))) * 60
    
  • Source - delete rate (rows per minute):
    sum(rate(kudu_tablet_rows_deleted{cluster="source"}[1m])
      * on(tablet_id) group_left(table_name)
      (max by (tablet_id, table_name, table_id) (kudu_tablet_info{table_name="<table-name>"})