Prometheus scrape configuration for Kudu replication

Use the following Prometheus scrape configuration to collect metrics from Flink and Kudu clusters and to enable per-cluster query labeling.

Configuring Prometheus scrape targets

The following configuration adds a cluster label, set to source or sink, to every Kudu tablet server target. With this label you can write per-cluster queries, such as comparing write activity on the source against write activity on the sink, without separate metric names or Prometheus jobs.

Replace the placeholder target addresses in angle brackets and adjust the label values to match your environment.


scrape_configs:
  # Flink replication job metrics.
  # JobManager exposes coordinator metrics (enumerator state, lastEndTimestamp) on port 9250.
  # TaskManagers expose operator-level metrics (records/sec, checkpoint info) on port 9251 and higher.
  - job_name: "replication"
    static_configs:
      - targets: ["<flink-jobmanager>:9250"]
        labels:
          app: "replication"
          component: "jobmanager"
      - targets: ["<flink-taskmanager>:9251"]
        labels:
          app: "replication"
          component: "taskmanager"
  # Kudu tablet server metrics.
  # metric_relabel_configs extracts the embedded tablet_id into a label
  # and normalizes the metric name so it is stable and queryable.
  - job_name: "kudu"
    metrics_path: "/metrics_prometheus"
    static_configs:
      - targets: ["<src-tserver-1>:8050", "<src-tserver-2>:8050"]
        labels:
          app: "kudu"
          cluster: "source"
      - targets: ["<snk-tserver-1>:8050", "<snk-tserver-2>:8050"]
        labels:
          app: "kudu"
          cluster: "sink"
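    metric_relabel_configs:
      # Copy the 32-character hex tablet ID out of the metric name into tablet_id.
      # This rule must run before the rename below, which drops the ID from the name.
      - source_labels: [__name__]
        regex: 'kudu_tablet_([a-f0-9]{32})_(.*)'
        target_label: tablet_id
        replacement: '$1'
      # Rewrite kudu_tablet_<tablet_id>_<metric> to kudu_tablet_<metric>.
      - source_labels: [__name__]
        regex: 'kudu_tablet_([a-f0-9]{32})_(.*)'
        target_label: __name__
        replacement: 'kudu_tablet_$2'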
  # Kudu tablet -> table name mapping (via json_exporter).
  # Scrape all tablet servers (source and sink) so the mapping covers both clusters.
  - job_name: "kudu_tablet_info"
    metrics_path: "/probe"
    params:
      module: [default]
    static_configs:
      - targets:
          - "http://<src-tserver-1>:8050/metrics?types=tablet"
          - "http://<src-tserver-2>:8050/metrics?types=tablet"
          - "http://<snk-tserver-1>:8050/metrics?types=tablet"
          - "http://<snk-tserver-2>:8050/metrics?types=tablet"
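    relabel_configs:
      # Pass the tablet server URL to json_exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the tablet server URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to json_exporter.
      - target_label: __address__
        replacement: "<json-exporter-host>:7979"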
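
Example per-cluster recording rules

As an illustration of the per-cluster queries that the cluster label enables, the following is a minimal sketch of a Prometheus rule file. It is not part of the scrape configuration above. The file name, the group name, and the recorded series names are hypothetical. The sketch also assumes that your tablet servers expose a rows_inserted tablet counter, which the relabeling rules above would normalize to kudu_tablet_rows_inserted. Check the metric names that your Kudu version exposes before you use it.

```yaml
# kudu_replication_rules.yml (hypothetical file name; load it through rule_files in prometheus.yml)
groups:
  - name: kudu_replication_per_cluster   # hypothetical group name
    rules:
      # Rows inserted per second, summed over all tablets, with one series per cluster label value.
      # Assumes the normalized metric name kudu_tablet_rows_inserted.
      - record: cluster:kudu_tablet_rows_inserted:rate5m
        expr: sum by (cluster) (rate(kudu_tablet_rows_inserted{job="kudu"}[5m]))
      # Source write rate minus sink write rate.
      # A value that stays above zero suggests that the sink is falling behind the source.
      - record: replication:kudu_tablet_rows_inserted_rate5m:source_minus_sink
        expr: |
          sum(rate(kudu_tablet_rows_inserted{job="kudu", cluster="source"}[5m]))
          - sum(rate(kudu_tablet_rows_inserted{job="kudu", cluster="sink"}[5m]))
```

Both rules use one metric name and one Prometheus job, and they tell the clusters apart only by the cluster label. To break the first rule down further, add tablet_id to the by clause.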