Monitoring the replication job
To visualize the health of the replication pipeline, you must configure Prometheus and Grafana.
Configuring Prometheus
- Enable Prometheus metrics reporting in Flink by adding the following to the conf/config.yaml file:

    metrics:
      reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
      reporter.prom.port: 9250-9260

- Normalize Kudu tablet server metrics by using Prometheus metric relabeling and the json_exporter tool. This ensures that metrics are stable and can be queried by table name.
- Use the provided scrape configurations to add a cluster label (such as source or sink) to all Kudu targets.
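The following sketch shows one way these steps could fit together in prometheus.yml. It is not the provided scrape configuration. It assumes that json_exporter is reachable at the hypothetical address json-exporter:7979 (7979 is the exporter's default port), that it has a hypothetical module named kudu, and that this module emits a table_name label. All host names and job names are placeholders.

```yaml
scrape_configs:
  # Flink JobManager and TaskManagers. Each process binds one free port
  # from the 9250-9260 range set in conf/config.yaml.
  - job_name: flink
    static_configs:
      - targets:
          - flink-jobmanager:9250      # hypothetical host
          - flink-taskmanager-1:9251   # hypothetical host

  # Kudu tablet servers on the source cluster, scraped through json_exporter.
  - job_name: kudu-source
    metrics_path: /probe
    params:
      module: [kudu]                   # hypothetical json_exporter module
    static_configs:
      - targets:
          - http://source-tserver-1:8050/metrics   # tablet server web UI (hypothetical host)
        labels:
          cluster: source              # separates source series from sink series
    relabel_configs:
      # Pass the Kudu endpoint to json_exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the Kudu endpoint as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to json_exporter.
      - target_label: __address__
        replacement: json-exporter:7979
    metric_relabel_configs:
      # Keep only series that carry a table name, so queries can
      # aggregate with "sum by (table_name)". Label name is hypothetical.
      - source_labels: [table_name]
        regex: .+
        action: keep
```

A matching kudu-sink job would repeat the kudu-source job with the sink tablet server addresses and cluster: sink. Routing both clusters through json_exporter's /probe endpoint lets one exporter instance serve every Kudu target, while the cluster label keeps their series apart.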
Example Prometheus queries
Import the provided Grafana dashboard to monitor the following areas:
- Job health: Uptime, restart counts, and checkpoint durations.
- Replication lag: The difference between the current time and lastEndTimestamp.
- Write activity: The rate of each write operation type on the source and sink clusters.
- Throughput: The number of records per second emitted by the source and consumed by the sink.
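The dashboard areas above can be expressed as Prometheus recording rules, sketched below. Uptime, numRestarts, lastCheckpointDuration, numRecordsOutPerSecond, and numRecordsInPerSecond are standard Flink metrics, but their exported names can vary with the Flink version and scope configuration. The lastEndTimestamp series name, its unit (assumed to be epoch milliseconds), and the kudu_rows_* metric names are hypothetical and depend on the replication job and the json_exporter module.

```yaml
groups:
  - name: kudu-replication
    rules:
      # Job health. Flink reports uptime in milliseconds.
      - record: replication:job_uptime_seconds
        expr: flink_jobmanager_job_uptime / 1000
      # numRestarts is a gauge, so delta() is used rather than increase().
      - record: replication:job_restarts:delta1h
        expr: delta(flink_jobmanager_job_numRestarts[1h])
      - record: replication:last_checkpoint_duration_ms
        expr: flink_jobmanager_job_lastCheckpointDuration

      # Replication lag in seconds. Hypothetical metric name; assumes
      # lastEndTimestamp is exported as epoch milliseconds.
      - record: replication:lag_seconds
        expr: time() - max(flink_taskmanager_job_task_operator_lastEndTimestamp) / 1000

      # Write activity per cluster and table (hypothetical json_exporter
      # metric names). Add matching rules for upserted and updated rows.
      - record: replication:kudu_rows_inserted:rate5m
        expr: sum by (cluster, table_name) (rate(kudu_rows_inserted[5m]))
      - record: replication:kudu_rows_deleted:rate5m
        expr: sum by (cluster, table_name) (rate(kudu_rows_deleted[5m]))

      # Throughput: records emitted and consumed per operator,
      # summed across parallel subtasks.
      - record: replication:records_out_per_second
        expr: sum by (job_name, operator_name) (flink_taskmanager_job_task_operator_numRecordsOutPerSecond)
      - record: replication:records_in_per_second
        expr: sum by (job_name, operator_name) (flink_taskmanager_job_task_operator_numRecordsInPerSecond)
```

Precomputing these series keeps dashboard panels cheap to render. The same expressions can also be entered directly in Grafana panels.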
