Checking the state of data replication

Learn how to check the current state of data replication.

The MirrorSourceConnector tracks its progress in the source cluster as source offsets, which are managed by the Kafka Connect framework. Kafka Connect lets you check and modify the source offsets of connectors. You can check the current state of data replication by extracting the source offsets and comparing them with the end offsets of the replicated partitions.

These steps use the connect_shell.sh and kafka_shell.sh Cloudera Streams Messaging - Kubernetes Operator tools. Ensure that these tools are available to you. Running kafka_shell.sh is only necessary if your source Kafka cluster is deployed with Cloudera Streams Messaging - Kubernetes Operator. See Using kafka_shell.sh and Using connect_shell.sh.

  1. Use connect_shell.sh to exec into a Kafka Connect admin pod of the replicator Kafka Connect cluster.
    ./connect_shell.sh --namespace=[***REPLICATION NAMESPACE***] --cluster=[***CONNECT CLUSTER NAME***]
  2. Use the GET /connectors/CONNECTOR/offsets endpoint of the Kafka Connect REST API to extract source offsets.
    curl -s $CONNECT_REST_URL/connectors/[***CONNECTOR NAME***]/offsets

    [***CONNECTOR NAME***] is the name of the MirrorSourceConnector instance.

  3. In the source cluster, use the kafka-get-offsets.sh Kafka tool to extract the end offsets of the replicated partitions.
    bin/kafka-get-offsets.sh --bootstrap-server [***SOURCE CLUSTER HOST***]:[***PORT***] --topic "test.*"
    • The kafka-get-offsets.sh tool accepts a single regex string as the topic filter, but does not accept a list of regexes. To match multiple patterns in a single command, join the expressions with pipes (|) into one regex string.
      --topic "test.*|abc.*|zxc.*"
    • If the source Kafka cluster is a Cloudera Streams Messaging - Kubernetes Operator Kafka cluster, use kafka_shell.sh to run the kafka-get-offsets.sh tool.
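  4. Compare the extracted end offsets with the source offsets extracted in Step 2. The closer the source offset of a partition is to its end offset, the less data remains to be replicated for that partition.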
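
The comparison in the last step can be scripted. The following Python sketch is illustrative only and rests on several assumptions that you should verify against your own output: that the REST response has the form {"offsets": [{"partition": {"cluster": ..., "topic": ..., "partition": ...}, "offset": {"offset": ...}}]}, that kafka-get-offsets.sh prints one topic:partition:offset line per partition, and that the stored source offset is the offset of the last replicated record (so the next record to replicate is at offset + 1). The function names and the sample data are hypothetical.

```python
import json


def parse_connect_offsets(response_text):
    """Return {(topic, partition): source offset} from the REST response.

    Assumes the MirrorSourceConnector response shape described above.
    """
    offsets = {}
    for entry in json.loads(response_text).get("offsets", []):
        partition = entry["partition"]
        offset = entry.get("offset")
        if offset is None:
            continue  # no offset recorded for this source partition
        key = (partition["topic"], int(partition["partition"]))
        offsets[key] = int(offset["offset"])
    return offsets


def parse_end_offsets(tool_output):
    """Return {(topic, partition): end offset} from topic:partition:offset lines."""
    end_offsets = {}
    for line in tool_output.splitlines():
        line = line.strip()
        if not line:
            continue
        topic, partition, offset = line.rsplit(":", 2)
        if offset == "":
            continue  # the tool printed no offset for this partition
        end_offsets[(topic, int(partition))] = int(offset)
    return end_offsets


def replication_lag(source_offsets, end_offsets, stored_is_last_replicated=True):
    """Return {(topic, partition): records not yet replicated, or None}.

    None means that no source offset was found for the partition.
    stored_is_last_replicated encodes the assumption that the stored offset
    points at the last replicated record rather than at the next one.
    """
    lag = {}
    for key, end in sorted(end_offsets.items()):
        stored = source_offsets.get(key)
        if stored is None:
            lag[key] = None
            continue
        next_offset = stored + 1 if stored_is_last_replicated else stored
        lag[key] = max(end - next_offset, 0)
    return lag


if __name__ == "__main__":
    # Made-up sample data standing in for the output of Step 2 and Step 3.
    sample_response = """{"offsets": [
      {"partition": {"cluster": "source", "topic": "test-a", "partition": 0},
       "offset": {"offset": 99}},
      {"partition": {"cluster": "source", "topic": "test-a", "partition": 1},
       "offset": {"offset": 40}}
    ]}"""
    sample_tool_output = "test-a:0:100\ntest-a:1:50\ntest-b:0:7\n"

    lag = replication_lag(
        parse_connect_offsets(sample_response),
        parse_end_offsets(sample_tool_output),
    )
    for (topic, partition), value in lag.items():
        status = "no source offset found" if value is None else f"lag {value}"
        print(f"{topic}-{partition}: {status}")
```

To use the sketch, save the output of the curl command and of kafka-get-offsets.sh to files and pass their contents to the two parse functions. A lag of 0 suggests that the partition is fully replicated at the time the offsets were extracted. Because the two outputs are captured at different moments, treat the result as an approximation on clusters with active producers.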