Checking the state of data replication

The MirrorSourceConnector tracks its progress in the source cluster as source offsets, which are stored by the Kafka Connect framework. Kafka Connect allows you to check and manipulate the source offsets of connectors. You can check the current state of data replication by extracting the source offsets and comparing them with the end offsets of the replicated partitions.

These steps use the kafka-get-offsets.sh Kafka tool to extract the end offsets of replicated partitions in the source cluster. If your source cluster is deployed with Cloudera Streams Messaging - Kubernetes Operator, ensure that the kafka_shell.sh tool is available to you. The kafka_shell.sh tool sets up a pod where Kafka tools are readily available, which makes it easy to run kafka-get-offsets.sh. For more information, see Using kafka_shell.sh.

  1. List the current offsets of the MirrorSourceConnector.
    1. Configure your KafkaConnector resource to include the spec.listOffsets property.
      #...
      kind: KafkaConnector
      spec:
        class: org.apache.kafka.connect.mirror.MirrorSourceConnector
        listOffsets:
          toConfigMap:
            name: [***CONFIGMAP NAME***]
      
      If the ConfigMap you specify does not exist, the Strimzi Cluster Operator creates it when you list connector offsets using the strimzi.io/connector-offsets="list" annotation.
    2. List connector offsets by annotating your KafkaConnector resource with strimzi.io/connector-offsets="list".
      kubectl annotate kafkaconnector [***CONNECTOR NAME***] \
        --namespace [***NAMESPACE***] \
        strimzi.io/connector-offsets="list"
      Once the annotation is applied, the connector offsets are written to the ConfigMap specified in the spec.listOffsets property of the KafkaConnector resource. You compare the offsets in this ConfigMap with the end offsets in Step 3. The Strimzi Cluster Operator automatically removes the annotation once the offsets are written.
  2. In the source cluster, use the kafka-get-offsets.sh Kafka tool to extract the end offsets of the replicated partitions.
    bin/kafka-get-offsets.sh --bootstrap-server [***SOURCE CLUSTER HOST***]:[***PORT***] --topic "test.*"
    • The kafka-get-offsets.sh tool accepts a single regular expression as the topic filter, not a list of regular expressions. To match multiple patterns in one command, join them with pipes (|) into a single regular expression.
      --topic "test.*|abc.*|zxc.*"
    • If the source Kafka cluster is a Cloudera Streams Messaging - Kubernetes Operator Kafka cluster, use kafka_shell.sh to run the kafka-get-offsets.sh tool.
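  3. Compare the extracted end offsets with the source offsets listed in Step 1. The closer the source offset of a partition is to its end offset, the further replication has progressed for that partition.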
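
The comparison in Step 3 can be done by hand, but with many partitions a small script may help. The following is a minimal sketch, not part of the product. It assumes that the offsets written to the ConfigMap follow the Kafka Connect offsets JSON layout, with a "partition" object containing "topic" and "partition" keys and an "offset" object containing an "offset" key. It also assumes that kafka-get-offsets.sh prints one topic:partition:offset line per partition. The function names and the sample data are hypothetical. Verify both formats against your own output before relying on the result.

```python
import json


def parse_source_offsets(offsets_json):
    """Map (topic, partition) to the source offset stored by the connector.

    Assumes the Kafka Connect offsets JSON layout described above.
    """
    result = {}
    for entry in json.loads(offsets_json)["offsets"]:
        part = entry["partition"]
        key = (part["topic"], int(part["partition"]))
        result[key] = int(entry["offset"]["offset"])
    return result


def parse_end_offsets(tool_output):
    """Map (topic, partition) to the end offset.

    Assumes one topic:partition:offset line per partition.
    """
    result = {}
    for line in tool_output.splitlines():
        line = line.strip()
        if not line:
            continue
        topic, partition, offset = line.rsplit(":", 2)
        result[(topic, int(partition))] = int(offset)
    return result


def compare(source_offsets, end_offsets):
    """Return end offset minus source offset for each partition.

    The value is None when the connector has no source offset
    for a partition that exists in the source cluster.
    """
    report = {}
    for key, end in sorted(end_offsets.items()):
        src = source_offsets.get(key)
        report[key] = None if src is None else end - src
    return report


# Hypothetical sample data standing in for the ConfigMap content
# and for the output of kafka-get-offsets.sh.
sample_offsets_json = json.dumps({
    "offsets": [
        {"partition": {"cluster": "source", "topic": "test-a", "partition": 0},
         "offset": {"offset": 99}},
        {"partition": {"cluster": "source", "topic": "test-a", "partition": 1},
         "offset": {"offset": 40}},
    ]
})
sample_tool_output = "test-a:0:100\ntest-a:1:50\ntest-b:0:7\n"

report = compare(parse_source_offsets(sample_offsets_json),
                 parse_end_offsets(sample_tool_output))
for (topic, partition), difference in report.items():
    print(topic, partition, difference)
```

The script only reports the raw difference between the two values. How to read a small remaining difference depends on what the connector stores: if the source offset is the offset of the last replicated record, a fully replicated partition would show a difference of 1 rather than 0. Treat this as an assumption to confirm for your version. A value of None indicates a partition for which no source offset was listed.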