Configuring data replication offsets
Learn how you can configure and modify the offset that the MirrorSourceConnector replicates from.
By default, MirrorSourceConnector replicates data from the start of the source topics, and keeps track of the progress by committing source offsets into the Kafka Connect framework.
You can modify this behavior in the following ways:
- Starting data replication from the latest offset for new partitions.
- Manually setting exact offsets for specific source partitions.
Replicating from the latest offset for new partitions
To replicate data from the latest offset, configure the auto.offset.reset property for the source consumer in the MirrorSourceConnector.
#...
kind: KafkaConnector
spec:
  class: org.apache.kafka.connect.mirror.MirrorSourceConnector
  config:
    source.consumer.auto.offset.reset: latest
With this configuration, all new partitions (partitions without a committed offset) are replicated from the latest offset. Cloudera recommends applying this configuration only under special circumstances, because it violates the at-least-once guarantee of data replication: records written to a new partition before the connector starts consuming it are not replicated.
This example uses the source.consumer. prefix. That is, auto.offset.reset is specifically set for the source consumer in the connector, which is the consumer connecting to the source cluster.
Manually setting exact offsets for specific source partitions
In some situations, it might be necessary to rewind the replication and reprocess records, or fast forward and skip some records. To do this, you can manipulate the exact offsets per partition and change the state of the replication.
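To make "offsets per partition" concrete, the following is a minimal Python sketch of the shape in which Apache Kafka's MirrorSourceConnector is understood to record source offsets: a partition key made of the source cluster alias, topic, and partition number, paired with a single offset value. The helper name source_offset_entry, the alias "source", the topic "orders", and the offset values are all hypothetical and used for illustration only. The sketch only shows the data structure. It does not show the input format that connect_shell.sh expects, and the outer "offsets" wrapper is an assumption modeled on the offsets endpoint of the Kafka Connect REST API in recent Apache Kafka versions.

```python
import json


def source_offset_entry(cluster_alias, topic, partition, offset):
    """Build one partition/offset pair (hypothetical helper, illustration only).

    Assumption: the partition key uses the fields "cluster", "topic", and
    "partition", and the offset value uses the field "offset".
    """
    if partition < 0 or offset < 0:
        raise ValueError("partition and offset must be non-negative")
    return {
        "partition": {
            "cluster": cluster_alias,
            "topic": topic,
            "partition": partition,
        },
        "offset": {"offset": offset},
    }


# Hypothetical values: rewind partition 0 of "orders" and reset partition 1.
entries = [
    source_offset_entry("source", "orders", 0, 1500),
    source_offset_entry("source", "orders", 1, 0),
]

# Assumed wrapper shape; verify against the tool you actually use.
payload = {"offsets": entries}
print(json.dumps(payload, indent=2))
```

Before changing any offset, compare the values you plan to set with the current replication state so that a rewind or fast forward affects only the records you intend.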
- The connect_shell.sh tool is available to you. See Using connect_shell.sh.
- Ensure that you are familiar with the process of checking replication state. See Checking the state of data replication.