Mirroring data between Kafka clusters

Running MirrorMaker

Prerequisite: The source and destination clusters must be deployed and running.

To set up a mirror, run kafka.tools.MirrorMaker. Table 1 lists its command-line options.

At a minimum, MirrorMaker requires one or more consumer configuration files, a producer configuration file, and either a whitelist or a blacklist of topics. The consumer configuration file points the consumer to the ZooKeeper process on the source cluster, and the producer configuration file points the producer to the ZooKeeper process on the destination (mirror) cluster.

Table 1. MirrorMaker Options
Parameter Description Examples
--consumer.config Specifies a file that contains configuration settings for the source cluster. For more information about this file, see the "Consumer Configuration File" subsection. --consumer.config hdp1-consumer.properties
--producer.config Specifies the file that contains configuration settings for the target cluster. For more information about this file, see the "Producer Configuration File" subsection. --producer.config hdp1-producer.properties

--whitelist, --blacklist (Optional) For a partial mirror, you can specify exactly one comma-separated list of topics to include (--whitelist) or exclude (--blacklist). In general, these options accept Java regex patterns. For caveats, see the note after this table. --whitelist my-topic

--num.streams Specifies the number of consumer stream threads to create. --num.streams 4
--num.producers Specifies the number of producer instances. Setting this to a value greater than one establishes a producer pool that can increase throughput. --num.producers 2
--queue.size Specifies the number of messages that are buffered between the consumer and the producer. Default = 10000. --queue.size 2000
--help Lists MirrorMaker command-line options.
Note
  • A comma (',') is interpreted as the regex-choice symbol ('|') for convenience.

  • If you specify --whitelist=".*", MirrorMaker tries to fetch data from the system-level topic __consumer_offsets and produce that data to the target cluster. This can result in the following error:

    Producer cannot send requests to __consumer_offsets

    Workaround: Specify topic names explicitly or, to replicate all other topics, specify --blacklist="__consumer_offsets".
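The two points in the note can be checked locally before starting a mirror. The following is a minimal sketch, not MirrorMaker's own code: the helper names whitelist_to_regex and topic_matches are hypothetical, and grep -E uses POSIX extended regex rather than Java regex, so it is only a fair approximation for simple patterns such as topic lists and ".*". The internal offsets topic is written with its Kafka name, __consumer_offsets.

```shell
# Illustration only: approximates how a comma-separated topic list
# behaves once commas are read as the regex choice symbol '|'.

# Convert "topic1, topic2" into the alternation "topic1|topic2".
whitelist_to_regex() {
  printf '%s' "$1" | tr -d ' ' | tr ',' '|'
}

# Succeed if the topic name ($2) fully matches the topic list ($1).
topic_matches() {
  printf '%s\n' "$2" | grep -Eqx -- "$(whitelist_to_regex "$1")"
}

# A listed topic is selected.
topic_matches 'topic1,topic2' 'topic2' && echo "topic2 is selected"

# A catch-all pattern also selects the internal offsets topic.
topic_matches '.*' '__consumer_offsets' && echo "'.*' selects __consumer_offsets"
```

Running a planned whitelist through a check like this against the list of topic names on the source cluster shows which topics would be selected, including any system-level ones.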

The following example replicates topic1 and topic2 from sourceClusterConsumer to targetClusterProducer:

/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceClusterConsumer.properties --producer.config targetClusterProducer.properties --whitelist="topic1,topic2"
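For repeated runs, it can help to assemble the command from variables and review it before executing. The sketch below only prints the command (a dry run). The function name build_mirrormaker_cmd is hypothetical, and the --num.streams value is an assumed placeholder taken from the options table, not a recommendation.

```shell
# Sketch: assemble a MirrorMaker command line from variables and print it.
KAFKA_BIN=/usr/hdp/current/kafka-broker/bin   # path taken from the example above
CONSUMER_CONFIG=sourceClusterConsumer.properties
PRODUCER_CONFIG=targetClusterProducer.properties
WHITELIST='topic1,topic2'
NUM_STREAMS=4                                 # assumed value; tune for your workload

build_mirrormaker_cmd() {
  printf '%s/kafka-run-class.sh kafka.tools.MirrorMaker' "$KAFKA_BIN"
  printf ' --consumer.config %s' "$CONSUMER_CONFIG"
  printf ' --producer.config %s' "$PRODUCER_CONFIG"
  printf ' --num.streams %s' "$NUM_STREAMS"
  printf ' --whitelist="%s"\n' "$WHITELIST"
}

# Dry run: print the command for review instead of executing it.
build_mirrormaker_cmd
```

Once the printed command looks right, it can be pasted into a shell on a host that has access to both clusters.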

Consumer Configuration File

The consumer configuration file must specify the ZooKeeper process in the source cluster.

Here is a sample consumer configuration file:

zk.connect=hdp1:2181/kafka
zk.connectiontimeout.ms=1000000
consumer.timeout.ms=-1
groupid=dp-MirrorMaker-test-datap1
shallow.iterator.enable=true
mirror.topics.whitelist=app_log
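When several source clusters are mirrored, one consumer configuration file is needed per cluster. A possible approach, sketched below, is to render the file from a few variables. The function name render_consumer_config and the variable names are hypothetical; the keys and default values are copied from the sample above.

```shell
# Sketch: render a consumer configuration file for one source cluster.
# Defaults mirror the sample file; override them per cluster.
SOURCE_ZK="${SOURCE_ZK:-hdp1:2181/kafka}"             # source cluster ZooKeeper
GROUP_ID="${GROUP_ID:-dp-MirrorMaker-test-datap1}"    # consumer group ID
TOPIC_WHITELIST="${TOPIC_WHITELIST:-app_log}"         # topics to mirror

render_consumer_config() {
  cat <<EOF
zk.connect=${SOURCE_ZK}
zk.connectiontimeout.ms=1000000
consumer.timeout.ms=-1
groupid=${GROUP_ID}
shallow.iterator.enable=true
mirror.topics.whitelist=${TOPIC_WHITELIST}
EOF
}

# Print the rendered file; redirect to a .properties file to save it.
render_consumer_config
```

For example, render_consumer_config > sourceClusterConsumer.properties writes the file that is then passed to --consumer.config.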

Producer Configuration File

The producer configuration should point to the target cluster's ZooKeeper process (or use the broker.list parameter to specify a list of brokers on the destination cluster).

Here is a sample producer configuration file:

zk.connect=hdp1:2181/kafka-test
producer.type=async
compression.codec=0
serializer.class=kafka.serializer.DefaultEncoder
max.message.size=10000000
queue.time=1000
queue.enqueueTimeout.ms=-1
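Because the producer must name the target cluster through either zk.connect or broker.list, a quick check of the file before starting MirrorMaker can catch a missing setting. This is a minimal sketch under that assumption; get_prop and check_producer_config are hypothetical helper names, and the parsing handles only simple key=value lines.

```shell
# Print the value of key $2 from properties file $1 (first match only).
get_prop() {
  awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Succeed if the file sets zk.connect or broker.list to a non-empty value.
check_producer_config() {
  if [ -n "$(get_prop "$1" zk.connect)" ] || [ -n "$(get_prop "$1" broker.list)" ]; then
    echo "ok: $1 names a target cluster endpoint"
  else
    echo "error: $1 sets neither zk.connect nor broker.list" >&2
    return 1
  fi
}

# Demonstration on a temporary file with two lines from the sample above.
sample="$(mktemp)"
printf 'zk.connect=hdp1:2181/kafka-test\nproducer.type=async\n' > "$sample"
check_producer_config "$sample"
rm -f "$sample"
```

The same get_prop helper can be pointed at the consumer configuration file to confirm that its zk.connect value refers to the source cluster and not the target.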