Kafka High Availability and Consistency
To achieve high availability and consistency targets, adjust the following parameters to meet your requirements:
Replication Factor
The default replication factor for new topics is 1. For highly-available production systems, Cloudera recommends setting the replication factor to at least 3. This requires at least 3 Kafka brokers.
To change the replication factor, navigate to Replication factor to 3, click Save Changes, and restart the Kafka service.
. SetUnclean Leader Election
With unclean leader election disabled, if a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online. Enable unclean leader election to allow an out-of-sync replica to become the leader and preserve the availability of the partition. With unclean leader election, messages that were not synced to the new leader are lost. This provides balance between consistency (guaranteed message delivery) and availability.
To enable unclean leader election, navigate to Enable unclean leader election, click Save Changes, and restart the Kafka service.
. Check the box labeledAcknowledgements
When writing or configuring a Kafka producer, you can choose how many replicas commit a new message before the message is acknowledged using the requiredAcks property (see Kafka Sink Properties for details).
Set requiredAcks to 0 (immediately acknowledge the message without waiting for any brokers to commit), 1 (acknowledge after the leader commits the message), or -1 (acknowledge after all in-sync replicas are committed) according to your requirements. Setting requiredAcks to -1 provides the highest consistency guarantee at the expense of slower writes to the cluster.
Minimum In-sync Replicas
You can also set the minimum number of in-sync replicas that must be available for the producer to successfully send messages to a partition using the min.insync.replicas setting. If min.insync.replicas is set to 2 and requiredAcks is set to -1, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash.
It also means that if one of the nodes crashes, the partition is no longer available for writes. Similarly to the unclean leader election configuration, setting min.insync.replicas is a balance between higher consistency (requiring writes to more than one broker) and higher availability (allowing writes when fewer brokers are available).
To configure min.insync.replicas at the cluster level, navigate to Minimum number of replicas in ISR to the desired value, click Save Changes, and restart the Kafka service.
. Setmin.insync.replicas.per.topic=topic_name_1:value,topic_name_2:value
Replace topic_name_n with the topic names, and replace value with the desired minimum number of in-sync replicas.
/usr/bin/kafka-topics --alter --zookeeper zk01.example.com:2181 --topic topicname \ --config min.insync.replicas=2
Kafka MirrorMaker
Kafka mirroring enables maintaining a replica of an existing Kafka cluster. For production use, specify the --no.data.loss parameter. This automatically sets producer parameters to avoid losing data in unexpected events. Data duplication is possible in some scenarios. For example, if MirrorMaker crashes, it duplicates messages since its previous checkpoint.
Checkpoint frequency is controlled with the offset.commit.interval.ms argument. This balances performance and number of duplicates. Committing more frequently is slower, but results in fewer duplicates.