Leader positions and in-sync replicas
An overview of how leader positions and in-sync replicas can affect Kafka performance.
Consider the following example which shows a simplified version of a Kafka cluster in steady state. There are N brokers, two topics with nine partitions each. Replicated partitions are not shown for simplicity.
In this example, each broker shown has three partitions per topic and the Kafka cluster has well balanced leader partitions. Recall the following:
Producer writes and consumer reads occur at the partition level.
Leader partitions are responsible for ensuring that the follower partitions keep their records in sync.
Since the leader partitions are evenly distributed, most of the time the load to the overall Kafka cluster is relatively balanced.
Now lets look at an example where a large chunk of the leaders for Topic A and Topic B are on Broker 1.
In a scenario like this a lot more of the overall Kafka workload occurs on Broker 1. Consequently this also causes a backlog of work, which slows down the cluster throughput, which will worsen the backlog. Even if a cluster starts with perfectly balanced topics, failures of brokers can cause these imbalances: if the leader of a partition goes down one of the replicas will become the leader. When the original (preferred) leader comes back, it will get back leadership only if automatic leader rebalancing is enabled; otherwise the node will become a replica and the cluster gets imbalanced.
Let’s take a closer look at Topic A from the previous example that had imbalanced leader partitions. However, this time let's visualize follower partitions as well:
Broker 1 has six leader partitions, broker 2 has two leader partitions, and broker 3 has one leader partition.
Assuming a replication factor of 3.
Assuming all replicas are in-sync, then any leader partition can be moved from Broker 1 to another broker without issue. However, in the case where some of the follower partitions have not caught up, then the ability to change leaders or have a leader election will be hampered.