Kafka rack awareness
Learn about Kafka rack awareness and multi-level rack awareness.
Racks provide information about the physical location of a broker or a client. A Kafka deployment can be made rack aware by configuring rack awareness for the Kafka brokers and clients respectively. Enabling rack awareness helps harden your deployment: it provides durability guarantees for your Kafka service and significantly decreases the chances of data loss.
Rack awareness for Kafka brokers
Learn about Kafka broker rack awareness and how rack-aware Kafka brokers behave.
To avoid a single point of failure, it is considered a best practice to spread your Kafka brokers across racks instead of putting all of them into the same rack. In cloud environments, Kafka brokers located in different availability zones or data centers are usually deployed in different racks. Kafka brokers have built-in support for this type of cluster topology and can be configured to be aware of the racks they are in.
If you create, modify, or redistribute a topic in a rack-aware Kafka deployment, rack awareness ensures that replicas of the same partition are spread across as many racks as possible. This limits the risk of data loss if a complete rack fails. Replica assignment also tries to assign an equal number of leaders to each broker. Therefore, it is advised to configure an equal number of brokers for each rack to avoid an uneven load across racks.
For example, assume you have a topic partition with 3 replicas and the brokers configured in 3 different racks. If rack awareness is enabled, Kafka tries to distribute the replicas among the racks evenly in a round-robin fashion. In this example, this means that Kafka ensures that all replicas are spread across the 3 different racks, significantly decreasing the chances of data loss in case of a rack failure.
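Rack awareness is enabled on a broker by setting the broker.rack property in its configuration (for example, broker.rack=rack-1). The following minimal Java sketch, assuming a cluster reachable at localhost:9092 (a placeholder), uses the AdminClient API to list the rack each broker reports, which can help verify that the configuration took effect.

import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class BrokerRackCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumption: adjust the bootstrap server to match your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Each broker reports the rack it was configured with through broker.rack.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("Broker %d is in rack %s%n",
                        node.id(), node.hasRack() ? node.rack() : "<not configured>");
            }
        }
    }
}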
Multi-level rack awareness
Standard rack awareness handles all racks as unique physical locations (for example, data centers) that have identical importance. In some use cases, physical locations follow a hierarchical system. For example, besides having multiple DCs, there can be (physical) racks located inside those DCs. In a use case like this, the aim is not only to distribute the replicas among the topmost racks (DCs), but also among the second-level racks (physical racks).
Cruise Control optimizations with multi-level rack awareness
If Cruise Control is present in the cluster, and the Kafka brokers run with multi-level rack awareness enabled, Cruise Control will replace all the standard rack aware goals in its configuration with a multi-level rack-aware goal. This ensures that Cruise Control optimizations do not violate the multi-level rack awareness guarantees. For more information on how the goal works, see Setting capacity estimations and goals in the Cruise Control documentation.
Rack awareness for Kafka consumers
Learn about follower fetching, which can be used to make Kafka consumers rack aware.
When a Kafka consumer tries to consume a topic partition, it fetches from the partition leader by default. If the partition leader and the consumer are not in the same rack, fetching generates significant cross-rack traffic, which has a number of disadvantages. For example, it can generate high costs and lead to lower consumer bandwidth and throughput.
For this reason, it is possible to provide the client with rack information so that the client fetches from the closest replica instead of the leader. If the configured closest replica does not exist (there is no replica for the needed partition in the configured closest rack), it uses the partition leader. This feature is called follower fetching and it can be used to mitigate the costs generated by cross-rack traffic or increase consumer throughput.
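Follower fetching requires configuration on both sides: the brokers must have replica.selector.class set to org.apache.kafka.common.replica.RackAwareReplicaSelector, and each consumer must advertise its own rack through the client.rack property. The following minimal Java consumer sketch assumes a cluster at localhost:9092, a rack ID of rack-1, a consumer group named rack-aware-group, and a topic named transactions; these names are placeholders for illustration.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RackAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "rack-aware-group");        // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // The consumer advertises its own rack; with RackAwareReplicaSelector configured
        // on the brokers, fetches are served from a replica in this rack when one exists.
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "rack-1");               // placeholder

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("transactions"));    // placeholder
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s-%d@%d: %s%n",
                        record.topic(), record.partition(), record.offset(), record.value());
            }
        }
    }
}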
Follower fetching in multi-level deployments
Follower fetching is also supported if multi-level rack awareness is enabled for the brokers. When Kafka brokers are running in multi-level rack-aware mode, a multi-level rack-aware ReplicaSelector is automatically installed. This selector ensures that if a consumer has a multi-level rack ID, the closest replica is selected based on the multi-level hierarchy.
Rack awareness for Kafka producers
Learn about rack awareness for Kafka producers.
Compared to brokers and consumers, there are no producer-specific rack-awareness features or toggles that you can enable. However, in a deployment where rack awareness is an important factor, you can make configuration changes so that producers make use of rack awareness and have messages replicated to multiple racks.
Specifically, Cloudera recommends a configuration that ensures that produced messages are replicated to at least two different racks before the messages are considered successful. This involves configuring acks to all in the producer configuration and setting up min.insync.replicas for the topics in a way that ensures a minimum of two racks get the message before the produce request is considered successful.
The configuration of the acks property is fixed. If you want to make your producers rack aware, the property must be set to all no matter the cluster topology or deployment.
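As a minimal Java sketch of the producer side of this recommendation, the example below sets acks to all; the bootstrap server and topic name (localhost:9092 and transactions) are placeholders, and min.insync.replicas is assumed to be set on the topic as described below.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RackAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for all in-sync replicas to acknowledge the write
        // before the produce request is considered successful.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("transactions", "key", "value")); // placeholder topic
            producer.flush();
        }
    }
}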
The exact value you set for min.insync.replicas, on the other hand, depends on your cluster deployment. Specifically, the min.insync.replicas value you must set depends on the number of racks, brokers, and the replication factor of your topics. Cloudera recommends that you exercise caution and review the following examples to better understand configuration.
For example, consider a Cloudera recommended deployment that has three racks with topic replication set to 3. In a case like this, a min.insync.replicas setting of 2 ensures that you always have data written to at least two different racks even if one replica is lagging.
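As an illustration, a topic matching this example could be created through the AdminClient as sketched below; the bootstrap server and topic name are placeholders, while the replication factor of 3 and the min.insync.replicas value of 2 correspond to the three-rack deployment described above.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateRackAwareTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // 3 partitions (arbitrary) and replication factor 3: with brokers spread
            // across three racks, each rack holds one replica of every partition.
            NewTopic topic = new NewTopic("transactions", 3, (short) 3)
                    .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}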
Understand, however, that setting min.insync.replicas to 2 does not universally work for all deployments and may not guarantee that you always have your produced message in at least two racks. Configuration depends on the number of replicas, as well as the number of racks and brokers.
If you have more replicas and brokers than racks, you will have at least two replicas in the same rack. In a case like this, setting min.insync.replicas to 2 is not sufficient, as a partition might become unavailable under certain circumstances.
For example, assume you have three racks with topic replication factor set to 4, meaning that there are a total of four replicas. Additionally, assume that only two of the replicas are in the in-sync replica set (ISR), the leader and one of the followers, and both are located in the same rack. The other two replicas are lagging. Unclean leader election is disabled to avoid data loss.
When the leader and the in-sync follower (located in the same rack) successfully append a
produced message to the log, message production is considered successful. The leader does not
wait for acknowledgement from the lagging replicas. This is because acks=all
only guarantees that the leader waits for the replicas that are in the ISR (including itself).
This means that while the latest messages are available on two brokers, both are located on
the same rack. If the rack goes down at the same time or shortly after production is
successful, the partition will become unavailable as only the two lagging replicas remain,
which cannot become leaders.
In cases like this, a correct value for min.insync.replicas would be 3 instead of 2, as three in-sync replicas would guarantee that messages are produced to at least two different racks.