Kafka producers

Learn more about Kafka producers and their most important configuration properties.

Kafka producers are the publishers responsible for writing records to topics. Typically, this means writing a program using the producer API available in your chosen client library. To instantiate a producer:

KafkaProducer<String, String> producer = new
KafkaProducer<>(producerConfig);
var producer = new ProducerBuilder<string, string>(config).Build();

Most of the important producer settings, and mentioned below, are in the configuration passed by this constructor.

Serialization of Keys and Values

For each producer, there are two serialization properties that must be set, key.serializer (for the key) and value.serializer (for the value). You can write custom code for serialization or use one of the ones already provided by Kafka. Some of the more commonly used ones are:
  • ByteArraySerializer: Binary data
  • StringSerializer: String representations
Serialization is passed to the producer builder object for key and value serialization:
var producer = new ProducerBuilder<string, string>(producerConfig)
                    .SetKeySerializer(Serializers.Utf8)
                    .SetValueSerializer(Serializers.ByteArray)
                    .Build())

Managing Record Throughput

There are several settings to control how many records a producer accumulates before actually sending the data to the cluster. This tuning is highly dependent on the data source. Some possibilities include:

  • batch.size: Combine this fixed number of records before sending data to the cluster.
  • linger.ms: Always wait at least this amount of time before sending data to the cluster; then send however many records has accumulated in that time.
  • max.request.size: Put an absolute limit on data size sent. This technique prevents network congestion caused by a single transfer request containing a large amount of data relative to the network speed.
  • compression.type: Enable compression of data being sent.
  • retries: Enable the client for retries based on transient network errors. Used for reliability.
  • BatchNumMessages: Combine this fixed number of records before sending data to the cluster.
  • LingerMS: Always wait at least this amount of time before sending data to the cluster; then send however many records has accumulated in that time.
  • CompressionType: Enable compression of data being sent.
  • MessageSendMaxRetries: Maximum number of retries for sending a failed message.

Acknowledgments

The full write path for records from a producer is to the leader partition and then to all of the follower replicas. The producer can control which point in the path triggers an acknowledgment. Depending on the acks setting, the producer may wait for the write to propagate all the way through the system or only wait for the earliest success point.

Valid acks values are:

  • 0: Do not wait for any acknowledgment from the partition (fastest throughput).
  • 1: Wait only for the leader partition response.
  • all: Wait for follower partitions responses to meet minimum (slowest throughput).

Partitioning

In Kafka, the partitioner determines how records map to partitions. Use the mapping to ensure the order of records within a partition and manage the balance of messages across partitions. The default partitioner uses the entire key to determine which partition a message corresponds to. Records with the same key are always mapped to the same partition (assuming the number of partitions does not change for a topic). Consider writing a custom partitioner if you have information about how your records are distributed that can produce more efficient load balancing across partitions. A custom partitioner lets you take advantage of the other data in the record to control partitioning.

If a partitioner is not provided to the KafkaProducer, Kafka uses a default partitioner. The ProducerRecord class is the actual object processed by the KafkaProducer. It takes the following parameters:
  • Kafka Record: The key and value to be stored.
  • Intended Destination: The destination topic and the specific partition (optional).
You can set what partitioner to use with the Partitioner property of ProducerConfig. By default the consistent_random partitioner is used. In C# you define the key and value types in the ProducerBuilder. When a new message is sent to a topic, a Message<Key,Value> object is processed with the key and value types specified in ProducerBuilder.