Developing Apache Kafka ApplicationsPDF version

Kafka producers

Learn more about Kafka producers and their most important configuration properties.

Kafka producers are the publishers responsible for writing records to topics. Typically, this means writing a program using the KafkaProducer API. To instantiate a producer:

KafkaProducer<String, String> producer = new
KafkaProducer<>(producerConfig);

Most of the important producer settings, and mentioned below, are in the configuration passed by this constructor.

For each producer, there are two serialization properties that must be set, key.serializer (for the key) and value.serializer (for the value). You can write custom code for serialization or use one of the ones already provided by Kafka. Some of the more commonly used ones are:

  • ByteArraySerializer: Binary data
  • StringSerializer: String representations

There are several settings to control how many records a producer accumulates before actually sending the data to the cluster. This tuning is highly dependent on the data source. Some possibilities include:

  • batch.size: Combine this fixed number of records before sending data to the cluster.
  • linger.ms: Always wait at least this amount of time before sending data to the cluster; then send however many records has accumulated in that time.
  • max.request.size: Put an absolute limit on data size sent. This technique prevents network congestion caused by a single transfer request containing a large amount of data relative to the network speed.
  • compression.type: Enable compression of data being sent.
  • retries: Enable the client for retries based on transient network errors. Used for reliability.

The full write path for records from a producer is to the leader partition and then to all of the follower replicas. The producer can control which point in the path triggers an acknowledgment. Depending on the acks setting, the producer may wait for the write to propagate all the way through the system or only wait for the earliest success point.

Valid acks values are:

  • 0: Do not wait for any acknowledgment from the partition (fastest throughput).
  • 1: Wait only for the leader partition response.
  • all: Wait for follower partitions responses to meet minimum (slowest throughput).

In Kafka, the partitioner determines how records map to partitions. Use the mapping to ensure the order of records within a partition and manage the balance of messages across partitions. The default partitioner uses the entire key to determine which partition a message corresponds to. Records with the same key are always mapped to the same partition (assuming the number of partitions does not change for a topic). Consider writing a custom partitioner if you have information about how your records are distributed that can produce more efficient load balancing across partitions. A custom partitioner lets you take advantage of the other data in the record to control partitioning.

If a partitioner is not provided to the KafkaProducer, Kafka uses a default partitioner.

The ProducerRecord class is the actual object processed by the KafkaProducer. It takes the following parameters:

  • Kafka Record: The key and value to be stored.
  • Intended Destination: The destination topic and the specific partition (optional).