Kafka Producer Settings
If performance is important and you have not yet upgraded to the new Kafka producer (client version 0.9.0.1 or later), consider doing so. The new producer is generally faster and more fully featured than the previous client.
To use the new producer client, add the associated Maven dependency for the client jar; for example:
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.9.0.0</version>
    </dependency>
For more information, see the KafkaProducer javadoc.
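As a starting point, a minimal producer built on the new client API might look like the following sketch; the broker address, topic name, and message contents are placeholder assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092"); // placeholder broker list
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Create the producer, send a single record, and release resources.
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("test-topic", "key", "value"));
    producer.close();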
The following subsections describe several types of configuration settings that influence the performance of Kafka producers.
Important Producer Settings
The lifecycle of a request from producer to broker involves several configuration settings:
1. The producer polls for a batch of messages from the batch queue, one batch per partition. A batch is ready when one of the following is true:
   - batch.size is reached. Note: larger batches typically have better compression ratios and higher throughput, but they also have higher latency.
   - linger.ms (the time-based batching threshold) is reached. Note: there is no simple guideline for setting linger.ms values; you should test settings on specific use cases. For small events (100 bytes or less), this setting does not appear to have much impact. See the configuration sketch after this list.
   - Another batch to the same broker is ready.
   - The producer calls flush() or close().
2. The producer groups the batches based on the leader broker.
3. The producer sends the grouped batches to the brokers.
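As a sketch of the two batching thresholds discussed above, the following sets both the size-based and time-based limits; the specific values are illustrative assumptions, not recommendations:

    import java.util.Properties;

    Properties props = new Properties();
    // Size-based threshold: close a batch once it reaches 64 KB per partition (assumed value).
    props.put("batch.size", "65536");
    // Time-based threshold: send a batch after at most 10 ms, even if it is not full (assumed value).
    props.put("linger.ms", "10");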
The following paragraphs list additional settings related to the request lifecycle:
max.in.flight.requests.per.connection (pipelining)
The maximum number of unacknowledged requests the client will send on a single connection before blocking. If this setting is greater than 1, pipelining is used when the producer sends the grouped batch to the broker. Pipelining improves throughput, but if sends fail there is a risk of out-of-order delivery due to retries (if retries are enabled). Note also that excessive pipelining reduces throughput.
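For example, an ordering-sensitive application might disable pipelining when retries are enabled; this is a sketch, and the retry count is an assumed value:

    import java.util.Properties;

    Properties props = new Properties();
    // With retries enabled, more than one in-flight request risks reordering on failure.
    props.put("retries", "3"); // assumed value
    // Limit to one unacknowledged request per connection to preserve ordering.
    props.put("max.in.flight.requests.per.connection", "1");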
compression.type
Compression is an important part of a producer's work, and the speed of the available compression codecs differs significantly.
To specify a compression type, use the compression.type property. For the producer, it accepts the standard compression codecs ('gzip', 'snappy', 'lz4') as well as 'none', the default, which means no compression. (The values 'uncompressed' and 'producer', which retains the codec set by the producer, belong to the equivalent topic-level setting rather than to the producer client.)
Compression is handled by the user thread. If compression is slow, it can help to add more threads. In addition, batching efficiency impacts the compression ratio: more batching leads to more efficient compression.
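A minimal sketch; the codec chosen here is an assumption for illustration:

    import java.util.Properties;

    Properties props = new Properties();
    // 'snappy' trades a lower compression ratio for low CPU cost (assumed choice).
    props.put("compression.type", "snappy");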
acks
The acks setting specifies the acknowledgments that the producer requires the leader to receive before considering a request complete. This setting defines the durability level for the producer.
- acks=0 (high throughput, low latency): No guarantee. The producer does not wait for acknowledgment from the server.
- acks=1 (medium throughput, medium latency): The leader writes the record to its local log and responds without awaiting full acknowledgment from all followers.
- acks=-1 (low throughput, high latency): The leader waits for the full set of in-sync replicas (ISRs) to acknowledge the record. This guarantees that the record is not lost as long as at least one ISR is active.
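For example, a producer that favors durability over latency might request full ISR acknowledgment; as a sketch:

    import java.util.Properties;

    Properties props = new Properties();
    // "all" is equivalent to -1: wait for the full set of in-sync replicas.
    props.put("acks", "all");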
flush()
The new Producer API supports an optional flush() call, which makes all buffered records immediately available to send (even if linger.ms is greater than 0).
When using flush(), the number of bytes sent between two flush() calls is an important factor for performance. In microbenchmarking tests, a setting of approximately 4 MB performed well for events 1 KB in size.
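The pattern below sketches byte-counted flushing; the topic name, event source, and the 4 MB threshold are assumptions based on the microbenchmark above:

    import java.util.List;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    class FlushExample {
        // Send events, flushing after roughly 4 MB of buffered bytes.
        static void sendWithFlush(KafkaProducer<byte[], byte[]> producer,
                                  List<byte[]> events) {
            final long flushThreshold = 4L * 1024 * 1024; // ~4 MB between flush() calls
            long bytesSinceFlush = 0;
            for (byte[] event : events) {
                producer.send(new ProducerRecord<byte[], byte[]>("test-topic", event));
                bytesSinceFlush += event.length;
                if (bytesSinceFlush >= flushThreshold) {
                    producer.flush(); // blocks until all buffered records are sent
                    bytesSinceFlush = 0;
                }
            }
            producer.flush(); // drain any remaining buffered records
        }
    }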
A general guideline is to set batch.size equal to the total bytes sent between flush() calls divided by the partition count:

    batch.size = (total bytes between flush() calls) / (partition count)
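As a worked example with assumed numbers: if approximately 4 MB (4,194,304 bytes) is sent between flush() calls to a topic with 8 partitions, the guideline gives a batch.size of 4,194,304 / 8 = 524,288 bytes:

    import java.util.Properties;

    Properties props = new Properties();
    // 4,194,304 bytes per flush interval / 8 partitions = 524,288 bytes (assumed numbers).
    props.put("batch.size", "524288");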
Additional Considerations
A producer thread that sends messages to a single partition is faster than a thread that sends to multiple partitions.
If a producer reaches maximum throughput but there is spare CPU and network capacity on the server, additional producer processes can increase overall throughput.
Performance is sensitive to event size: larger events generally achieve higher throughput. In microbenchmarking tests, 1 KB events streamed faster than 100-byte events.