Handling large messages
Learn more about the options you have when handling large messages with Kafka.
Kafka can be tuned to handle large messages. This can be done by configuring broker and consumer properties relating to maximum message and file sizes. However, before doing so, Cloudera recommends that you try and reduce the size of messages first. Review the following options that can help you reduce message size.
- The Kafka producer can compress messages. For example, if the original message is a text-based format (such as XML), in most cases the compressed message will be sufficiently small.
- Use the
compression.type
producer configuration parameters to enable compression. gzip, lz4, Snappy, and Zstandard are supported. - If shared storage (such as NAS, HDFS, or S3) is available, consider placing large files on the shared storage and using Kafka to send a message with the file location. In many cases, this can be much faster than using Kafka to send the large file itself.
- Split large messages into 1 KB segments with the producing client, using partition keys to ensure that all segments are sent to the same Kafka partition in the correct order. The consuming client can then reconstruct the original large message.
If none the options presented are viable, and you need to send large messages with Kafka, you can do so by configuring the following parameters based on your requirements.
Property | Default Value | Description |
---|---|---|
message.max.bytes |
1000000 (1 MB) |
Maximum message size the broker accepts. When tweaking this property, make sure to check the value of Producer Batch Size and max.request.size on the producer side. |
log.segment.bytes |
1073741824 (1 GiB) |
Size of a Kafka data file. Must be larger than any single message. |
replica.fetch.max.bytes |
1048576 (1 MiB) |
Maximum message size a broker can replicate. Must be larger than
message.max.bytes , or a broker can accept messages it cannot
replicate, potentially resulting in data loss. |
Property | Default Value | Description |
---|---|---|
max.partition.fetch.bytes |
1048576 (10 MiB) |
The maximum amount of data per-partition the server will return. |
fetch.max.bytes |
52428800 (50 MiB) |
The maximum amount of data the server should return for a fetch request. |