Kafka Performance: System-Level Broker Tuning

Operating system related kernel parameters affect overall performance of Kafka. These parameters can be configured via sysctl at runtime. To make kernel configuration changes persistent (that is, use adjusted parameters after a reboot), edit /etc/sysctl.conf. The following sections describe some important kernel settings.

File Descriptor Limits

As Kafka works with many log segment files and network connections, the Maximum Process File Descriptors setting may need to be increased in some cases in production deployments, if a broker hosts many partitions. For example, a Kafka broker needs at least the following number of file descriptors to just track log segment files:

(number of partitions)*(partition size / segment size)

The broker needs additional file descriptors to communicate via network sockets with external parties (such as clients, other brokers, Zookeeper, Sentry, and Kerberos).

The Maximum Process File Descriptors setting can be monitored in Cloudera Manager and increased if usage requires a larger value than the default ulimit (often 64K). It should be reviewed for use case suitability.

  • To review FD limit currently set for a running Kafka broker, run cat /proc/KAFKA_BROKER_PID/limits, and look for Max open files.
  • To see open file descriptors, run:
    lsof -p KAFKA_BROKER_PID

Filesystems

Linux records when a file was created (ctime), modified (mtime) and accessed (atime). The value noatime is a special mount option for filesystems (such as EXT4) in Linux that tells the kernel not to update inode information every time a file is accessed (that is, when it was last read). Using this option may result in write performance gain. Kafka is not relying on atime. The value relatime is another mounting option that optimizes how atime is persisted. Access time is only updated if the previous atime was earlier than the current modified time.

To view mounting options, run mount -l or cat /etc/fstab command.

Virtual Memory Handling

Kafka uses system page cache extensively for producing and consuming the messages. The Linux kernel parameter, vm.swappiness, is a value from 0-100 that controls the swapping of application data (as anonymous pages) from physical memory to virtual memory on disk. The higher the value, the more aggressively inactive processes are swapped out from physical memory. The lower the value, the less they are swapped, forcing filesystem buffers to be emptied. It is an important kernel parameter for Kafka because the more memory allocated to the swap space, the less memory can be allocated to the page cache. Cloudera recommends to set vm.swappiness value to 1.

  • To check memory swapped to disk, run vmstat and look for the swap columns.

Kafka heavily relies on disk I/O performance. vm.dirty_ratio and vm.dirty_background_ratio are kernel parameters that control how often dirty pages are flushed to disk. Higher vm.dirty_ratio results in less frequent flushes to disk.

  • To display the actual number of dirty pages in the system, run egrep "dirty|writeback" /proc/vmstat

Networking Parameters

Kafka is designed to handle a huge amount of network traffic. By default, the Linux kernel is not tuned for this scenario. The following kernel settings may need to be tuned based on use case or specific Kafka workload:

  • net.core.wmem_default: Default send socket buffer size.
  • net.core.rmem_default: Default receive socket buffer size.
  • net.core.wmem_max: Maximum send socket buffer size.
  • net.core.rmem_max: Maximum receive socket buffer size.
  • net.ipv4.tcp_wmem: Memory reserved for TCP send buffers.
  • net.ipv4.tcp_rmem: Memory reserved for TCP receive buffers.
  • net.ipv4.tcp_window_scaling: TCP Window Scaling option.
  • net.ipv4.tcp_max_syn_backlog: Maximum number of outstanding TCP SYN requests (connection requests).
  • net.core.netdev_max_backlog: Maximum number of queued packets on the kernel input side (useful to deal with spike of network requests).

To specify the parameters, you can use Cloudera Enterprise Reference Architecture as a guideline.

Configuring JMX Ephemeral Ports

Kafka uses two high-numbered ephemeral ports for JMX. These ports are listed when you view netstat -anp information for the Kafka broker process.

  • You can change the number for the first port by adding a command similar to the following to the field Additional Broker Java Options (broker_java_opts) in Cloudera Manager.
    -Dcom.sun.management.jmxremote.rmi.port=port
  • The JMX_PORT configuration maps to com.sun.management.jmxremote.port by default.

    To access JMX via JConsole, run jconsole ${BROKER_HOST}:9393

  • The second ephemeral port used for JMX communication is implemented for the JRMP protocol and cannot be changed.