Kafka Setup

Hardware Requirements

Kafka can function on a fairly small amount of resources, especially with some configuration tuning. Out-of-the-box configurations can run on as little as 1 core and 1 GB of memory, with storage scaled according to data retention requirements. These are the defaults for both the broker and MirrorMaker in Cloudera Manager version 6.x.

Brokers

CPU is rarely a bottleneck because Kafka is I/O heavy, but a moderately-sized CPU with enough threads is still important to handle concurrent connections and background tasks.

Kafka brokers tend to have a similar hardware profile to HDFS data nodes. How you build them depends on what is important for your Kafka use cases. Use the following guidelines:
To affect the performance of each feature, adjust the corresponding parameter:
  • Message retention: adjust disk size
  • Client throughput (producer and consumer): adjust network capacity
  • Producer throughput: adjust disk I/O
  • Consumer throughput: adjust memory
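
As a rough illustration of how retention translates into disk capacity (all of the figures here are assumptions, not recommendations): with 50 MB/s of incoming data, 7 days of retention, and a replication factor of 3, the cluster needs on the order of

  50 MB/s × 86,400 s/day × 7 days × 3 replicas ≈ 90 TB

of log storage, before allowing for headroom, indexes, and operating system overhead.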

A common choice for a Kafka node is as follows:

Broker
  Memory/Java heap:
    • RAM: 64 GB
    • Recommended Java heap: 4 GB
      (Set this value using the Java Heap Size of Broker Kafka configuration property. See the Other Kafka Broker Properties table.)
  CPU: 12-24 cores
  Disk:
    • 1 HDD for the operating system
    • 1 HDD for the ZooKeeper dataLogDir (using SSDs may provide additional performance)
    • 10 or more HDDs, using RAID 10, for Kafka data

MirrorMaker
  Memory/Java heap: 1 GB heap
    (Set this value using the Java Heap Size of MirrorMaker Kafka configuration property.)
  CPU: 1 core per 3-4 streams
  Disk: No disk space is needed on the MirrorMaker instance itself. Destination brokers should have sufficient disk space to store the topics being copied over.
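
If you are not managing heap sizes through Cloudera Manager, the stock Apache Kafka start scripts read JVM heap settings from the KAFKA_HEAP_OPTS environment variable; the lines below are only a sketch that mirrors the table above, not required settings:

  # Broker: 4 GB heap, matching the recommendation above
  export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"

  # MirrorMaker: 1 GB heap
  export KAFKA_HEAP_OPTS="-Xms1g -Xmx1g"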

Networking requirements: Gigabit Ethernet or 10 Gigabit Ethernet. Avoid clusters that span multiple data centers.

ZooKeeper

It is common to run ZooKeeper on three broker nodes that are dedicated to Kafka. However, for optimal performance, Cloudera recommends using dedicated ZooKeeper hosts, especially for larger production environments.
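
For illustration only, a broker pointed at a dedicated three-host ensemble lists all three hosts in its zookeeper.connect setting; the hostnames and chroot path below are hypothetical:

  zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka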

Kafka Performance Considerations

The simplest recommendation for running Kafka with maximum performance is to have dedicated hosts for the Kafka brokers and a dedicated ZooKeeper cluster for the Kafka cluster. If that is not an option, consider these additional guidelines for resource sharing with the Kafka cluster:

Do not run in VMs
It is common practice in modern data centers to run processes in virtual machines. This generally allows for better sharing of resources. Kafka is sufficiently sensitive to I/O throughput that VMs interfere with the regular operation of brokers. For this reason, it is highly recommended to not use VMs for Kafka; if you are running Kafka in a virtual environment you will need to rely on your VM vendor for help optimizing Kafka performance.
Do not run other processes with Brokers or ZooKeeper
Due to I/O contention with other processes, it is generally recommended to avoid running other resource-intensive processes on the same hosts as Kafka brokers or ZooKeeper.
Keep the Kafka-ZooKeeper Connection Stable
Kafka relies heavily on having a stable ZooKeeper connection. Putting an unreliable network between Kafka and ZooKeeper makes ZooKeeper appear offline to Kafka, so avoid the following unreliable network setups (a related broker timeout is sketched after the list):
  • Do not put Kafka/ZooKeeper nodes on separate networks
  • Do not put Kafka/ZooKeeper nodes on the same network as other high network loads
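
The broker setting that determines how long such an interruption can last before ZooKeeper declares the broker's session dead is zookeeper.session.timeout.ms (an upstream Apache Kafka property). The value below is purely illustrative, not a recommendation:

  # If no heartbeat reaches ZooKeeper within this window, the broker's
  # session expires and it appears offline to the rest of the cluster.
  zookeeper.session.timeout.ms=6000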

Operating System Requirements

SUSE Linux Enterprise Server (SLES)

Unlike CentOS, SLES limits virtual memory by default. Changing this default requires adding the following entries to the /etc/security/limits.conf file:

* hard as unlimited
* soft as unlimited
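
The new limits apply to sessions started after the change. As an optional sanity check (not part of the documented procedure), the current address-space limit can be read from a shell:

  ulimit -v

The command should report unlimited once the entries above take effect.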

Kernel Limits

There are three settings you must configure properly for the kernel; a consolidated configuration sketch follows the list.

  • File Descriptors

    You can set these in Cloudera Manager via Kafka > Configuration > Maximum Process File Descriptors. We recommend a configuration of 100000 or higher.

  • Max Memory Map

    You must configure this directly in your kernel settings (on Linux, this is the vm.max_map_count parameter). We recommend a configuration of 32000 or higher.

  • Max Socket Buffer Size

    Set the buffer size larger than any Kafka send buffers that you define.
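
For hosts that are not fully managed through Cloudera Manager, the sketch below gathers the three limits in one place. The account name kafka and all of the numeric values are illustrative assumptions, not recommendations; in particular, the socket buffer figure only needs to exceed whatever Kafka send buffers you configure:

  # /etc/security/limits.conf - file descriptor limit for the broker account
  # ("kafka" is an assumed user name)
  kafka soft nofile 100000
  kafka hard nofile 100000

  # /etc/sysctl.conf - memory map areas and maximum socket buffer sizes
  vm.max_map_count = 262144
  net.core.rmem_max = 2097152
  net.core.wmem_max = 2097152

  # Apply the sysctl entries without rebooting
  sysctl -p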