Kafka Setup
Hardware Requirements
Kafka can function on a fairly small amount of resources, especially with some configuration tuning. Out-of-the-box configurations can run on as little as 1 core and 1 GB of memory, with storage scaled based on data retention requirements. These are the defaults for both the broker and MirrorMaker in Cloudera Manager version 6.x.
Brokers
CPU is rarely a bottleneck because Kafka is I/O heavy, but a moderately sized CPU with enough threads is still important for handling concurrent connections and background tasks.
To affect performance of these features: | Adjust these parameters:
---|---
Message Retention | Disk size
Client Throughput (Producer & Consumer) | Network capacity
Producer throughput | Disk I/O
Consumer throughput | Memory
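As a sketch of how these map to broker configuration, the excerpt below pairs one standard server.properties setting with each row of the table. The property names are standard Kafka broker settings, but the values shown are illustrative examples only, not tuned recommendations.

```properties
# Illustrative broker settings (server.properties); values are
# examples only, not tuned recommendations.

# Message retention -> disk size: caps how long and how much data
# each partition keeps on disk
log.retention.hours=168
log.retention.bytes=1073741824

# Client throughput -> network capacity: threads servicing
# client connections
num.network.threads=3

# Producer throughput -> disk I/O: threads performing disk reads
# and writes
num.io.threads=8

# Consumer throughput -> memory: socket buffers used to serve fetches
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
```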
A common choice for a Kafka node is as follows:
Component | Memory/Java Heap | CPU | Disk
---|---|---|---
Broker | Set this value using the Java Heap Size of Broker Kafka configuration property. | 12-24 cores | Scaled based on data retention requirements.
MirrorMaker | 1 GB heap. Set this value using the Java Heap Size of MirrorMaker Kafka configuration property. | 1 core per 3-4 streams | No disk space needed on the MirrorMaker instance; destination brokers should have sufficient disk space to store the topics being copied over.
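For context on the per-stream sizing, a MirrorMaker run might look like the sketch below. The configuration file names and topic pattern are placeholders; `--num.streams` sets the consumer thread count that the "1 core per 3-4 streams" guidance above refers to.

```bash
# Sketch of a MirrorMaker invocation; file names and the topic
# whitelist are placeholders. Four streams pairs with roughly
# one core, per the sizing table above.
kafka-mirror-maker --consumer.config source-cluster.properties \
                   --producer.config target-cluster.properties \
                   --num.streams 4 \
                   --whitelist "events.*"
```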
Networking requirements: Gigabit Ethernet or 10 Gigabit Ethernet. Avoid clusters that span multiple data centers.
ZooKeeper
It is common to run ZooKeeper on the same three nodes that host the Kafka brokers. However, for optimal performance Cloudera recommends using dedicated ZooKeeper hosts, especially for larger production environments.
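With a dedicated ensemble, each broker points at the ZooKeeper hosts through its standard zookeeper.connect property; the hostnames below are placeholders.

```properties
# Brokers reference the dedicated three-node ZooKeeper ensemble;
# hostnames are placeholders.
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```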
Kafka Performance Considerations
The simplest recommendation for running Kafka with maximum performance is to have dedicated hosts for the Kafka brokers and a dedicated ZooKeeper cluster for the Kafka cluster. If that is not an option, consider these additional guidelines for resource sharing with the Kafka cluster:
- Do not run in VMs
- It is common practice in modern data centers to run processes in virtual machines, which generally allows for better resource sharing. However, Kafka is sensitive enough to I/O throughput that VMs can interfere with the regular operation of brokers. For this reason, it is highly recommended not to run Kafka in VMs; if you must run Kafka in a virtual environment, you will need to rely on your VM vendor for help optimizing Kafka performance.
- Do not run other processes with Brokers or ZooKeeper
- Due to I/O contention, it is generally recommended to avoid running other processes on the same hosts as Kafka brokers or ZooKeeper.
- Keep the Kafka-ZooKeeper Connection Stable
- Kafka relies heavily on having a stable ZooKeeper connection. Putting an unreliable network between Kafka and ZooKeeper makes ZooKeeper appear to be offline to Kafka. To avoid this, follow these guidelines (a timeout-tuning sketch follows the list):
- Do not put Kafka and ZooKeeper nodes on separate networks
- Do not put Kafka and ZooKeeper nodes on the same network as other high network loads
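One way to make brokers more tolerant of brief network hiccups is to raise the ZooKeeper timeouts in the broker configuration. The property names below are standard Kafka broker settings; the values are examples only, not recommendations.

```properties
# Sketch: longer timeouts give a broker more slack before its
# ZooKeeper session is considered dead; values are examples only.
zookeeper.session.timeout.ms=18000
zookeeper.connection.timeout.ms=18000
```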
Operating System Requirements
SUSE Linux Enterprise Server (SLES)
Unlike CentOS, SLES limits virtual memory by default. Changing this default requires adding the following entries to the /etc/security/limits.conf file:
```
* hard as unlimited
* soft as unlimited
```
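After logging back in as the user that runs Kafka, you can verify the change with ulimit, which reports the address-space ("as") limit that these entries control.

```bash
# Both should print "unlimited" once the limits.conf entries apply.
ulimit -S -v   # soft address-space limit
ulimit -H -v   # hard address-space limit
```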
Kernel Limits
There are three settings you must configure properly for the kernel; an illustrative sysctl sketch follows the list below.
- File Descriptors
- You can set these in Cloudera Manager. We recommend a configuration of 100000 or higher.
- Max Memory Map
- You must configure this in your specific kernel settings. We recommend a configuration of 32000 or higher.
- Max Socket Buffer Size
- Set the buffer size larger than any Kafka send buffers that you define.
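As a sketch, these settings map to kernel parameters roughly as follows. The values are examples only; note that the 100000 file-descriptor recommendation above is a per-process limit (set via limits.conf or Cloudera Manager), while fs.file-max is the system-wide cap.

```bash
# Illustrative sysctl settings; values are examples, not
# Cloudera recommendations.

# System-wide file descriptor cap (per-process limits are set
# separately via limits.conf or Cloudera Manager)
sysctl -w fs.file-max=1000000

# Max memory map areas per process
sysctl -w vm.max_map_count=32000

# Max socket buffer sizes; keep these above any Kafka
# socket.send.buffer.bytes / socket.receive.buffer.bytes values
sysctl -w net.core.wmem_max=2097152
sysctl -w net.core.rmem_max=2097152
```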