Kafka
Kafka requires a fairly small amount of resources, especially with some configuration tuning. By default, Kafka can run on as little as 1 core and 1 GB of memory, with storage scaled to your data retention requirements.
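To make "storage scaled to your data retention requirements" concrete, a common back-of-the-envelope estimate multiplies the producer write rate by the retention window and the replication factor, then divides the result across brokers. The sketch below is a planning aid only. It assumes traffic and replicas are spread evenly across brokers and adds an arbitrary 30% headroom. The function name and its parameters are hypothetical, not a Kafka or Cloudera API.

```python
def estimate_broker_disk_gb(
    write_mb_per_s: float,
    retention_hours: float,
    replication_factor: int,
    broker_count: int,
    headroom: float = 0.3,
) -> float:
    """Rough per-broker disk estimate (decimal GB) for retained Kafka data.

    Hypothetical helper. Assumes producer traffic and partition replicas are
    spread evenly across brokers and that every replica stores the full
    retained log.
    """
    if broker_count < replication_factor:
        # Kafka cannot place more replicas of a partition than there are brokers.
        raise ValueError("replication_factor cannot exceed broker_count")
    # Data written by producers during one retention window, in MB.
    retained_mb = write_mb_per_s * retention_hours * 3600
    # Every replica keeps its own copy of the retained data.
    cluster_mb = retained_mb * replication_factor
    # Even split across brokers, plus assumed headroom for uneven partition
    # placement, index files, and log segments not yet deleted.
    per_broker_mb = cluster_mb / broker_count * (1 + headroom)
    return per_broker_mb / 1000


# Example: 10 MB/s of producer traffic kept for 7 days, replication factor 3, 6 brokers.
print(f"{estimate_broker_disk_gb(10, 7 * 24, 3, 6):.0f} GB per broker")  # prints "3931 GB per broker"
```

Real clusters rarely distribute partitions perfectly, so treat the result as a lower bound and size against the busiest broker.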
CPU is rarely a bottleneck because Kafka is I/O-heavy, but a moderately sized CPU with enough threads is still needed to handle concurrent connections and background tasks.
Kafka brokers tend to have a similar hardware profile to HDFS data nodes. How you build them depends on what is important for your Kafka use cases.
Feature | Resource that most affects its performance |
---|---|
Message retention | Disk size |
Client throughput (producer and consumer) | Network capacity |
Producer throughput | Disk I/O |
Consumer throughput | Memory |
Common resource allocations for Kafka brokers and related services are as follows:
Component | Memory/Java Heap | CPU | Disk |
---|---|---|---|
Broker | Set this value using the Java Heap Size of Broker Kafka configuration property. | 12-24 cores | |
Cruise Control | 1 GB | 1 core | Cruise Control stores its data in Kafka, so its storage requirements depend on the retention settings of the related Kafka topics. |
Kafka Connect | 0.5-4 GB heap, depending on the connectors in use | 4 cores | |
MirrorMaker | 1 GB heap. Set this value using the Java Heap Size of MirrorMaker Kafka configuration property. | 1 core per 3-4 streams | No disk space is needed on the MirrorMaker instance. Destination brokers need enough disk space to store the topics being copied. |
Schema Registry | 1 GB heap | 2 cores | 1 MB, plus space for uploaded serialization JAR files. JAR files can be of any size, so disk usage depends on what is uploaded. They can be stored locally on the Schema Registry host or in HDFS, if available. |
Streams Messaging Manager | 8 GB heap | 8 cores | 5 GB |
Streams Replication Manager | | The performance of the SRM driver is mostly impacted by network throughput and latency. | No resources required |
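When several of these services share a host, their requirements add up. The sketch below sums the heap and core figures from the table for co-located services and applies the MirrorMaker rule of one core per 3-4 streams, rounding up. Several assumptions apply. Kafka Connect is taken at the top of its 0.5-4 GB range. The Broker and Streams Replication Manager rows are left out because the table gives no heap figure for them. The dictionary keys and function names are hypothetical. Totals cover JVM heap only and leave out memory for the operating system and page cache, which Kafka brokers rely on heavily.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    heap_gb: float
    cores: int


# Per-instance figures from the table (hypothetical keys). Kafka Connect is
# taken at the top of its 0.5-4 GB range.
TABLE_SIZING = {
    "cruise_control": Sizing(heap_gb=1.0, cores=1),
    "kafka_connect": Sizing(heap_gb=4.0, cores=4),
    "schema_registry": Sizing(heap_gb=1.0, cores=2),
    "streams_messaging_manager": Sizing(heap_gb=8.0, cores=8),
}

MIRRORMAKER_HEAP_GB = 1.0  # "1 GB heap" in the table


def mirrormaker_cores(streams: int, streams_per_core: int = 3) -> int:
    """Cores for MirrorMaker at one core per 3-4 streams, rounded up.

    The default of 3 streams per core is the conservative end of the range.
    """
    if streams < 1 or streams_per_core < 1:
        raise ValueError("streams and streams_per_core must be positive")
    return math.ceil(streams / streams_per_core)


def host_totals(components: list[str], mirrormaker_streams: int = 0) -> tuple[float, int]:
    """Sum JVM heap (GB) and cores for services co-located on one host."""
    heap_gb = sum(TABLE_SIZING[name].heap_gb for name in components)
    cores = sum(TABLE_SIZING[name].cores for name in components)
    if mirrormaker_streams > 0:
        heap_gb += MIRRORMAKER_HEAP_GB
        cores += mirrormaker_cores(mirrormaker_streams)
    return heap_gb, cores


print(host_totals(["schema_registry", "streams_messaging_manager"]))  # (9.0, 10)
print(mirrormaker_cores(10))  # 4
```

Choosing 3 streams per core errs toward over-provisioning. Pass `streams_per_core=4` to use the other end of the range.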
Networking requirements: Gigabit Ethernet or 10 Gigabit Ethernet. Avoid clusters that span multiple data centers.
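Choosing between Gigabit and 10 Gigabit Ethernet comes down to how much traffic each broker moves. Replication and every consumer group multiply the raw producer rate. The sketch below is a rough model, not a Kafka formula. It assumes partition leaders, producers, and consumers are spread evenly across brokers and that each consumer group reads the full stream once. It ignores protocol overhead and compression and uses an arbitrary 70% utilization ceiling. The function names are hypothetical.

```python
def broker_network_gbps(
    produce_mb_per_s: float,
    replication_factor: int,
    consumer_groups: int,
    broker_count: int,
) -> tuple[float, float]:
    """Rough per-broker (ingress, egress) in Gbit/s under even-spread assumptions."""
    # In: producer writes to leaders, plus replica data received as a follower.
    ingress_mb = produce_mb_per_s * replication_factor / broker_count
    # Out: replica data sent to followers, plus one copy per consumer group.
    egress_mb = produce_mb_per_s * (replication_factor - 1 + consumer_groups) / broker_count
    mb_to_gbit = 8 / 1000  # MB/s -> Gbit/s, decimal units
    return ingress_mb * mb_to_gbit, egress_mb * mb_to_gbit


def fits_link(ingress_gbps: float, egress_gbps: float, link_gbps: float,
              max_utilization: float = 0.7) -> bool:
    """Full-duplex Ethernet: each direction is checked against its own capacity."""
    return max(ingress_gbps, egress_gbps) <= link_gbps * max_utilization


# 100 MB/s produced, replication factor 3, 2 consumer groups, 3 brokers.
ingress, egress = broker_network_gbps(100, 3, 2, 3)
for link in (1, 10):
    print(f"{link} GbE: in {ingress:.2f}, out {egress:.2f} Gbit/s, "
          f"fits={fits_link(ingress, egress, link)}")
```

In this example the egress side exceeds a single Gigabit link, while 10 Gigabit Ethernet leaves ample margin.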
Kafka and ZooKeeper: It is common to run ZooKeeper on 3 broker nodes that are dedicated to Kafka. However, for optimal performance, Cloudera recommends using dedicated ZooKeeper hosts, especially in larger production environments.
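Three is the usual minimum because a ZooKeeper ensemble stays available only while a strict majority of its servers (a quorum) is up. The small calculation below shows how many server failures an ensemble of each size survives. The function names are illustrative. A fourth server raises the quorum size without raising the number of tolerated failures, which is why ensembles are normally sized with an odd number of servers.

```python
def zk_quorum_size(ensemble_size: int) -> int:
    """Servers that must be up for the ensemble to serve requests: a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def zk_failures_tolerated(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still available."""
    return ensemble_size - zk_quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"{n} servers: quorum {zk_quorum_size(n)}, "
          f"tolerates {zk_failures_tolerated(n)} failure(s)")
```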