Kafka

Kafka’s default configuration with Cloudera Manager is suited to start development quickly. Several default settings should be changed before deploying a Cloudera Kafka cluster in production.

The default ZooKeeper Kafka root is /. However, Cloudera recommends changing this to /kafka. This is the location within ZooKeeper where the znodes for the Kafka cluster are stored. As long as a single Kafka cluster is using the ZooKeeper service, using /kafka is recommended. If multiple Kafka clusters, for example, the development, test, and QA teams are sharing a ZooKeeper service, then each Kafka instance should have a unique ZooKeeper root. For example, /kafka-dev, /kafka-test, /kafka-qa.

Cloudera Manager enables automatic creation of Kafka topics by default. If some data is written to a Kafka topic, then this operation creates that topic, if that topic does not exist already. While this may be convenient in prototyping and development, auto-creation of Kafka topics should not be used in production environments. Auto-creation can lead to creation of arbitrary topics and data can be written to the wrong topic, in case the application is configured incorrectly.

Cloudera Manager sets the default minimum number of in-sync replicas (ISR) to 1. This should generally be increased to a minimum of 2 in a production cluster to prevent data loss.

The Kafka Maximum Process File Descriptors setting may need to be increased in certain production deployments. This value can be monitored in Cloudera Manager and increased if usage requires a larger value than the default 64 k limit.

The default data retention time for Kafka is acceptable for production deployments, but it should be reviewed based on the use case.