Kafka Administration Basics

Broker Log Management

Kafka brokers store their data as log segments in a directory. Logs are rotated based on size and time settings.

The most common log retention settings to adjust for your cluster are shown below. These are accessible in Cloudera Manager via the Kafka > Configuration tab.

  • log.dirs: The location for the Kafka data (that is, topic directories and log segments).
  • log.retention.{ms|minutes|hours}: The retention period for the entire log. Any older log segments are removed.
  • log.retention.bytes: The retention size for the entire log.
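
For illustration, these settings might appear in a broker's configuration as follows. The path and values here are examples, not recommendations; note that log.retention.bytes applies per partition:

```properties
# Illustrative values only -- tune for your cluster.
log.dirs=/var/local/kafka/data
# Remove log segments older than 7 days (168 hours).
log.retention.hours=168
# Also cap each partition's log at ~100 GiB; -1 disables the size limit.
log.retention.bytes=107374182400
```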

There are many more properties available for fine-tuning broker log management. For more detail, see the relevant properties in the Broker Configs topic of the Apache Kafka documentation.

  • log.dirs
  • log.flush.*
  • log.retention.*
  • log.roll.*
  • log.segment.*

Record Management

There are two pieces to record management: log segments and the log cleaner.

As part of general data storage, Kafka rolls logs periodically based on size or time limits. Once either limit is reached, a new log segment is created and all new data is written there, while older log segments generally no longer change. This limits the risk of data loss or corruption to a single segment instead of the entire log.

  • log.roll.{ms|hours}: The time period for each log segment. Once the current segment is older than this value, it goes through log segment rotation.
  • log.segment.bytes: The maximum size for a single log segment.
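
As a sketch, the following broker settings (values illustrative) would roll a new segment at least once a day, or sooner if the current segment reaches 512 MiB:

```properties
# Roll a new log segment after 24 hours at the latest...
log.roll.hours=24
# ...or once the current segment reaches 512 MiB, whichever comes first.
log.segment.bytes=536870912
```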

As an alternative to simply removing log segments for a partition, Kafka provides the log cleaner. When the log cleaner is enabled, individual records in older log segments can be managed differently:

  • log.cleaner.enable: This is a global setting in Kafka to enable the log cleaner.
  • cleanup.policy: This is a per-topic property that is usually set at topic creation time. There are two valid values for this property, delete and compact.
  • log.cleaner.min.compaction.lag.ms: This is the retention period for the “head” of the log. Only records outside of this retention period will be compacted by the log cleaner.
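
Taken together, a minimal sketch of these settings might look like the following (values illustrative; cleanup.policy is a topic-level property, shown here alongside the broker settings only for comparison):

```properties
# Broker-level settings:
log.cleaner.enable=true
# Do not compact records newer than one hour.
log.cleaner.min.compaction.lag.ms=3600000

# Topic-level property, usually set at topic creation:
cleanup.policy=compact
```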

The compact policy, also called log compaction, assumes that the most recent record for a given key is the one that matters. Examples include tracking a customer's current email address or current mailing address. With log compaction, older records with the same key are removed from a log segment and only the latest one is kept. This effectively removes some offsets from the partition.
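
Conceptually, compaction retains only the newest record for each key. The shell snippet below is a rough analogy only (it is not Kafka itself, and the key:value records are made up): it keeps the last record seen for each key, in the order the keys first appeared.

```shell
# Simulated log: key:value records, oldest first (hypothetical data).
records='user1:old@example.com
user2:b@example.com
user1:new@example.com'

# "Compact" by keeping only the latest record per key,
# preserving the order in which keys first appeared.
compacted=$(printf '%s\n' "$records" | awk -F: '
  { latest[$1] = $0; if (!($1 in seen)) { order[++n] = $1; seen[$1] = 1 } }
  END { for (i = 1; i <= n; i++) print latest[order[i]] }')

echo "$compacted"
```

After "compaction", only user1:new@example.com and user2:b@example.com remain; the older user1 record is gone, just as a compacted Kafka partition drops superseded offsets.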

Broker Garbage Collection and GC Log Rotation

Both broker JVM garbage collection and JVM garbage collection log rotation are enabled by default in the Kafka version delivered with CDH. Garbage collection logs are written to the agent process directory by default.

Example path:
/run/cloudera-scm-agent/process/99-kafka-KAFKA_BROKER/kafkaServer-gc.log

Changing the default directory of garbage collection logs is currently not supported. However, you can configure properties related to garbage collection log rotation with the Kafka Broker Environment Advanced Configuration Snippet (Safety Valve) property.

  1. In Cloudera Manager, go to the Kafka service and click Configuration.
  2. Find the Kafka Broker Environment Advanced Configuration Snippet (Safety Valve) property.
  3. Add the following line to the property. Modify the values as required:

    KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
    The flags used are as follows:
    • -XX:+UseGCLogFileRotation: Enables garbage collection log rotation.
    • -XX:NumberOfGCLogFiles: Specifies the number of files to use when rotating logs.
    • -XX:GCLogFileSize: Specifies the size at which the log is rotated.
  4. Click Save Changes.
  5. Restart the Kafka service to apply the changes.

Adding Users as Kafka Administrators

In some cases, additional users besides the kafka account need administrator access. This can be done in Cloudera Manager by going to Kafka > Configuration > Super users.
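
Behind the scenes, this corresponds to the broker's super.users property, a semicolon-separated list of principals. A minimal sketch, assuming a hypothetical additional administrator named alice:

```properties
# Grant full administrator rights to both accounts (alice is a made-up example).
super.users=User:kafka;User:alice
```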