Tooling
Kafka
Consumer group offset lag viewer
There is a GUI tool to make creating, modifying, and generally managing your Kafka topics a bit easier - see kafka-manager
Console consumer - useful for quickly verifying topic contents
Storm
For more information on the Storm user interface, see Reading and Understanding the Storm UI.
Example: Viewing Kafka Offset Lags
First we need to set up some environment variables.
```
export BROKERLIST=<your broker comma-delimited list of host:ports>
export ZOOKEEPER=<your zookeeper comma-delimited list of host:ports>
export KAFKA_HOME=<kafka home dir>
export METRON_HOME=<your metron home>
export HDP_HOME=<your HDP home>
```
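Before running the commands that follow, it can help to confirm that all five variables are actually set. The function below is a minimal sketch; the name `check_env` is hypothetical and not part of Kafka or Metron.

```shell
# Report any of the variables from this example that are unset or empty.
# Returns 0 when all are set, 1 otherwise. check_env is an illustrative name.
check_env() {
  missing=0
  for name in BROKERLIST ZOOKEEPER KAFKA_HOME METRON_HOME HDP_HOME; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_env` after the exports prints the name of each missing variable to stderr and returns a non-zero status if any are absent.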
If you have Kerberos enabled, set up the security protocol in a client configuration file:
```
$ cat /tmp/consumergroup.config
security.protocol=SASL_PLAINTEXT
```
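If the file does not exist yet, one way to create it is with a heredoc. This is only a sketch: the `CONFIG_FILE` variable is an assumption introduced here for illustration, defaulting to the path used in this example.

```shell
# Write a minimal client config for a Kerberized cluster.
# CONFIG_FILE is a hypothetical variable; it defaults to the path used in this example.
CONFIG_FILE="${CONFIG_FILE:-/tmp/consumergroup.config}"
cat > "${CONFIG_FILE}" <<'EOF'
security.protocol=SASL_PLAINTEXT
EOF
```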
Now run the following command for a running topology's consumer group. In this example we are using enrichments.
```
${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
    --command-config=/tmp/consumergroup.config \
    --describe \
    --group enrichments \
    --bootstrap-server $BROKERLIST \
    --new-consumer
```
This returns a table like the following, showing offsets for all partitions and consumers associated with the specified consumer group.
```
GROUP        TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
enrichments  enrichments  9          29746066        29746067        1    consumer-2_/xxx.xxx.xxx.xxx
enrichments  enrichments  3          29754325        29754326        1    consumer-1_/xxx.xxx.xxx.xxx
enrichments  enrichments  43         29754331        29754332        1    consumer-6_/xxx.xxx.xxx.xxx
...
```
Note: You won't see any output until a topology is actually running, because the consumer groups only exist while consumers in the spouts are up and running.
The column to pay attention to is LAG, which is the difference between the log-end offset and the current offset for the partition. It tells us how close we are to keeping up with incoming data and, as we found through multiple trials, whether any specific consumers are getting stuck.
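When a group has many partitions, a single total can be easier to track than the full table. The sketch below sums the LAG column from the describe output read on stdin. The function name `total_lag` is hypothetical; it finds the LAG column by its header name rather than by position, on the assumption that the header row is present as in the sample output.

```shell
# Sum the LAG column of `kafka-consumer-groups.sh --describe` output read from stdin.
# total_lag is an illustrative name, not a Kafka tool.
total_lag() {
  awk '
    # Until the header row is seen, look for the field named LAG.
    lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Data rows: add numeric lag values; rows such as "..." or "-" are skipped.
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}
```

For example, piping the describe command shown earlier into `total_lag` prints one number, the combined lag across all partitions of the group.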
Taking this one step further, it's more useful to watch the offsets and lags change over time. To do this, we'll wrap the command in "watch" and set the refresh interval to 10 seconds.
```
watch -n 10 -d ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
    --command-config=/tmp/consumergroup.config \
    --describe \
    --group enrichments \
    --bootstrap-server $BROKERLIST \
    --new-consumer
```
Every 10 seconds the command re-runs and the screen refreshes with new information. The most useful part is that the watch command (with the -d flag) highlights the differences between the current output and the previous one.
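The highlighting is only visible on screen. To check for stuck consumers non-interactively, one option is to save two snapshots of the describe output to files and compare them. The sketch below is an assumption-laden illustration: `stuck_partitions` is a hypothetical helper, and it treats a partition as stuck when its CURRENT-OFFSET is unchanged between snapshots while LAG is above zero.

```shell
# Compare two saved snapshots of the describe output and print partitions
# whose CURRENT-OFFSET did not move while LAG was still above zero.
# Usage: stuck_partitions OLD_SNAPSHOT NEW_SNAPSHOT   (illustrative helper)
stuck_partitions() {
  awk '
    # Header row of either file: locate the columns by name.
    $0 ~ /CURRENT-OFFSET/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") t = i
        if ($i == "PARTITION") p = i
        if ($i == "CURRENT-OFFSET") c = i
        if ($i == "LAG") l = i
      }
      next
    }
    # Skip anything that is not a data row with a numeric offset.
    !c || $c !~ /^[0-9]+$/ { next }
    # First file: remember the offset of each topic/partition pair.
    FNR == NR { old[$t, $p] = $c; next }
    # Second file: report pairs with an unchanged offset and non-zero lag.
    (($t, $p) in old) && old[$t, $p] == $c && $l > 0 { print $t, $p, "lag=" $l }
  ' "$1" "$2"
}
```

Each output line names a topic and partition along with its latest lag. An empty result means every lagging partition advanced between the two snapshots.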