Tune Parser Core Storm Settings

You can set the number of Kafka spouts to match the number of Kafka partitions. You can also increase the number of workers and ackers to match the Storm nodes, unless the estimated throughput for the parser is very low.

  1. Set the parser Storm settings in the Management user interface.
  2. You can add the following command to the Storm settings to test the parser:
    spout.firstPollOffsetStrategy": "LATEST"
    The command allows the Kafka topic to be written to continuously during testing so when the parser is restarted, the topology will not be flooded with events.
  3. Increase the Parser Parallelism and Num Tasks values in increments based on the number of workers.
    For example, in the previous example, the parameters could be incremented by 3.
  4. As you increase the Parser Parallelism and Num Tasks values, check two Storm statistics: Parser Capacity and the number of tuples acked in a 10-minute window.
    For a given estimated throughput, the capacity should be no greater than ~0.800. This will allow for ~20% overhead should the number of incoming events spike above the estimated average. If the capacity is above this level, Parallelism and Num Tasks should be incremented and the topology restarted.
    The number of acked tuples should be approximately equal to (Desired Throughput ×600) assuming the topology has been active for at least 11 - 12 minutes. If the number of acked tuples and the capacity of the topology are both low, there may not be enough Kafka partitions.
    If the Storm UI is showing a capacity of ~0.800 or less, the Kafka consumer should be monitored to ensure that there is no significant lag or buildup of messages for the parser. The following command shows an example of how this can be monitored via the command line on a Kafka node:
    cd /usr/hdp/current/kafka-broker/bin/
    watch -n 2 ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 
    master01:2181 --topic asa --group asa_parser