Because Storm topologies operate on streaming data (rather than data at rest, as in
HDFS), they are sensitive to data sources. When tuning Storm topologies, consider the following
questions:
What are my data sources?
At what rate do these data sources deliver messages?
What size are the messages?
What is my slowest data sink?
Identify the bottleneck.
In the Storm UI, click Show Visualization to display a visual representation
of your topology and find the data bottleneck in your Storm application.
Thicker lines between components denote larger data flows.
A blue component represents the first component in the topology,
such as the spout from the WordCountTopology included with
storm-starter.
The color of the other topology components indicates whether the
component is exceeding cluster capacity: red components denote a data
bottleneck and green components indicate components operating within
capacity.
Note
In addition to bolts defined in your topology, Storm uses its own
bolts to perform background work when a topology component
acknowledges that it either succeeded or failed to process a tuple.
The names of these "acker" bolts are prefixed with an underscore in
the visualization, but they do not appear in the default view.
To display component-specific data about successful
acknowledgements, select the _ack_ack checkbox. To
display component-specific data about failed acknowledgements,
select the _ack_fail checkbox.
To verify that you have found the topology bottleneck, rewrite the
execute() method of the target bolt or spout so that it
performs no operations. If the performance of the topology improves, you have
found the bottleneck.
Alternatively, turn off each topology component, one at
a time, to find the component responsible for the bottleneck.
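For example, a suspected bolt can be temporarily swapped for a version whose execute() method only acknowledges each tuple. This is a minimal sketch against the Storm bolt API; the class name is hypothetical, and if the topology speeds up with this stand-in, the original bolt was the bottleneck.

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

// Hypothetical stand-in for the bolt under test: execute() performs no
// real work, so it isolates the cost of the bolt's original logic.
public class NoOpBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        collector.ack(tuple);  // acknowledge immediately; do no processing
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams; downstream components receive nothing
    }
}
```

Acking each tuple (rather than dropping it) keeps the acker bolts and timeout behavior realistic during the test.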
Refer to "Performance Guidelines for Developing a Storm Topology" for several
performance-related development guidelines.
Adjust topology configuration settings.
Increase the parallelism for the target spout or bolt. Parallelism units are a
useful conceptual tool for determining how to distribute processing tasks across a
distributed application.
Hortonworks recommends using the following calculation to determine the total
number of parallelism units for a topology:
(number of worker nodes in cluster * number cores per worker node) - (number of acker tasks)
Acker tasks are topology components that acknowledge a successfully processed
tuple.
The following example assumes a Storm cluster with ten worker nodes, 16 CPU cores
per worker node, and ten acker tasks in the topology. This Storm topology has 150
total parallelism units:
(10 * 16) - 10 = 150
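The calculation above can be expressed directly in code. A small sketch (the class and method names are illustrative):

```java
public class ParallelismUnits {
    // Total parallelism units = (worker nodes * cores per node) - acker tasks.
    static int parallelismUnits(int workerNodes, int coresPerNode, int ackerTasks) {
        return (workerNodes * coresPerNode) - ackerTasks;
    }

    public static void main(String[] args) {
        // Figures from the example above: ten worker nodes, 16 cores per
        // node, and ten acker tasks in the topology.
        System.out.println(parallelismUnits(10, 16, 10));  // prints 150
    }
}
```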
Storm developers can mitigate the increased processing load associated with data
persistence operations, such as writing to HDFS and generating reports, by
distributing the most parallelism units to topology components that perform data
persistence operations.
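As an illustration, the parallelism hints passed to Storm's TopologyBuilder can be weighted toward the persistence bolt when the topology is wired up. In this sketch the spout and bolt classes, component IDs, and hint values are all hypothetical; the hints (4 + 16 + 120) plus ten acker tasks consume the 150 parallelism units from the example above.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class PersistenceWeightedTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical components; the third argument is the parallelism hint.
        builder.setSpout("events", new EventSpout(), 4);
        builder.setBolt("transform", new TransformBolt(), 16)
               .shuffleGrouping("events");
        // Give the I/O-bound persistence bolt the largest share of the
        // topology's parallelism units.
        builder.setBolt("hdfs-writer", new HdfsWriterBolt(), 120)
               .shuffleGrouping("transform");

        Config conf = new Config();
        conf.setNumAckers(10);  // the ten acker tasks from the example

        StormSubmitter.submitTopology("persistence-weighted", conf,
                builder.createTopology());
    }
}
```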
In a Storm cluster, most of the computational burden typically falls on the Supervisor and
Worker nodes; the Nimbus node usually carries a lighter load. For this reason, Hortonworks
recommends allocating hardware resources to the more heavily loaded Supervisor and Worker
nodes rather than to the Nimbus node.
The performance of a Storm topology degrades when it cannot ingest data fast enough to
keep up with the data source. The velocity of incoming streaming data changes over time.
When the data flow of the source exceeds what the topology can process, memory buffers
fill up. The topology suffers frequent timeouts and must replay tuples to process
them.
Use the following techniques to identify and overcome poor topology performance due to
mismatched data flow rates between source and application: