4. Topology Development Guidelines

Hortonworks recommends the following guidelines for all Storm topologies.

Note

These recommendations focus on writing and debugging Storm topologies rather than on hardware tuning. In a Storm cluster, most of the computational load typically falls on the Supervisor and Worker nodes, while the Nimbus node usually carries a lighter load. For this reason, Hortonworks recommends that organizations reserve most of their hardware resources for the more heavily loaded Supervisor and Worker nodes.

 

Table 1.5. Storm Topology Guidelines

Guideline

Description

Read topology configuration parameters from a file.

Rather than hard-coding configuration information in your Storm application, read the configuration parameters, including parallelism hints for specific components, from a file in the topology's main() method. This speeds up iterative debugging, because simple configuration changes no longer require you to edit and recompile code.

Use a cache.

Use a cache to improve performance by eliminating unnecessary network operations, such as frequent calls to external services or lookups of reference data needed for processing.

Tighten code in the execute() method.

The execute() method runs once for every tuple a bolt receives, so keep the code in this method as tight and efficient as possible.

Perform benchmark testing to determine latencies.

Benchmark the critical points in your topology's network flow. Knowing the capacity of your data "pipes" gives you a reliable baseline for judging the performance of the topology as a whole and of its individual components.
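The guideline on reading topology configuration from a file might look roughly like the sketch below. It assumes a Java properties file whose path is passed to main(); the class name TopologySettings and keys such as parser.parallelism are hypothetical. The Storm calls appear only in a comment so the helper itself compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: loads topology settings from a properties file so that
// parallelism hints and other parameters can change without recompiling.
final class TopologySettings {
    private final Properties props = new Properties();

    // Load settings from any Reader (a file in production, a StringReader in tests).
    TopologySettings(Reader source) throws IOException {
        props.load(source);
    }

    // Convenience factory for the file named on the command line.
    static TopologySettings fromFile(String path) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(path))) {
            return new TopologySettings(r);
        }
    }

    // Integer setting with a default when the key is absent.
    // A malformed value throws NumberFormatException, surfacing config typos early.
    int getInt(String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    String getString(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}

// Sketch of use inside the topology's main() (MySpout and MyBolt are hypothetical):
//   TopologySettings s = TopologySettings.fromFile(args[0]);
//   TopologyBuilder builder = new TopologyBuilder();
//   builder.setSpout("reader", new MySpout(), s.getInt("spout.parallelism", 1));
//   builder.setBolt("parser", new MyBolt(), s.getInt("parser.parallelism", 4))
//          .shuffleGrouping("reader");
//   Config conf = new Config();
//   conf.setNumWorkers(s.getInt("topology.workers", 2));
//   StormSubmitter.submitTopology(s.getString("topology.name", "demo"), conf,
//                                 builder.createTopology());
```

Changing parser.parallelism from 4 to 8 then only requires editing the file and resubmitting the topology.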
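For the caching guideline, one simple option is a bounded least-recently-used (LRU) cache placed in front of a remote lookup. The sketch below builds one on LinkedHashMap's access-order mode; the loader function stands in for a hypothetical external service call. It is not thread-safe, on the assumption that each bolt instance holds its own cache as a field (for example, created in prepare()) and calls it only from execute().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Bounded LRU cache for reference data. The loader is a stand-in for a
// hypothetical remote lookup; it runs only on a cache miss.
final class LruLookupCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> entries;
    private long misses = 0;

    LruLookupCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Return the cached value, calling the loader only on a miss.
    // Note: null results are not cached, so they are reloaded on every call.
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    long misses() { return misses; }

    int size() { return entries.size(); }
}
```

The size bound matters in a long-running topology: without eviction, a cache of reference data can grow until the worker runs out of heap. If the reference data changes, a time-based expiry would also be needed; that is left out of this sketch.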
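A common way to tighten execute() is to move one-time setup, such as compiling regular expressions or opening connections, into the bolt's prepare() method. The sketch below illustrates the idea with a hypothetical WordSplitter class whose prepare() and execute() methods only mirror the bolt lifecycle; it does not use the Storm API, so tuples are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-in for a bolt: prepare() runs once per instance,
// execute() runs once per tuple (modeled here as a String).
final class WordSplitter {
    private Pattern separator; // compiled once, reused for every tuple

    // One-time setup belongs here, not on the per-tuple path.
    void prepare() {
        separator = Pattern.compile("\\s+");
    }

    // Per-tuple work. Calling line.split("\\s+") here instead would
    // compile the regular expression again on every call.
    List<String> execute(String line) {
        List<String> words = new ArrayList<>();
        for (String w : separator.split(line.trim())) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}
```

The same reasoning applies to database connections, HTTP clients, and large buffers: create them once in prepare() and reuse them across tuples.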
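To benchmark a single stage of the pipeline in isolation, a small timing harness can run the operation repeatedly and report latency percentiles. The LatencyProbe class below is a hypothetical sketch using System.nanoTime and nearest-rank percentiles; for rigorous measurements, a dedicated harness such as JMH handles JIT warm-up and dead-code elimination more carefully.

```java
import java.util.Arrays;

// Minimal latency probe for one pipeline stage (hypothetical helper).
final class LatencyProbe {
    // Run `op` `warmup` times untimed, then `iterations` times timed.
    // Returns the per-call latencies in nanoseconds, sorted ascending.
    static long[] measure(Runnable op, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Nearest-rank percentile over sorted samples; p is in (0, 100].
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

For example, timing a lookup call 10,000 times and comparing the 50th and 99th percentiles shows both typical and tail latency for that stage. Once the topology is running, the per-component latency and capacity figures in the Storm UI offer a complementary view.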