4.2. Determining Topology Parallelism Units

Hortonworks recommends using the following calculation to determine the total number of parallelism units for a topology. Parallelism units are a useful conceptual tool for determining how to distribute processing tasks across a distributed application.

(number of worker nodes in cluster * number of cores per worker node) - (number of acker tasks)

Acker tasks are topology components that acknowledge a successfully processed tuple. The following example assumes a Storm cluster with ten worker nodes, 16 CPU cores per worker node, and ten acker tasks in the topology. This Storm topology has 150 total parallelism units:

(10 * 16) - 10 = 150
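
The same arithmetic can be expressed as a small helper, which may be convenient when comparing cluster sizes. This is only a minimal sketch: the function name `parallelism_units` and its parameter names are illustrative and are not part of Storm or any Hortonworks tooling.

```python
def parallelism_units(worker_nodes: int, cores_per_node: int, acker_tasks: int) -> int:
    """Return (worker nodes * cores per worker node) - acker tasks."""
    if worker_nodes < 0 or cores_per_node < 0 or acker_tasks < 0:
        raise ValueError("all inputs must be non-negative")
    return worker_nodes * cores_per_node - acker_tasks


# The example cluster above: ten worker nodes, 16 cores each, ten acker tasks.
print(parallelism_units(10, 16, 10))  # prints 150
```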

Storm developers can mitigate the increased processing load associated with data persistence operations, such as writing to HDFS and generating reports, by allocating the largest share of parallelism units to the topology components that perform those operations.
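
One possible way to turn this guidance into numbers is to split the total proportionally, giving persistence components the heaviest weights. The sketch below is an assumption made for illustration, not a method prescribed by Hortonworks: the component names (`spout`, `parse_bolt`, `hdfs_bolt`, `report_bolt`), the weights, and the helper `allocate_units` are all hypothetical.

```python
def allocate_units(total_units: int, weights: dict) -> dict:
    """Split total_units across components in proportion to integer weights.

    Largest-remainder rounding keeps the shares summing to total_units.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive number")

    # For each component: (whole units, remainder) of its proportional quota.
    quotas = {name: divmod(total_units * w, weight_sum) for name, w in weights.items()}
    shares = {name: whole for name, (whole, _) in quotas.items()}

    # Hand out any units lost to rounding, largest remainder first.
    leftover = total_units - sum(shares.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical weights: the persistence bolts (HDFS writer, report generator)
# receive the largest shares of the 150 units from the example above.
weights = {"spout": 1, "parse_bolt": 2, "hdfs_bolt": 4, "report_bolt": 3}
shares = allocate_units(150, weights)
print(shares)
```

With these assumed weights, the HDFS writer receives 60 units, the report generator 45, the parsing bolt 30, and the spout 15. Figures like these could then serve as a starting point when choosing the parallelism of each component, subject to tuning against the real workload.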