Apache Storm Component Guide

Core Storm Concepts

Developing a Storm application requires an understanding of the following basic concepts.

Table 4.1. Storm Concepts

Tuple

A named list of values of any data type. A tuple is the native data structure used by Storm.


Stream

An unbounded sequence of tuples.


Spout

Generates a stream from a realtime data source.


Bolt

Contains data processing, persistence, and messaging alert logic. Can also emit tuples for downstream bolts.

Stream Grouping

Controls the routing of tuples to bolts for processing.


Topology

A group of spouts and bolts wired together into a workflow. A Storm application.

Processing Reliability

Storm's guarantee about the delivery of tuples in a topology.


Worker

A Storm process. A worker may run one or more executors.


Executor

A Storm thread launched by a Storm worker. An executor may run one or more tasks.


Task

A Storm job from a spout or bolt.

Parallelism

Attribute of distributed data processing that determines how many jobs are processed simultaneously for a topology. Topology developers adjust parallelism to tune their applications.

Process Controller

Monitors and restarts failed Storm processes. Examples include supervisord, monit, and daemontools.

Master/Nimbus Node

The host in a multi-node Storm cluster that runs a process controller (such as supervisord) and the Storm nimbus, ui, and other related daemons. The process controller is responsible for restarting failed process controller daemons on slave nodes. The Nimbus node is a Thrift service that is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

Slave Node

A host in a multi-node Storm cluster that runs a process controller daemon, such as supervisor, as well as the worker processes that run Storm topologies. The process controller daemon is responsible for restarting failed worker processes.
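The relationships among tuples, streams, spouts, bolts, and topologies can be sketched in plain Java. This is a conceptual illustration only, not the Storm API: the class names (SentenceSpout, SplitBolt, CountBolt) and the fixed sentence list are invented for the example, and a real topology would instead implement Storm's spout and bolt interfaces and wire them together with org.apache.storm.topology.TopologyBuilder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A tuple: a named list of values, modeled here as field-name -> value.
class Tuple {
    final Map<String, Object> fields = new HashMap<>();
    Tuple(String field, Object value) { fields.put(field, value); }
    Object get(String field) { return fields.get(field); }
}

// A spout generates a stream (a sequence of tuples) from a data source;
// the "source" here is just a fixed list of sentences.
class SentenceSpout {
    List<Tuple> emit() {
        List<Tuple> stream = new ArrayList<>();
        for (String s : Arrays.asList("the cat", "the dog")) {
            stream.add(new Tuple("sentence", s));
        }
        return stream;
    }
}

// A bolt processes incoming tuples and can emit tuples for downstream bolts.
class SplitBolt {
    List<Tuple> execute(Tuple input) {
        List<Tuple> out = new ArrayList<>();
        for (String word : ((String) input.get("sentence")).split(" ")) {
            out.add(new Tuple("word", word));
        }
        return out;
    }
}

// A terminal bolt that persists state: a running count per word.
class CountBolt {
    final Map<String, Integer> counts = new HashMap<>();
    void execute(Tuple input) {
        counts.merge((String) input.get("word"), 1, Integer::sum);
    }
}

public class MiniTopology {
    public static void main(String[] args) {
        // The "topology": spout -> split bolt -> count bolt.
        SentenceSpout spout = new SentenceSpout();
        SplitBolt split = new SplitBolt();
        CountBolt count = new CountBolt();
        for (Tuple sentence : spout.emit()) {
            for (Tuple word : split.execute(sentence)) {
                count.execute(word);
            }
        }
        System.out.println(count.counts.get("the")); // prints 2
    }
}
```

In real Storm, the routing between SplitBolt and CountBolt would be declared with a stream grouping (for example, a fields grouping on "word" so that the same word always reaches the same CountBolt task), and Storm would run the spout and bolts as tasks inside executors spread across worker processes.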

The following subsections describe several of these concepts in more detail.