Writing Storm applications requires an understanding of the following basic concepts.
Table 1.1. Storm Concepts
Storm Concept | Description |
---|---|
Tuple | A named list of values of any data type. The native data structure used by Storm. |
Stream | An unbounded sequence of tuples. |
Spout | Generates a stream from a realtime data source. |
Bolt | Contains data processing, persistence, and messaging or alerting logic. Can also emit tuples for downstream bolts. |
Stream Grouping | Controls the routing of tuples to bolts for processing. |
Topology | A group of spouts and bolts wired together into a workflow. A Storm application. |
Processing Reliability | The guarantee that Storm provides about the delivery of tuples in a topology. |
Parallelism | Attribute of distributed data processing that determines how many jobs are processed simultaneously for a topology. Topology developers adjust parallelism to tune their applications. |
Workers | A Storm process. A worker may run one or more executors. |
Executors | A Storm thread launched by a Storm worker. An executor may run one or more tasks. |
Tasks | An instance of a spout or bolt, run by an executor. |
Process Controller | Monitors and restarts failed Storm processes. Examples include supervisord, monit, and daemontools. |
Master/Nimbus Node | The host in a multi-node Storm cluster that runs a process controller, such as supervisord, and the Storm nimbus, ui, and other related daemons. The process controller restarts these daemons if they fail. The Storm nimbus daemon monitors the Storm cluster and assigns tasks to slave nodes for execution. |
Slave Node | A host in a multi-node Storm cluster that runs the Storm supervisor daemon under a process controller, as well as the worker processes that run Storm topologies. The supervisor daemon starts worker processes and restarts them if they fail. |
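
To make the relationship between tuples, streams, spouts, bolts, stream groupings, and tasks more concrete, the following is a minimal plain-Java sketch. It does not use the Storm API: `StormConceptsSketch`, `Tuple`, `CountBoltTask`, `emitWords`, `fieldsGrouping`, and `run` are hypothetical names invented for illustration. The grouping shown is a simplified stand-in for Storm's fields grouping, assuming only that tuples with the same field value must reach the same task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these types are Storm classes.
class StormConceptsSketch {

    // Tuple: a named list of values.
    static final class Tuple {
        private final List<String> fields;
        private final List<Object> values;

        Tuple(List<String> fields, List<Object> values) {
            this.fields = fields;
            this.values = values;
        }

        Object getValueByField(String field) {
            return values.get(fields.indexOf(field));
        }
    }

    // Task: one instance of a counting bolt, holding its own private state.
    static final class CountBoltTask {
        final Map<String, Integer> counts = new LinkedHashMap<>();

        void execute(Tuple tuple) {
            counts.merge((String) tuple.getValueByField("word"), 1, Integer::sum);
        }
    }

    // Spout stand-in: emits one single-field tuple per word.
    // A real stream is unbounded; this one is finite so the sketch terminates.
    static List<Tuple> emitWords(String... words) {
        List<Tuple> stream = new ArrayList<>();
        for (String word : words) {
            stream.add(new Tuple(Collections.singletonList("word"),
                                 Collections.<Object>singletonList(word)));
        }
        return stream;
    }

    // Stream grouping stand-in (assumption: hash of the field value modulo
    // the task count), so equal field values always map to the same task.
    static int fieldsGrouping(Tuple tuple, String field, int numTasks) {
        return Math.floorMod(tuple.getValueByField(field).hashCode(), numTasks);
    }

    // Topology stand-in: wires the spout's stream to the bolt tasks
    // through the grouping and returns the tasks for inspection.
    static List<CountBoltTask> run(List<Tuple> stream, int numTasks) {
        List<CountBoltTask> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new CountBoltTask());
        }
        for (Tuple tuple : stream) {
            tasks.get(fieldsGrouping(tuple, "word", numTasks)).execute(tuple);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<CountBoltTask> tasks =
            run(emitWords("storm", "bolt", "storm", "spout", "storm"), 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i).counts);
        }
    }
}
```

In this sketch, every tuple carrying the word `storm` is routed to the same task, so exactly one task holds the complete count for that word. In a real topology the tasks would be spread across executors and workers rather than held in one list.
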
The following subsections describe several of these concepts in more detail.