Core Storm Concepts
Developing a Storm application requires an understanding of the following basic concepts.
| Storm Concept | Description |
|---|---|
| Tuple | A named list of values of any data type. The tuple is the native data structure used by Storm. |
| Stream | An unbounded sequence of tuples. |
| Spout | Generates a stream from a real-time data source. |
| Bolt | Contains data processing, persistence, and messaging alert logic. A bolt can also emit tuples for downstream bolts. |
| Stream Grouping | Controls how tuples are routed to bolts for processing. |
| Topology | A group of spouts and bolts wired together into a workflow; in other words, a Storm application (see the first sketch following this table). |
| Processing Reliability | Storm's guarantee about the delivery of tuples in a topology (see the second sketch following this table). |
| Worker | A Storm process. A worker may run one or more executors. |
| Executor | A Storm thread launched by a Storm worker. An executor may run one or more tasks. |
| Task | An instance of a spout or bolt run by an executor. Tasks perform the actual data processing. |
| Parallelism | The attribute of distributed data processing that determines how many tasks run simultaneously for a topology. Topology developers adjust parallelism to tune their applications. |
| Process Controller | Monitors and restarts failed Storm processes. Examples include supervisord, monit, and daemontools. |
| Master/Nimbus Node | The host in a multi-node Storm cluster that runs a process controller (such as supervisord) and the Storm nimbus, ui, and other related daemons. The process controller restarts these daemons if they fail. Nimbus itself is a Thrift service responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. |
| Slave Node | A host in a multi-node Storm cluster that runs a process controller daemon, such as supervisord, as well as the worker processes that run Storm topologies. The process controller daemon restarts failed worker processes. |
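To make these concepts concrete, the first sketch below shows how a topology is wired together with Storm's core Java API. It is a minimal example, not a prescribed pattern: the spout and bolt classes (RandomSentenceSpout, SplitSentenceBolt, WordCountBolt) and the component IDs are hypothetical placeholders (SplitSentenceBolt is sketched in the second example), and the package names assume Storm 1.x or later (org.apache.storm); older releases use backtype.storm.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout: generates a stream of sentence tuples from a data source.
        // The parallelism hint of 2 requests two executors for this spout.
        builder.setSpout("sentences", new RandomSentenceSpout(), 2);

        // Bolt: splits sentences into words. A shuffle grouping routes
        // tuples from the spout randomly across this bolt's tasks.
        builder.setBolt("split", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentences");

        // Bolt: counts words. A fields grouping routes all tuples with the
        // same "word" value to the same task, keeping each count consistent.
        builder.setBolt("count", new WordCountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(2); // two worker processes host the executors

        // Run in-process for testing; a production topology would be
        // submitted with StormSubmitter.submitTopology(...) instead.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10_000);
        cluster.shutdown();
    }
}
```

The parallelism hints passed to setSpout and setBolt set the initial number of executors for each component, and setNumWorkers controls how many worker processes host those executors, so this one topology exercises spouts, bolts, stream groupings, parallelism, workers, and executors.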
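Processing reliability depends on how bolts handle tuples. The second sketch, again an illustrative assumption rather than the only correct approach, fills in the hypothetical SplitSentenceBolt used above: it emits each output tuple anchored to its input tuple and then acks or fails the input, which lets Storm track whether every tuple was fully processed and replay it from the spout if not.

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            // Assumes the first field of the input tuple is a sentence string.
            for (String word : input.getString(0).split("\\s+")) {
                // Anchored emit: the new tuple is linked to the input tuple,
                // so Storm can trace failures back to it.
                collector.emit(input, new Values(word));
            }
            collector.ack(input);  // mark the input tuple fully processed
        } catch (Exception e) {
            collector.fail(input); // ask the spout to replay the tuple
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```

If the bolt emitted without the input tuple as an anchor, a downstream failure would not trigger a replay, weakening the delivery guarantee for that tuple.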
The following subsections describe several of these concepts in more detail.