1. Basic Storm Concepts

Writing Storm applications requires an understanding of the following basic concepts.


Table 1.1. Storm Concepts

Storm Concept

Description

Tuple

A named list of values of any data type. The native data structure used by Storm.

Stream

An unbounded sequence of tuples.

Spout

Generates a stream of tuples from a real-time data source.

Bolt

Contains the logic for data processing, persistence, and alert messaging. Can also emit tuples to downstream bolts.

Stream Grouping

Controls how the tuples of a stream are routed among the tasks of a bolt for processing.

Topology

A group of spouts and bolts wired together into a workflow. A Storm application.

Processing Reliability

The guarantee Storm provides about the delivery and processing of tuples in a topology.

Parallelism

The attribute of distributed data processing that determines how many instances of each spout and bolt run simultaneously in a topology. Topology developers adjust parallelism to tune the performance of their applications.

Workers

A Storm process that runs a subset of a single topology. A worker may run one or more executors.

Executors

A thread launched by a Storm worker. An executor may run one or more tasks of the same spout or bolt.

Tasks

An instance of a spout or bolt that performs the actual data processing.

Process Controller

Monitors and restarts failed Storm processes. Examples include supervisord, monit, and daemontools.

Master/Nimbus Node

The host in a multi-node Storm cluster that runs a process controller, such as supervisord, and the Storm nimbus, ui, and other related daemons. The process controller is responsible for restarting these daemons if they fail. The Storm nimbus daemon is responsible for monitoring the Storm cluster and assigning tasks to slave nodes for execution.

Slave Node

A host in a multi-node Storm cluster that runs the Storm supervisor daemon, as well as the worker processes that run Storm topologies. The supervisor daemon is responsible for restarting failed worker processes.
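
To make the tuple, stream, spout, bolt, stream grouping, and topology concepts in Table 1.1 concrete, the following is a minimal, self-contained Java sketch of a word-count flow. A spout turns a finite list of sentences into a stream of tuples. A bolt splits each sentence into word tuples. A grouping in the spirit of a fields grouping then routes each word to one of several count tasks by hashing the word. All names here (ToyTopology, Tuple, sentenceSpout, splitBolt, fieldsGrouping, run) are hypothetical. The sketch neither uses nor reproduces the Storm API, and it runs in a single method rather than across worker processes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the Table 1.1 concepts; NOT the Storm API.
class ToyTopology {

    // Tuple: a named list of values (field names paired with values).
    static final class Tuple {
        final List<String> fields;
        final List<Object> values;

        Tuple(List<String> fields, List<Object> values) {
            this.fields = fields;
            this.values = values;
        }

        Object get(String field) {
            return values.get(fields.indexOf(field));
        }
    }

    // Spout: turns a data source into a stream of tuples. A real stream is
    // unbounded; this sketch reads a finite list so that it terminates.
    static List<Tuple> sentenceSpout(List<String> source) {
        List<Tuple> stream = new ArrayList<>();
        for (String s : source) {
            stream.add(new Tuple(List.of("sentence"), List.of(s)));
        }
        return stream;
    }

    // Bolt: processes an input tuple and emits new tuples downstream.
    static List<Tuple> splitBolt(Tuple input) {
        List<Tuple> out = new ArrayList<>();
        for (String w : ((String) input.get("sentence")).split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new Tuple(List.of("word"), List.of(w)));
            }
        }
        return out;
    }

    // Stream grouping: picks which task of the downstream bolt receives a
    // tuple. Hashing one field sends equal values to the same task.
    static int fieldsGrouping(Tuple t, String field, int numTasks) {
        return Math.floorMod(t.get(field).hashCode(), numTasks);
    }

    // Topology: wires spout -> split bolt -> count bolt with numTasks tasks,
    // and returns each count task's private word counts.
    static List<Map<String, Integer>> run(List<String> source, int numTasks) {
        List<Map<String, Integer>> taskState = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            taskState.add(new HashMap<>());
        }
        for (Tuple sentence : sentenceSpout(source)) {
            for (Tuple word : splitBolt(sentence)) {
                int task = fieldsGrouping(word, "word", numTasks);
                taskState.get(task).merge((String) word.get("word"), 1, Integer::sum);
            }
        }
        return taskState;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> counts =
            run(List.of("the cow jumped", "over the moon"), 3);
        for (int i = 0; i < counts.size(); i++) {
            System.out.println("task " + i + ": " + counts.get(i));
        }
    }
}
```

Because the grouping hashes the "word" field, every occurrence of a word lands on the same count task, so that task's count for the word is complete. Routing tuples randomly instead, as a shuffle grouping does, would spread occurrences of one word across tasks.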

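The relationship among parallelism, workers, executors, and tasks can be sketched as two mappings: tasks onto executors (threads), and executors onto workers (processes). The Java below is a simplified, hypothetical model and not Storm's scheduler. It assumes that tasks are split into contiguous, near-equal ranges per executor and that executors are placed on workers round-robin. It also rejects a component with more executors than tasks, because Storm does not allow that configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of tasks -> executors -> workers; NOT Storm's scheduler.
// The contiguous task ranges and round-robin placement are assumptions.
class ParallelismSketch {

    // Split numTasks task ids into numExecutors contiguous, near-equal ranges.
    // Returns one list of task ids per executor.
    static List<List<Integer>> tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors; // first `extra` executors get one more task
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    // Place executors on workers round-robin; returns each executor's worker index.
    static int[] workerOfExecutor(int numExecutors, int numWorkers) {
        int[] placement = new int[numExecutors];
        for (int e = 0; e < numExecutors; e++) {
            placement[e] = e % numWorkers;
        }
        return placement;
    }

    public static void main(String[] args) {
        // A bolt with 6 tasks run by 4 executors spread over 2 workers.
        List<List<Integer>> executors = tasksPerExecutor(6, 4);
        int[] workers = workerOfExecutor(executors.size(), 2);
        for (int e = 0; e < executors.size(); e++) {
            System.out.println("executor " + e + " (worker " + workers[e]
                + "): tasks " + executors.get(e));
        }
    }
}
```

With 6 tasks, 4 executors, and 2 workers, the first two executors each run two tasks, the last two each run one, and each worker hosts two executors. Raising the executor count, up to the task count, increases the number of threads that process the component in parallel.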

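Processing reliability in Storm depends on tracking each tuple tree, which consists of a spout tuple and every tuple derived from it. Storm's documentation describes an XOR scheme for this tracking. Every tuple gets a random 64-bit id. Each id is XORed into a per-tree value once when the tuple is emitted and once when it is acked. The tree is complete when the value returns to zero. The Java below is a minimal, hypothetical sketch of that idea. AckerSketch, emitRoot, and ack are illustrative names, not Storm classes or methods, and the sketch omits timeouts, failure handling, and replay.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of XOR-based tuple-tree tracking; NOT Storm's acker code.
class AckerSketch {
    // root (spout) tuple id -> XOR of the ids of every tuple in its tree
    // that has been emitted but not yet acked
    private final Map<Long, Long> pending = new HashMap<>();

    // The spout emits a root tuple: its id is XORed in once.
    void emitRoot(long rootId) {
        pending.put(rootId, rootId);
    }

    // A bolt acks an input tuple and reports the ids of any new tuples it
    // anchored to that input. Each id is XORed in once when emitted and once
    // when acked, so the value is 0 after every tuple has been acked; with
    // random 64-bit ids, hitting 0 earlier by coincidence is vanishingly rare.
    // Returns true when the whole tuple tree has been processed.
    boolean ack(long rootId, long inputId, long... anchoredChildIds) {
        long delta = inputId;
        for (long child : anchoredChildIds) {
            delta ^= child;
        }
        long remaining = pending.merge(rootId, delta, (a, b) -> a ^ b);
        if (remaining == 0L) {
            pending.remove(rootId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        AckerSketch acker = new AckerSketch();
        long root = rnd.nextLong(); // sentence tuple emitted by the spout
        long w1 = rnd.nextLong();   // word tuples anchored to the sentence
        long w2 = rnd.nextLong();
        acker.emitRoot(root);
        System.out.println(acker.ack(root, root, w1, w2)); // split bolt acks the sentence
        System.out.println(acker.ack(root, w1));           // count bolt acks one word
        System.out.println(acker.ack(root, w2));           // last ack completes the tree
    }
}
```

A tree whose value never reaches zero, for example because a worker died before acking, is the case that Storm eventually times out and replays from the spout. This is why reliable processing in core Storm is at-least-once rather than exactly-once.
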
The following subsections describe several of these concepts in more detail.