Guarantees
Storm relies on the acking mechanism to replay tuples in case of failures. It is
possible that the state is committed but the worker crashes before acking the
tuples. In this case the tuples are replayed causing duplicate state updates. Also
currently the StatefulBoltExecutor
continues to process the tuples from
a stream after it has received a checkpoint tuple on one stream while waiting for
checkpoint to arrive on other input streams for saving the state. This can also
cause duplicate state updates during recovery.
The state abstraction does not eliminate duplicate evaluations and currently provides only at-least once guarantee.
To provide the at-least-once guarantee, all bolts in a stateful topology are
expected to anchor the tuples while emitting and ack the input tuples once it is
processed. For non-stateful bolts, the anchoring and acking can be automatically
managed by extending the BaseBasicBolt
. Stateful bolts are expected to
anchor tuples while emitting and ack the tuple after processing like in the
WordCountBolt
example in the State management subsection.