Saving the Window State
One issue with windowing is that tuples cannot be acknowledged until they exit the window.
For example, consider a one-hour window that slides every minute. The tuples in the
window are evaluated (passed to the bolt execute method) every minute, but tuples
that arrived during the first minute are acknowledged only
after one hour and one minute. If there is a system outage after one hour, none
of those tuples has been acknowledged yet, so Storm replays all tuples from the
starting point through the sixtieth minute. The bolt’s execute method is invoked
again for each of the 60 windows that were already processed; every window is
reevaluated. One way to avoid this is to track which windows have already been
evaluated, save this information in an external durable location, and use it to
skip duplicate window evaluations during recovery.
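The tracking idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Storm's state management API: the names WindowCheckpoint and WindowReplayDemo are hypothetical, windows are identified by the minute at which they end, and an in-memory map stands in for the external durable location.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Storm): remembers the last window that was
// evaluated so that replayed windows can be skipped after recovery.
class WindowCheckpoint {
    private static final String KEY = "lastEvaluatedWindowEnd";

    // Assumption: stands in for an external durable store such as a database.
    private final Map<String, Long> durableStore;

    WindowCheckpoint(Map<String, Long> durableStore) {
        this.durableStore = durableStore;
    }

    // True if the window ending at this minute has not been evaluated yet.
    boolean shouldEvaluate(long windowEndMinute) {
        Long last = durableStore.get(KEY);
        return last == null || windowEndMinute > last;
    }

    // Records that the window ending at this minute has been evaluated.
    void markEvaluated(long windowEndMinute) {
        durableStore.put(KEY, windowEndMinute);
    }
}

class WindowReplayDemo {
    // Walks the windows ending at minutes first..last and returns the ones
    // that were actually evaluated (those not already checkpointed).
    static List<Long> run(WindowCheckpoint checkpoint, long first, long last) {
        List<Long> evaluated = new ArrayList<>();
        for (long end = first; end <= last; end++) {
            if (checkpoint.shouldEvaluate(end)) {
                // The real window logic would run here.
                evaluated.add(end);
                checkpoint.markEvaluated(end);
            }
        }
        return evaluated;
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();

        // Normal operation: 60 windows are evaluated during the first hour.
        List<Long> firstRun = run(new WindowCheckpoint(store), 1, 60);
        System.out.println(firstRun.size()); // prints 60

        // After a simulated outage the tuples are replayed from the start.
        // A new checkpoint object reads the same store and skips windows 1..60.
        List<Long> replay = run(new WindowCheckpoint(store), 1, 61);
        System.out.println(replay); // prints [61]
    }
}
```

In a real topology the checkpoint would need to be written to storage that survives a worker restart, and the write would have to be coordinated with the window evaluation itself; the sketch leaves both concerns out.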
For more information about state management and how it can be used to avoid duplicate window evaluations, see Implementing State Management.