Using watermarks in Flink

For a streaming application that processes unbounded data sets, the completeness of the incoming data is crucial. To make sure that all data is processed, you can use watermarks in Flink applications to track the progress of event time.

With event time, every input event carries an embedded timestamp, but that timestamp only indicates when the event was created. On its own, it does not tell the application how far event time has progressed or when the events are processed. Watermarks fill this gap: a watermark is a mechanism for measuring the progress of event time. The timestamps embedded in the events are used to generate watermarks, which indicate to the operator how far event time has advanced in the incoming stream. In this way, the watermark sets the point in time until which the operator waits for events before it processes them.
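To make the idea concrete, the following is a minimal sketch in plain Python, not the Flink API. It assumes one common way of deriving watermarks: take the highest event timestamp seen so far and subtract a fixed allowed delay. The names Event, BoundedDelayWatermark, and max_delay_ms are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    key: str
    timestamp: int  # event time in milliseconds, embedded in the event


class BoundedDelayWatermark:
    """Toy watermark generator (assumption: fixed allowed delay)."""

    def __init__(self, max_delay_ms: int) -> None:
        self.max_delay_ms = max_delay_ms
        self.max_timestamp: Optional[int] = None  # no event seen yet

    def on_event(self, event: Event) -> None:
        # Remember the largest timestamp so far; late events do not lower it.
        if self.max_timestamp is None or event.timestamp > self.max_timestamp:
            self.max_timestamp = event.timestamp

    def current_watermark(self) -> Optional[int]:
        # The watermark states: event time has progressed at least this far.
        if self.max_timestamp is None:
            return None
        return self.max_timestamp - self.max_delay_ms


generator = BoundedDelayWatermark(max_delay_ms=5_000)
watermarks = []
for ts in [10_000, 16_000, 13_000, 21_000]:  # 13_000 arrives out of order
    generator.on_event(Event(key="sensor-1", timestamp=ts))
    watermarks.append(generator.current_watermark())

print(watermarks)  # [5000, 11000, 11000, 16000]
```

Note that the out-of-order event with timestamp 13_000 does not move the watermark backwards: in this sketch the watermark never decreases, which is what lets an operator treat it as a measure of event-time progress.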

Consider a streaming application with a window that aggregates data between 10:00 and 11:00. The watermark marks the time up to which data is processed. In this case, the watermark is 11:00, which means that the window processes the events that were created up to 11:00.
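The 10:00 to 11:00 example can be simulated with a small plain-Python sketch, again not the Flink API. It assumes that the window emits its result once the watermark reaches 11:00, and that the watermark trails the highest event time seen by a fixed delay of five minutes. The date, the delay, the sample events, and the function name run_window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical window boundaries; the date itself is arbitrary.
WINDOW_START = datetime(2024, 1, 1, 10, 0)
WINDOW_END = datetime(2024, 1, 1, 11, 0)
MAX_DELAY = timedelta(minutes=5)  # assumed tolerance for late events


def run_window(events):
    """Process (event_time, value) pairs in arrival order.

    Returns (index of the event that triggered the window, sum of values),
    or None if the watermark never reaches the end of the window.
    """
    buffered = []
    max_seen = None
    for index, (event_time, value) in enumerate(events):
        # Only events created inside the window contribute to the result.
        if WINDOW_START <= event_time < WINDOW_END:
            buffered.append(value)
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - MAX_DELAY
        # The window waits until the watermark reaches 11:00, then emits.
        if watermark >= WINDOW_END:
            return index, sum(buffered)
    return None


events = [
    (datetime(2024, 1, 1, 10, 15), 3),
    (datetime(2024, 1, 1, 10, 50), 4),
    (datetime(2024, 1, 1, 11, 2), 10),  # outside the window; watermark is 10:57
    (datetime(2024, 1, 1, 10, 58), 5),  # late, but the window is still open
    (datetime(2024, 1, 1, 11, 5), 1),   # watermark reaches 11:00; window emits
]

result = run_window(events)
print(result)  # (4, 12)
```

In this run the late event created at 10:58 is still counted, because the watermark had only reached 10:57 when it arrived. The window emits the sum 12 only when the fifth event pushes the watermark to 11:00.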