Handling state in Flink

You can use Flink to store the state of your application locally in state backends that guarante lower latency when accessing your processed data. You can also create checkpoints and savepoints to have a fault-tolerant backup of your streaming application on a durable storage.

Stateful applications process dataflows with operations that store and access information across multiple events. There are two basic types of states in Flink: keyed state and operator state. The difference between them is that a keyed state is always bound to keys and can only be used on keyed streams. In operator state, the state is bound to an operator on one parallel substream. Keyed streams are created by defining keys for the elements of a stream. The keyed stream is read by the stateful operator and per key state is stored locally and can be accessed by the operator throughout the data streaming process. A basic stateful application structure is shown in the following illustration.

State backend

Flink applications store and access the working instance of their state locally, and preferably in memory. In Flink, the implementation of these local stores is called state backends. Flink also creates asynchronous and periodic snapshots of the stored state in the application, known as checkpoints. When you enable checkpointing, the snapshots of the states are created and saved into a given durable storage, like HDFS. The following illustration shows how the storing of state and checkpoint is implemented in a Flink application.

CSA supports Java heap and RocksDB state backends. The practical difference between them is that the Heap option is recommended for small states, while the RocksDB option is the production grade solution for large states. RocksDB also supports incremental checkpointing.