Handling state in Flink
You can use Flink to store the state of your application locally in state backends, which guarantee lower latency when accessing your processed data. You can also create checkpoints and savepoints to keep a fault-tolerant backup of your streaming application on durable storage.
Stateful applications process dataflows with operations that store and access information
across multiple events. There are two basic types of state in Flink: keyed state and
operator state. Keyed state is always bound to a key and can only be used on keyed
streams. Operator state is bound to one parallel instance of an operator. Keyed
streams are created by defining keys for the elements of a stream. The stateful
operator reads the keyed stream, stores the per-key state locally, and can access
that state throughout the data streaming process. A basic stateful application
structure is shown in the following illustration.
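The difference between the two state types can also be modeled in a few lines of plain Java. The sketch below is only a conceptual model and does not use the Flink API: the class name KeyedStateSketch and its methods are hypothetical. It keeps one counter per key (the keyed state) next to a single counter for the whole operator instance (the operator state).

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only; none of these names belong to the Flink API.
class KeyedStateSketch {

    // Keyed state: one value per key, touched only by events carrying that key.
    private final Map<String, Long> countPerKey = new HashMap<>();

    // Operator state: one value for this whole parallel operator instance.
    private long eventsSeenByThisInstance = 0;

    // Processes one event and returns the running count for the event's key.
    long process(String key) {
        eventsSeenByThisInstance++;
        return countPerKey.merge(key, 1L, Long::sum);
    }

    // Returns how many events this instance has processed, regardless of key.
    long eventsSeen() {
        return eventsSeenByThisInstance;
    }

    public static void main(String[] args) {
        KeyedStateSketch operator = new KeyedStateSketch();
        String[] keys = {"sensor-a", "sensor-b", "sensor-a"};
        for (String key : keys) {
            System.out.println(key + " -> " + operator.process(key));
        }
        System.out.println("events seen: " + operator.eventsSeen());
    }
}
```

In this model the second event for sensor-a sees the count left behind by the first one, while sensor-b starts from its own empty state. In a real Flink job the framework, not the operator code, scopes the state to the current key.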
State backend
Flink applications store and access the working instance of their state locally,
preferably in memory. In Flink, the implementations of these local stores are called
state backends. Flink also creates asynchronous, periodic snapshots of the state
stored in the application, known as checkpoints. When you enable checkpointing,
snapshots of the state are created and saved to a given durable storage, such as
HDFS. The following illustration shows how storing state and checkpoints is
implemented in a Flink application.
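The split between a local working copy and a durable snapshot can be sketched in plain Java as well. This is a simplified model under stated assumptions, not how Flink implements checkpoints: the class CheckpointSketch is hypothetical, the snapshot is written synchronously to a local file instead of asynchronously to HDFS, and keys are assumed not to contain line breaks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only; none of these names belong to the Flink API.
class CheckpointSketch {

    // Working state: kept locally in memory for low-latency access.
    private final Map<String, Long> working = new TreeMap<>();

    void add(String key, long amount) {
        working.merge(key, amount, Long::sum);
    }

    Long get(String key) {
        return working.get(key);
    }

    // Checkpoint: write a copy of the current state to durable storage.
    void checkpoint(Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : working.entrySet()) {
            lines.add(entry.getKey() + "=" + entry.getValue());
        }
        Files.write(target, lines);
    }

    // Recovery: rebuild the working state from the last snapshot.
    static CheckpointSketch restore(Path source) throws IOException {
        CheckpointSketch restored = new CheckpointSketch();
        for (String line : Files.readAllLines(source)) {
            int sep = line.lastIndexOf('=');
            restored.working.put(line.substring(0, sep),
                    Long.parseLong(line.substring(sep + 1)));
        }
        return restored;
    }

    public static void main(String[] args) throws IOException {
        Path snapshot = Files.createTempFile("checkpoint", ".txt");
        CheckpointSketch state = new CheckpointSketch();
        state.add("sensor-a", 5);
        state.checkpoint(snapshot);
        state.add("sensor-a", 10); // update made after the snapshot
        System.out.println(CheckpointSketch.restore(snapshot).get("sensor-a"));
    }
}
```

Restoring from the snapshot yields the state as it was when the checkpoint was taken, so the update made afterwards is not part of the restored state. This is the behavior to expect after a failure between two checkpoints.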
CSA supports the Java heap and RocksDB state backends. In practice, the heap backend is recommended for small state, while RocksDB is the production-grade solution for large state. RocksDB also supports incremental checkpointing.
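As a rough sketch of how these choices might look in application code, the following fragment selects RocksDB with incremental checkpointing and enables periodic checkpoints. It assumes the DataStream API of Apache Flink 1.13 or later with the flink-statebackend-rocksdb dependency on the classpath; earlier Flink releases use different backend classes, so check the API of the Flink version shipped with your CSA release. The checkpoint interval, the HDFS path, and the job name are placeholder values.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend for large state; "true" enables incremental checkpoints.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
        // For small state, the heap alternative would be:
        // env.setStateBackend(new org.apache.flink.runtime.state.hashmap.HashMapStateBackend());

        // Take a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60_000L);

        // Durable storage for the snapshots (placeholder path, assumed reachable).
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");

        // Trivial placeholder pipeline so that the job has something to execute.
        env.fromElements(1, 2, 3).print();
        env.execute("state-backend-config-sketch");
    }
}
```

The same settings can usually be supplied through the cluster configuration instead of code, which keeps the choice of backend out of the application itself.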