State consistency

In Flink, fault tolerance, state consistency, and the checkpoint mechanism are all built on state.

Generally speaking, state is a backup of the data or intermediate results of a running program; it ensures the program can recover after a mid-run failure.

State type

What kinds of state does a program keep, and which of them can be saved? In Flink, state broadly falls into two categories: operator state and keyed state.


State backend

The state backend is where the backed-up state data lives. Flink offers three ways to save state; by default, it is kept in memory:

  1. MemoryStateBackend: keeps the state in memory (the default);
  2. RocksDBStateBackend: stores the state in a local RocksDB database, effectively a mix of memory and disk;
  3. FsStateBackend: keeps the working state locally in the TaskManager and writes it to a file system when a checkpoint is taken.
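As a sketch, the backend can be selected in `flink-conf.yaml`. The keys below follow Flink's configuration naming, but the checkpoint directory is an illustrative placeholder:

```yaml
# Illustrative flink-conf.yaml fragment; the directory is a placeholder.
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints   # where checkpoint data is written
state.backend.incremental: true                    # RocksDB supports incremental checkpoints
```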


Checkpoints give Flink its reliability and underpin its data consistency: the program's execution state is saved periodically, so that after a failure the program can be restored from the latest save.

Working mechanism of checkpoint

The checkpoint mechanism works as follows:

A checkpoint coordinator in the JobManager periodically generates barriers and injects them at the data sources; the barriers then flow downstream with the data and are broadcast to all parallel tasks.

When a barrier reaches a task, it acts as a switch that triggers a backup of that task's current state. A task usually receives data streams from multiple upstream tasks, and it triggers its backup only after the barriers from all upstream tasks have arrived.

This is where barrier alignment comes in: when the barrier from one upstream task arrives first, the records that follow it are buffered until the barriers from all upstream tasks have arrived. This guarantees data consistency and exactly-once processing.
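Barrier alignment can be sketched in a few lines of plain Python (a simulation, not Flink's implementation): a task with several input channels buffers any record that arrives after that channel's barrier, until barriers have arrived on every channel and the snapshot can be taken.

```python
# Toy simulation of barrier alignment at a task with multiple input channels.

BARRIER = "BARRIER"

def align(channels):
    """channels: dict name -> list of records, each containing one BARRIER.
    Returns (records processed before the snapshot, records buffered for
    after the snapshot, whether all barriers arrived)."""
    blocked = set()
    processed, buffered = [], []
    # Round-robin over channels, imitating interleaved arrival of records.
    pending = {name: iter(recs) for name, recs in channels.items()}
    while pending:
        for name in list(pending):
            try:
                rec = next(pending[name])
            except StopIteration:
                del pending[name]
                continue
            if rec == BARRIER:
                blocked.add(name)      # this channel's barrier has arrived
            elif name in blocked:
                buffered.append(rec)   # post-barrier record: hold back
            else:
                processed.append(rec)  # pre-barrier record: belongs to this checkpoint
    snapshot_taken = blocked == set(channels)  # all barriers arrived -> snapshot
    return processed, buffered, snapshot_taken
```

Buffering only the post-barrier records is what lets the snapshot sit on a consistent cut of the stream: nothing after the barrier leaks into this checkpoint.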

After a task completes its backup, it sends a message to the coordinator in the JobManager, reporting the address where this checkpoint was saved along with the related metadata.

When the checkpoint reaches the end of the data flow and the JobManager has received acknowledgements from all tasks, it declares the global checkpoint complete and persists the checkpoint metadata.

At the sink, two-phase commit (2PC) comes into play: results are pre-committed first and formally committed only after the "checkpoint complete" notification from the JobManager, which guarantees exactly-once delivery.
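The sink-side protocol can be sketched like this (an illustrative class with made-up method names, not Flink's actual `TwoPhaseCommitSinkFunction` API): records are pre-committed into a per-checkpoint transaction and only become visible to the external system once the JobManager's completion notice arrives.

```python
# Toy two-phase-commit sink: pending transactions per checkpoint id.

class TwoPhaseCommitSink:
    def __init__(self):
        self.external_store = []   # what downstream consumers can see
        self.pending = {}          # checkpoint_id -> pre-committed records

    def write(self, checkpoint_id, record):
        self.pending.setdefault(checkpoint_id, []).append(record)

    def pre_commit(self, checkpoint_id):
        # Phase 1: the transaction is flushed durably but not yet visible.
        self.pending.setdefault(checkpoint_id, [])

    def notify_checkpoint_complete(self, checkpoint_id):
        # Phase 2: the global checkpoint finished -> formally commit.
        self.external_store.extend(self.pending.pop(checkpoint_id, []))

    def abort(self, checkpoint_id):
        # On failure, discard the pre-committed transaction.
        self.pending.pop(checkpoint_id, None)
```

If a failure happens between pre-commit and commit, the transaction is simply aborted and the records are replayed from the checkpoint, so nothing is written twice.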

Checkpoint algorithm

Flink's checkpoints are based on the Chandy-Lamport algorithm, in the form of asynchronous barrier snapshotting: it takes checkpoint backups while the stream keeps flowing, without stopping processing.

Data consistency

Levels of data consistency:

**at-most-once** (at most once):

This is really a euphemism for no correctness guarantee at all: after a failure occurs, the count result may be lost.

**at-least-once** (at least once):

The count may be greater than the correct value, but never less. In other words, after a failure the counting program may over-count, but it never under-counts.

**exactly-once** (exactly once):

The system guarantees that the counting result obtained after a failure is identical to the correct value.
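The difference between the three guarantees can be made concrete with a toy counting job (plain Python, not Flink code): the job crashes mid-stream after a checkpoint was taken, then recovers and finishes the stream. What it restores on recovery determines the guarantee.

```python
# Toy model: a job counts records, crashes at `crash_after`, and had taken
# a checkpoint at `checkpoint_pos`. The recovery policy decides the result.

def final_count(stream, crash_after, checkpoint_pos, mode):
    counted = crash_after                     # records counted before the crash
    if mode == "at-most-once":
        counted, resume = 0, crash_after      # state lost, input not replayed
    elif mode == "at-least-once":
        resume = checkpoint_pos               # input replayed, state not rolled back
    else:                                     # "exactly-once"
        counted, resume = checkpoint_pos, checkpoint_pos  # state and input roll back together
    counted += len(stream) - resume           # finish processing the stream
    return counted
```

For a 10-record stream, a crash after 6 records, and a checkpoint at record 4, exactly-once yields 10, at-least-once yields 12 (the 2 records between checkpoint and crash are counted twice), and at-most-once yields 4 (the pre-crash counts are lost).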

End-to-end consistency

Source side

The external source must be able to reset its read position. The Kafka source we use has this feature: when reading data, the read position can be reset by specifying an offset.
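A resettable source can be sketched as a toy class (illustrative only, not the actual Kafka consumer API): on recovery, the offset stored in the checkpoint is restored and reading resumes from exactly that position, so records after the checkpoint are replayed.

```python
# Toy replayable source with an explicit read offset.

class ReplayableSource:
    def __init__(self, records):
        self.records = records
        self.offset = 0

    def poll(self):
        """Return the next record, or None at end of stream."""
        if self.offset >= len(self.records):
            return None
        rec = self.records[self.offset]
        self.offset += 1
        return rec

    def seek(self, offset):
        # Reset the read position, e.g. to the offset stored in a checkpoint.
        self.offset = offset
```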

Inside Flink

Relies on the checkpoint mechanism.

Sink side

It must be guaranteed that data is not written to the external system twice when recovering from a failure. There are two approaches:

a) Idempotent writes

An idempotent operation is one that can be executed repeatedly but changes the result only once: executing it again afterwards has no further effect.
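A minimal sketch of an idempotent write is a keyed upsert: replaying the same writes after a failure leaves the external store exactly as it was.

```python
# Idempotent write as an upsert keyed by record id: writing the same
# (key, value) pair twice has the same effect as writing it once.

def upsert(store, key, value):
    store[key] = value
    return store

store = {}
writes = [("user1", 10), ("user2", 20)]
for key, value in writes:
    upsert(store, key, value)
# A failure triggers recovery, and the same writes are replayed:
for key, value in writes:
    upsert(store, key, value)
```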

b) Transactional writes

Writes to the external system are wrapped in a transaction that corresponds to a checkpoint; only when the checkpoint actually completes are all the corresponding results written into the sink system. Transactional writes can be implemented in two ways: write-ahead log (WAL) and two-phase commit (2PC).
