The three pillars of Flink's exactly-once guarantee: state, state backends, and checkpoints

Date: 2021-01-20

Flink is a distributed stream processing engine, and stream processing jobs are expected to run 24/7. So how does Flink keep a job running continuously? Internally, Flink stores application state in local memory or in an embedded key-value database (RocksDB). Because of its distributed architecture, Flink must also persist this locally generated state so that it is not lost when the application or a node fails: through checkpoints, Flink writes the state to remote, durable storage, which is what enables its different semantic guarantees. From this article you will learn what state is in Flink, how state is stored, which state backends are available, what a globally consistent checkpoint is, and how Flink implements exactly-once result guarantees internally through checkpoints. The article is long, so feel free to bookmark it for later reading.

<!-- more -->

What is state

Introduction

Rather than analyzing what state is in the abstract, let's start with two code examples. Case 1 is the WordCount code for Spark Streaming, and Case 2 is the WordCount code for Flink.

  • Case 1: Spark WC
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Input:

C:\WINDOWS\system32>nc -lp 9999
hello spark
hello spark

Output:

[Figure: Spark Streaming output; each 5-second batch is counted independently, e.g. (hello,1) and (spark,1) for each batch]

  • Case 2: Flink WC
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        DataStreamSource<String> streamSource = env.socketTextStream("localhost", 9999);
        SingleOutputStreamOperator<Tuple2<String,Integer>> words = streamSource.flatMap(new FlatMapFunction<String, Tuple2<String,Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String,Integer>> out) throws Exception {
                String[] splits = value.split("\\s"); // note "\\s": "\s" is not a valid Java string escape
                for (String word : splits) {
                    out.collect(Tuple2.of(word, 1));
                }
            }
        });
        words.keyBy(0).sum(1).print();
        env.execute("WC");
    }
}

Input:

C:\WINDOWS\system32>nc -lp 9999
hello Flink
hello Flink

Output:
[Figure: Flink output; counts accumulate across records: (hello,1), (Flink,1), (hello,2), (Flink,2)]

As the two examples show, when Spark Streaming does word-frequency counting, the current result is not affected by previous results: only the data received in the current batch is counted. This can be understood as stateless computation. In the Flink example, by contrast, the second count also includes the first result. In other words, Flink saves the previous result as state; when counting the second time, it first fetches that state and then combines it with the new data. This can be understood as stateful computation, as shown in the figure below.

[Figure: stateless computation (Spark Streaming) vs. stateful computation (Flink)]

Categories of state

Flink provides two basic kinds of state: Keyed State and Operator State. Depending on how the state is managed, each kind exists in two forms: Managed and Raw. The details are shown in the table below. Note that because Flink recommends using managed state, the following discussion focuses on managed state; raw state is not covered further in this article.

[Table: keyed state and operator state in their managed and raw forms]

Differences between managed state and raw state

[Table: differences between managed state and raw state]

Keyed State & Operator State

[Figure: keyed state vs. operator state]

Keyed State

Keyed state can only be used by functions applied to a KeyedStream. The state is bound to a key: each key corresponds to one state instance, and keyed state is maintained and accessed by key. Flink keeps one state instance per key, and that instance always lives on the operator task that processes the key's records, so all records with the same key access the same state. As shown in the figure below, a KeyedStream is produced by calling keyBy() on a stream. Flink provides several kinds of keyed state, as follows:

[Figure: keyed state, one state instance per key on a KeyedStream]

  • ValueState<T>

Holds a single value of type T. Use ValueState.value() to read the value and ValueState.update(T) to update it. A ValueStateDescriptor is used to obtain the state handle.

  • ListState<T>

Holds a list of elements of type T, i.e. the state value for a key is a list. Use ListState.add(T) or ListState.addAll(List<T>) to append new elements, and ListState.get() to access the state elements; get() returns an Iterable<T> over all elements. Note that ListState does not support deleting a single element, but update(List<T> values) can replace the whole list. A ListStateDescriptor is used to obtain the state handle.

  • ReducingState<T>

When the add(T) method is called, the new value is immediately aggregated with the existing one by a ReduceFunction; ReducingState.get() returns the aggregated value. A ReducingStateDescriptor is used to obtain the state handle.

  • AggregatingState<IN, OUT>

Similar to ReducingState<T>, but an AggregateFunction is used to aggregate the internal value; AggregatingState.get() computes the final result and returns it. An AggregatingStateDescriptor is used to obtain the state handle.

  • MapState<UK, UV>

Holds a set of key/value mappings, similar to a Java Map. Use get(UK key) to read the value for a key, put(UK key, UV value) to add a mapping, remove(UK key) to delete the value for a given key, and contains(UK key) to test whether a key exists. A MapStateDescriptor is used to obtain the state handle.

  • FoldingState<T, ACC>

Deprecated since Flink 1.4; it will be removed in a future version and is replaced by AggregatingState.

It is worth noting that all of the above state primitives support clearing the state via State.clear(). Also, these primitives are only the interface for interacting with state; the actual state lives in the state backend (introduced later), so a state primitive is effectively a handle to the state.
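For illustration, here is a minimal ValueState sketch (the class, field, and descriptor names are made up for this example): a per-key event counter inside a RichFlatMapFunction.

public static class PerKeyCounter extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    //A handle only; the actual state lives in the state backend
    private transient ValueState<Long> cntState;

    @Override
    public void open(Configuration parameters) throws Exception {
        ValueStateDescriptor<Long> desc =
                new ValueStateDescriptor<>("cnt", TypeInformation.of(new TypeHint<Long>() {}));
        cntState = getRuntimeContext().getState(desc); // register the state and obtain the handle
    }

    @Override
    public void flatMap(Tuple2<String, Long> value, Collector<Tuple2<String, Long>> out) throws Exception {
        Long cnt = cntState.value();           // null on the first access for a key
        cnt = (cnt == null) ? 1L : cnt + 1;
        cntState.update(cnt);                  // write the new count back
        out.collect(Tuple2.of(value.f0, cnt));
    }
}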

Use cases of keyed state

Here is a MapState use case (a minimal ValueState sketch was given above; see also the official documentation). The details are as follows:

public class MapStateExample {

    //Count the number of each behavior of each user
    public static class UserBehaviorCnt extends RichFlatMapFunction<Tuple3<Long, String, String>, Tuple3<Long, String, Integer>> {

        //Define a mapstate handle
        private transient MapState<String, Integer> behaviorCntState;

        //Initialization status
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            MapStateDescriptor<String, Integer> userBehaviorMapStateDesc = new MapStateDescriptor<>(
                    "userBehavior", // name of the state descriptor
                    TypeInformation.of(new TypeHint<String>() {}),  // data type of the keys in the MapState
                    TypeInformation.of(new TypeHint<Integer>() {})  // data type of the values in the MapState
            );
            behaviorCntState = getRuntimeContext().getMapState(userBehaviorMapStateDesc); // get the state handle
        }

        @Override
        public void flatMap(Tuple3<Long, String, String> value, Collector<Tuple3<Long, String, Integer>> out) throws Exception {
            Integer behaviorCnt = 1;
            //If the current state includes the behavior, + 1
            if (behaviorCntState.contains(value.f1)) {
                behaviorCnt = behaviorCntState.get(value.f1) + 1;
            }
            //Update status
            behaviorCntState.put(value.f1, behaviorCnt);
            out.collect(Tuple3.of(value.f0, value.f1, behaviorCnt));
        }
    }
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        //Simulation data source [userid, behavior, product]
        DataStreamSource<Tuple3<Long, String, String>> userBehaviors = env.fromElements(
                Tuple3.of(1L, "buy", "iphone"),
                Tuple3.of(1L, "cart", "huawei"),
                Tuple3.of(1L, "buy", "logi"),
                Tuple3.of(1L, "fav", "oppo"),
                Tuple3.of(2L, "buy", "huawei"),
                Tuple3.of(2L, "buy", "onemore"),
                Tuple3.of(2L, "fav", "iphone"));
        userBehaviors
                .keyBy(0)
                .flatMap(new UserBehaviorCnt())
                .print();
        env.execute("MapStateExample");
    }
}

The output is as follows:

[Figure: output; with parallelism 1 the expected result is (1,buy,1), (1,cart,1), (1,buy,2), (1,fav,1), (2,buy,1), (2,buy,2), (2,fav,1)]

State lifecycle management (TTL)

For keyed state of any type, a time-to-live (TTL) can be set, i.e. how long the state survives, so that state data is cleaned up within the specified time. If a TTL is configured for a state, the stored value is cleared once it expires. The TTL feature is configured through a StateTtlConfig, which is then passed to the enableTimeToLive method of the StateDescriptor. A code example follows:

StateTtlConfig ttlConfig = StateTtlConfig
                 //Specify TTL duration as 10s
                .newBuilder(Time.seconds(10))
                 //Valid only for create and write operations
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                 //Do not return expired data
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired) 
                .build();

        //Initialization status
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            MapStateDescriptor<String, Integer> userBehaviorMapStateDesc = new MapStateDescriptor<>(
                    "userBehavior", // name of the state descriptor
                    TypeInformation.of(new TypeHint<String>() {}),  // data type of the keys in the MapState
                    TypeInformation.of(new TypeHint<Integer>() {})  // data type of the values in the MapState
            );
            //Apply the StateTtlConfig to the state descriptor
            userBehaviorMapStateDesc.enableTimeToLive(ttlConfig);
            behaviorCntState = getRuntimeContext().getMapState(userBehaviorMapStateDesc); // get the state handle

        }

When building a StateTtlConfig, the newBuilder method must be called, and it takes the expiration time as its parameter; all other settings are optional or have default values. The setUpdateType method accepts one of three types:

public enum UpdateType {
        //Disable TTL and never expire
        Disabled,
        //Update TTL when creating and writing
        OnCreateAndWrite,
        //Like OnCreateAndWrite, but the TTL is also refreshed on read access
        OnReadAndWrite
    }

Note that expired state data is cleaned up according to the configured UpdateType: the TTL is only refreshed, and expiry only detected, when the state is written (or read, depending on the type). In other words, if a state entry is never used or updated, its cleanup is never triggered, so state data in the system can keep growing. Currently, users can use StateTtlConfig's cleanupFullSnapshot option to clear expired state when a full state snapshot is taken, but this configuration does not apply to RocksDB incremental checkpoints.
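A short sketch of that option (API as of Flink 1.10; the TTL value is illustrative):

StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.seconds(10))
        //Clean up expired entries whenever a full state snapshot (full checkpoint/savepoint) is taken.
        //Note: this does not apply to RocksDB incremental checkpoints.
        .cleanupFullSnapshot()
        .build();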

When creating the StateTtlConfig above, you can also call setStateVisibility to configure the visibility of state, i.e. whether expired data that has not yet been cleaned up may still be returned:

    /**
     * Whether expired data is returned if it has not been cleaned up yet
     */
    public enum StateVisibility {
        //Return expired data if it has not yet been cleaned up
        ReturnExpiredIfNotCleanedUp,
        //Never return expired data (the default)
        NeverReturnExpired
    }

Operator State

Operator state is scoped to an operator task: all records processed by the same parallel task can access the same state. Operator state cannot be accessed by other tasks, whether or not they belong to the same operator, as shown in the figure below.

Flink internal exact only three axes: state, state back end and checkpoint

Operator state is a kind of non-keyed state that is bound to a parallel operator instance. A typical example is the Kafka connector: each parallel Kafka consumer instance corresponds to Kafka partitions, and it maintains the topic partitions and offsets as its operator state. In Flink, an operator state can be implemented via either the ListCheckpointed<T extends Serializable> interface or the CheckpointedFunction interface.

Let's first look at the definition of these two interfaces, and then give a concrete use case for each. The source of the ListCheckpointed interface is as follows:

public interface ListCheckpointed<T extends Serializable> {

    /**
     * Gets the current state of an operator instance; the state includes all results
     * accumulated by previous invocations of this operator instance.
     * Returns a snapshot of the function's state in the form of a list.
     * Called when Flink triggers the generation of a checkpoint.
     * @param checkpointId the ID of the checkpoint, a unique and monotonically increasing value
     * @param timestamp the timestamp at which the JobManager triggered the checkpoint
     * @return a list of operator state elements; returning null is treated as an empty list
     * @throws Exception
     */
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;

    /**
     * Called when the function state is initialized, either at job start or on failure recovery.
     * Restores the function state from the provided list.
     * Note: this method is called before RichFunction#open().
     * @param state the list of restored operator state elements; may be empty
     * @throws Exception
     */
    void restoreState(List<T> state) throws Exception;
}

When operator ListState is used, the redistribution strategy (i.e. how state is restored) when the operator is scaled up or down is shown in the following figure:

[Figure: even-split redistribution of operator list state on rescaling]

The redistribution strategy above is Even-split Redistribution: each operator instance holds a list of some of the state elements, and the complete state is the union of all those lists. When a restore/redistribution is triggered, the state elements are evenly divided into as many lists as the operator's new parallelism; each task instance gets one list, which may be empty or contain multiple elements.

Now let's look at the CheckpointedFunction interface. Its source is as follows:

public interface CheckpointedFunction {

    /**
     * Called before a checkpoint snapshot is generated.
     * The purpose of this method is to ensure that all state objects are up to date
     * before the checkpoint starts.
     * @param context a FunctionSnapshotContext from which checkpoint metadata can be read,
     *                e.g. the checkpoint ID and the JobManager timestamp at which the
     *                checkpoint was initiated
     * @throws Exception
     */
    void snapshotState(FunctionSnapshotContext context) throws Exception;

    /**
     * Called when a parallel instance of the CheckpointedFunction is created,
     * i.e. when the application starts or restarts after a failure.
     * @param context a FunctionInitializationContext, which gives access to the
     *                OperatorStateStore and KeyedStateStore. These stores hand out state
     *                handles, i.e. they register the function state with the Flink runtime
     *                and return state objects such as ValueState or ListState.
     * @throws Exception
     */
    void initializeState(FunctionInitializationContext context) throws Exception;
}

The CheckpointedFunction interface is the lowest-level interface for stateful functions. It provides hooks for registering and maintaining both keyed state and operator state (i.e. both kinds can be used at the same time), and it is the only interface that supports union list state. For union list state, Flink offers another operator-state redistribution strategy, Union Redistribution: each operator instance holds the complete list of state elements, and when a restore/redistribution is triggered, every operator instance receives the full list. This is shown in the figure below:

[Figure: union redistribution, where every instance receives the complete list of state elements]
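In code, the strategy is determined by how the list state is requested from the OperatorStateStore. A minimal sketch, assuming a ListStateDescriptor<Long> named descriptor inside CheckpointedFunction#initializeState:

//Even-split redistribution: each instance receives a slice of the list on restore
ListState<Long> splitState = context.getOperatorStateStore().getListState(descriptor);

//Union redistribution: each instance receives the complete list on restore
ListState<Long> unionState = context.getOperatorStateStore().getUnionListState(descriptor);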

ListCheckpointed

Compared with the CheckpointedFunction interface, the ListCheckpointed interface is less flexible: it supports only list-typed state, and only the even-split redistribution strategy on recovery. Unlike the keyed state primitives provided by Flink (such as ValueState and ListState), which are registered directly with the state backend, here the operator state is kept as a member variable, and the callbacks provided by the interface handle the interaction with the state backend. The code example is as follows:

public class ListCheckpointedExample {
    private static class UserBehaviorCnt extends RichFlatMapFunction<Tuple3<Long, String, String>, Tuple2<String, Long>> implements ListCheckpointed<Long> {
        private Long userBuyBehaviorCnt = 0L;
        @Override
        public void flatMap(Tuple3<Long, String, String> value, Collector<Tuple2<String, Long>> out) throws Exception {
            if(value.f1.equals("buy")){
                userBuyBehaviorCnt ++;
                out.collect(Tuple2.of("buy",userBuyBehaviorCnt));
            }
        }
        @Override
        public List<Long> snapshotState(long checkpointId, long timestamp) throws Exception {
            //Returns a list collection of individual elements, which is the number of user purchases
            return Collections.singletonList(userBuyBehaviorCnt);
        }
        @Override
        public void restoreState(List<Long> state) throws Exception {
            //After rescaling, the restored state slices from all subtasks must be summed
            for (Long cnt : state) {
                userBuyBehaviorCnt += cnt;
            }
        }
    }
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        //Simulation data source [userid, behavior, product]
        DataStreamSource<Tuple3<Long, String, String>> userBehaviors = env.fromElements(
                Tuple3.of(1L, "buy", "iphone"),
                Tuple3.of(1L, "cart", "huawei"),
                Tuple3.of(1L, "buy", "logi"),
                Tuple3.of(1L, "fav", "oppo"),
                Tuple3.of(2L, "buy", "huawei"),
                Tuple3.of(2L, "buy", "onemore"),
                Tuple3.of(2L, "fav", "iphone"));

        userBehaviors
                .flatMap(new UserBehaviorCnt())
                .print();

        env.execute("ListCheckpointedExample");
    }
}

CheckpointedFunction

The CheckpointedFunction interface offers richer functionality: for example, it supports union list state and can also access keyed state. Regarding the redistribution strategy: with even-split redistribution, the operator state is obtained via context.getOperatorStateStore().getListState(descriptor); with union redistribution, it is obtained via context.getOperatorStateStore().getUnionListState(descriptor). The use case is as follows:

public class CheckpointFunctionExample {
    private static class UserBehaviorCnt implements CheckpointedFunction, FlatMapFunction<Tuple3<Long, String, String>, Tuple3<Long, Long, Long>> {
        //A local variable that counts the number of user behaviors per operator instance
        private Long opUserBehaviorCnt = 0L;
        //The state of each key stores the relevant state of the key
        private ValueState<Long> keyedCntState;
        //Define the operator state to store the state of the operator
        private ListState<Long> opCntState;

        @Override
        public void flatMap(Tuple3<Long, String, String> value, Collector<Tuple3<Long, Long, Long>> out) throws Exception {
            if (value.f1.equals("buy")) {
                //Update operator state local variable values
                opUserBehaviorCnt += 1;
                Long keyedCount = keyedCntState.value();
                //Update the keyed state, guarding against null to avoid a NullPointerException on the first access
                keyedCntState.update(keyedCount == null ? 1L : keyedCount + 1 );
                //Result output
                out.collect(Tuple3.of(value.f0, keyedCntState.value(), opUserBehaviorCnt));
            }
        }
        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            //Refresh the operator state from the opUserBehaviorCnt local variable
            opCntState.clear();
            opCntState.add(opUserBehaviorCnt);
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {

            //Define the StateDescriptor for the keyed state (obtained via the KeyedStateStore)
            ValueStateDescriptor<Long> valueStateDescriptor = new ValueStateDescriptor<>("keyedCnt", TypeInformation.of(new TypeHint<Long>() {
            }));

            //Define the StateDescriptor for the operator state (obtained via the OperatorStateStore)
            ListStateDescriptor<Long> opStateDescriptor = new ListStateDescriptor<>("opCnt", TypeInformation.of(new TypeHint<Long>() {
            }));
            //Initialize the keyed state handle
            keyedCntState = context.getKeyedStateStore().getState(valueStateDescriptor);
            //Initialize the operator state
            opCntState = context.getOperatorStateStore().getListState(opStateDescriptor);
            //Restore the local variable from the operator state
            for (Long state : opCntState.get()) {
                opUserBehaviorCnt += state;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        //Simulation data source [userid, behavior, product]
        DataStreamSource<Tuple3<Long, String, String>> userBehaviors = env.fromElements(
                Tuple3.of(1L, "buy", "iphone"),
                Tuple3.of(1L, "cart", "huawei"),
                Tuple3.of(1L, "buy", "logi"),
                Tuple3.of(1L, "fav", "oppo"),
                Tuple3.of(2L, "buy", "huawei"),
                Tuple3.of(2L, "buy", "onemore"),
                Tuple3.of(2L, "fav", "iphone"));

        userBehaviors
                .keyBy(0)
                .flatMap(new UserBehaviorCnt())
                .print();
        env.execute("CheckpointFunctionExample");
    }
}

What is a state backend

The state described above needs to be stored in a state backend and is then persisted to an external storage system when a checkpoint is triggered. Flink provides three types of state backends: the memory-based MemoryStateBackend, the file-system-based FsStateBackend, and the RocksDB-based RocksDBStateBackend. All three can store the state data produced during Flink stream processing. By default, Flink uses the MemoryStateBackend. The differences are shown in the table below, and the characteristics of each backend are described afterwards.
[Table: comparison of MemoryStateBackend, FsStateBackend, and RocksDBStateBackend]

Categories of state backends

MemoryStateBackend

The MemoryStateBackend stores all state data in JVM heap memory, including the key/value state created through the DataStream API, data buffered in windows, triggers, and so on. It is very fast and efficient, but it also has significant limitations, the most important being memory capacity: storing too much state leads to memory overflow and other problems that affect the normal operation of the whole application. Also, if the machine fails, all state held in that host's memory is lost and the task's state cannot be recovered. Therefore, from a data-safety perspective, users should avoid the MemoryStateBackend in production as much as possible, even though Flink uses it as the default state backend.

The MemoryStateBackend is better suited to test environments, local debugging, and verification; it is not recommended for production. However, if the application's state is small, for example when the job mostly uses stateless operators, the MemoryStateBackend can also be used.
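A short sketch of configuring it explicitly for local testing; the two arguments shown (maximum size per individual state in bytes, and the asynchronous-snapshot flag) follow the Flink 1.10 constructor:

//State lives on the JVM heap and checkpoints go to the JobManager memory.
//5 MB is the default limit for an individual state.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new MemoryStateBackend(5 * 1024 * 1024, true));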

FsStateBackend

The FsStateBackend is a state backend based on a file system, which can be a local file system or a distributed file system such as HDFS. The constructor for creating an FsStateBackend is as follows:

FsStateBackend(Path checkpointDataUri, boolean asynchronousSnapshots)

If the path is local, the format is "file:///data/flink/checkpoints"; if it is an HDFS path, the format is "hdfs://nameservice/flink/checkpoints". The second, boolean parameter (asynchronousSnapshots) specifies whether snapshots are taken asynchronously. By default, state is written to the file system asynchronously, which avoids blocking the streaming computation during a checkpoint as much as possible; to snapshot the state synchronously instead, set this parameter to false.
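Following the constructor shown above, a short sketch (the path is illustrative):

//Asynchronous snapshots (the default) to HDFS; pass false to snapshot synchronously
env.setStateBackend(new FsStateBackend(new Path("hdfs://nameservice/flink/checkpoints"), true));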

Compared with the MemoryStateBackend, the FsStateBackend is better suited to jobs with very large state, for example applications with very long time windows or large amounts of key/value state, where system memory cannot hold the state data. At the same time, the biggest advantage of the FsStateBackend is its relative stability: during checkpointing it persists state to a distributed file system such as HDFS, ensuring the safety of the state data to the greatest extent.

RocksDBStateBackend

Unlike the previous state backends, the RocksDBStateBackend requires its dependency to be added separately. RocksDB is an embedded key/value store, similar in design to HBase: an LSM-based database that mixes memory and disk. Writes first go to a write buffer (similar to HBase's MemStore) and are later flushed to disk files; reads first consult the block cache (similar to HBase's BlockCache), so access is very fast.

The RocksDBStateBackend performs better than the FsStateBackend, mainly because RocksDB keeps the latest hot data locally and synchronizes it to the file system asynchronously. However, it performs worse than the MemoryStateBackend.

Note that RocksDB does not support synchronous checkpoints; there is no synchronous-snapshot option in its constructor. However, RocksDB supports incremental checkpoints and is currently the only backend that does: instead of uploading all SST files to the checkpoint directory, only newly generated SST files need to be uploaded. Its checkpoints are stored in an external file system (local or HDFS). Capacity limits: the total state on a single TaskManager must not exceed its memory plus disk, a single key can be at most 2 GB, and the total size must not exceed the capacity of the configured file system. For jobs with very large state, such as day-level window aggregation, this backend is a good fit.
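A short sketch of enabling incremental checkpoints via the constructor flag (this assumes the flink-statebackend-rocksdb dependency shown in the next section is on the classpath; the constructor declares IOException, so the surrounding main should throw Exception):

//Incremental checkpoints: only newly generated SST files are uploaded on each checkpoint
env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:40010/flink/checkpoints", true));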

Configuration status backend

By default, Flink uses the MemoryStateBackend, so it needs no explicit configuration; other state backends must be configured explicitly. State backend configuration exists at two levels in Flink: in the program itself, which affects only the current application, and in flink-conf.yaml, which, once set, affects every application running on the Flink cluster.

  • Application level configuration
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));

If rocksdbstatebackend is used, the rockdb dependency library needs to be introduced separately, as follows:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>1.10.0</version>
    <scope>provided</scope>
</dependency>

The usage is similar to fsstatebackend, as follows:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:40010/flink/checkpoints"));
  • Cluster level configuration

The configuration entries live in the flink-conf.yaml file, as shown below. The parameter state.backend specifies the state backend type, and state.checkpoints.dir configures the state storage path. In this example, the file system is used as the state backend, and an HDFS path is specified as the checkpoint directory.

#Using file system to store
state.backend: filesystem
#Checkpoint storage path
state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints

If you want to configure the cluster level state backend with rocksdbstatebackend, you can use the following configuration:

#The number of threads that operate rocksdbstatebackend. The default value is 1
state.backend.rocksdb.checkpoint.transfer.thread.num: 1
#Specifies the local file path where rocksdb stores state data
state.backend.rocksdb.localdir: /var/rockdb/checkpoints
#Factory for the timer service state: HEAP (the default) or ROCKSDB
state.backend.rocksdb.timer-service.factory: HEAP

What is checkpoint

The previous sections covered Flink's state and state backends: state is stored in a state backend. To make state fault-tolerant, Flink provides a mechanism called checkpointing. Checkpointing is the core of Flink's fault tolerance: it periodically triggers snapshots of the state and persists them to an external storage system (such as HDFS). If the Flink program fails, the state can be restored from the most recent checkpoint, providing fault tolerance. Moreover, through the checkpoint mechanism, Flink can implement exactly-once semantics internally (for end-to-end exactly-once, Flink uses a two-phase commit protocol on top of this). Next, we analyze Flink's checkpoint mechanism in detail.

Generation of checkpoints

[Figure: checkpoint generation overview]

As shown in the figure above, the input stream consists of user behavior data, including buy and cart events. Each record has an offset, and the count of each behavior is maintained.

Step 1: The checkpoint coordinator in the JobManager triggers a checkpoint.

Step 2: Suppose the checkpoint is triggered while [cart, 3] is being consumed. The data source writes its consumption offset 3 to persistent storage.

Step 3: After the write completes, the source reports its state handle back to the JobManager's checkpoint coordinator.

Step 4: The count-buy and count-cart operators then do the same.

Step 5: When all operators have completed the steps above, i.e. the checkpoint coordinator has collected the state handles of all tasks, this checkpoint is considered globally complete, and a checkpoint meta file is backed up to the persistent storage; the whole checkpoint is then finished. If any task fails its part, this checkpoint fails.

Recovery of checkpoints

The analysis above should give you a basic picture of Flink checkpoints. Next, let's look at how a job recovers from a checkpoint.

  • The task failed

[Figure: a task fails]

  • Restart job

[Figure: the job is restarted]

  • Restore checkpoint

[Figure: state is restored from the last completed checkpoint]

  • Continue processing data

[Figure: processing resumes from the restored state]

The above process is summarized as follows:

  • Step 1: restart the job
  • Step 2: recover the state data from the last checkpoint
  • Step 3: continue to process new data

Implementation of exactly-once in Flink

Flink provides exactly-once processing semantics, which can be understood as follows: data may be recomputed after a failure, but it affects the result state only once. Flink implements exactly-once processing semantics through the checkpoint mechanism. When a checkpoint is triggered, Flink injects checkpoint barriers at the sources; the barriers travel downstream with the data, and each barrier carries a checkpoint ID identifying the checkpoint it belongs to. A barrier logically splits the stream into two parts: records belonging to the current checkpoint and records belonging to the next one. For operators with two (or more) input streams, barrier alignment is used to achieve exactly-once processing semantics.
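The semantics is chosen when checkpointing is enabled; a short sketch (AT_LEAST_ONCE skips barrier alignment, trading possible replays on recovery for lower latency):

//EXACTLY_ONCE (the default) aligns barriers and buffers post-barrier records during alignment;
//AT_LEAST_ONCE skips alignment, so records may be processed again after a failure
env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);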

For what a checkpoint barrier is, see the description from the CheckpointBarrier class source:

/**
 * Checkpoint barriers are used to align checkpoints in the data stream.
 * A barrier is injected into the sources by the JobManager's checkpoint coordinator;
 * the sources broadcast the barrier to the downstream operators. When an operator
 * receives the checkpoint barrier on one of its input streams, it knows it has
 * processed all data between this checkpoint and the previous one on that input.
 *
 * Once an operator has received the checkpoint barrier on all of its input streams,
 * it knows it has processed all data up to the current checkpoint; it can then
 * trigger its part of the checkpoint and forward the barrier downstream.
 *
 * Depending on the processing semantics chosen by the user, data belonging to the
 * next checkpoint is buffered until the current checkpoint completes (exactly-once).
 *
 * Checkpoint barrier IDs are strictly monotonically increasing.
 */
public class CheckpointBarrier extends RuntimeEvent {...}

As you can see, the main role of the checkpoint barrier is to align checkpoints across input streams, which is what enables exactly-once processing semantics.

The checkpoint process is broken down figure by figure below:

Figure 1 shows two input streams, each consumed by a separate source task. Each record is a user behavior (buy or cart), and the number is the record's offset. The count-buy task counts purchase behaviors, and count-cart counts add-to-cart behaviors.

[Figure 1]

In Figure 2, when a checkpoint is triggered, the JobManager sends a new checkpoint ID to each data source, starting the checkpoint generation process.

[Figure 2]

  • In Figure 3, when a source task receives the message, it pauses emitting data, triggers the state backend to snapshot its local state, and broadcasts the checkpoint barrier together with the checkpoint ID to all outgoing stream partitions. Once the snapshot is complete, the state backend notifies the task, which then sends an acknowledgment to the JobManager. After emitting the barrier, the source task resumes normal processing.

[Figure 3]

  • In Figure 4, the checkpoint barriers emitted by the source tasks reach the downstream operator tasks connected to them. When a task receives a barrier on one input partition, it waits for the barriers of the other input partitions to arrive. This process is called barrier alignment; while waiting, records that arrive behind an already-received barrier are buffered instead of processed.

[Figure 4]

  • In Figure 5, once a task has collected the barriers from all input partitions, it tells the state backend to generate its checkpoint, and then broadcasts the checkpoint barrier to the downstream operators.

[Figure 5]

  • In Figure 6, after forwarding the barrier, the task processes the records buffered during barrier alignment, and then continues processing the input streams.

[Figure 6]

  • In Figure 7, the checkpoint barrier finally reaches the sink. On receiving it, the sink task aligns the barriers, writes its own state to the checkpoint like any other operator task, and sends an acknowledgment to the JobManager. When the JobManager has received acknowledgments from all tasks of the application, it marks this checkpoint as complete.

[Figure 7]

Use cases

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//Checkpoint interval (ms); if the state is large, this value can be increased appropriately
env.enableCheckpointing(1000);
//Configure processing semantics. The default is exactly once
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
//The minimum time interval between two checkpoints to prevent checkpoint backlog due to long checkpoint time
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
//The upper limit of checkpoint execution time. If the threshold is exceeded, checkpoint will be interrupted
env.getCheckpointConfig().setCheckpointTimeout(60000);
//The maximum number of checkpoints executed in parallel is 1 by default. Multiple checkpoints can be specified to start multiple checkpoints at the same time and improve efficiency
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
//Enable externalized checkpoints, persisting checkpoint data to the external system;
//with RETAIN_ON_CANCELLATION the checkpoint data is not cleaned up when the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// allow job recovery fallback to checkpoint when there is a more recent savepoint
env.getCheckpointConfig().setPreferCheckpointForRecovery(true);

Summary

This article started from Flink's state, explaining what state is through the Spark WordCount and Flink WordCount examples. It then described the categories of state and how to use state in detail, discussed the three state backends Flink provides together with instructions for configuring them, and finally explained Flink's checkpoint mechanism in detail with diagrams and text, giving the program configuration used when enabling checkpoints.
