State backends in Flink

Time:2020-10-25

Flink offers three state backends out of the box (used for state storage):

  • MemoryStateBackend
  • FsStateBackend
  • RocksDBStateBackend

MemoryStateBackend

MemoryStateBackend keeps state as objects on the TaskManager's heap. Through the checkpoint mechanism, it snapshots the state and stores the snapshot in the JobManager's heap memory.

MemoryStateBackend can be configured to use asynchronous snapshots, which avoids blocking the data pipeline while a snapshot is taken. Currently, this is enabled by default.

Limitations of MemoryStateBackend:

  • Each individual state is limited to 5 MB by default; the limit can be raised through the constructor.
  • Whatever limit is configured, the state cannot be larger than the Akka frame size.
  • The aggregate state must fit into the JobManager's memory.
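
The per-state limit mentioned above is passed in when the backend is constructed. The following is a minimal sketch, assuming a Flink release that ships MemoryStateBackend with the (maxStateSize, asynchronousSnapshots) constructor; the object name and the 10 MB value are illustrative assumptions, not recommendations:

```scala
import org.apache.flink.runtime.state.StateBackend
import org.apache.flink.runtime.state.memory.MemoryStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object MemoryBackendExample {
  // Illustrative value only: raise the per-state limit from the 5 MB default to 10 MB
  val maxStateSize: Int = 10 * 1024 * 1024

  // The second argument switches asynchronous snapshots on explicitly
  val memoryBackend = new MemoryStateBackend(maxStateSize, true)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Register the backend for every job built from this environment
    env.setStateBackend(memoryBackend: StateBackend)
  }
}
```

Raising the limit this way does not lift the other two restrictions: the state still has to fit within the Akka frame size and within the JobManager's memory.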

Applicable scenarios for MemoryStateBackend:

  • Local debugging
  • Flink jobs with a small amount of state

FsStateBackend

FsStateBackend is configured with a file system path. The working (in-flight) state is held in the TaskManager's memory. Through the checkpoint mechanism, state snapshots are written to the configured file system or directory.

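import org.apache.flink.core.fs.Path
import org.apache.flink.runtime.state.StateBackend
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
// FsStateBackend configuration. A file:// URI would point at the TaskManager's local file system instead.
val checkPointPath = new Path("hdfs:///flink/checkpoints")
val fsStateBackend: StateBackend = new FsStateBackend(checkPointPath)
env.setStateBackend(fsStateBackend)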
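
Snapshots are only written once checkpointing is switched on, since Flink leaves it disabled by default. The following is a minimal sketch of enabling it together with the backend configuration above; the object name, the helper method, and the 60-second interval are assumptions chosen for illustration:

```scala
import org.apache.flink.runtime.state.StateBackend
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object FsBackendCheckpointing {
  // Illustrative interval: trigger a checkpoint every 60 seconds
  val checkpointIntervalMs: Long = 60000L

  // Hypothetical helper: sets the backend and enables periodic checkpoints
  def configure(env: StreamExecutionEnvironment): Unit = {
    val backend: StateBackend = new FsStateBackend("hdfs:///flink/checkpoints")
    env.setStateBackend(backend)
    env.enableCheckpointing(checkpointIntervalMs, CheckpointingMode.EXACTLY_ONCE)
  }
}
```

A job would call FsBackendCheckpointing.configure(env) once, before building its pipeline.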

Applicable scenarios for FsStateBackend:

  • Jobs with large state, long windows, and large key/value state
  • All high-availability setups

RocksDBStateBackend

RocksDBStateBackend keeps the working state in a RocksDB database located in the TaskManager's data directory. Through the checkpoint mechanism, the entire RocksDB database is copied to the configured file system or directory.

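import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.runtime.state.StateBackend
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.util.TernaryBoolean

val checkpointDataUri = "hdfs:///flink/checkpoints"
val tmpDir = "file:///tmp/rocksdb/data/"
val env = StreamExecutionEnvironment.getExecutionEnvironment
val fsStateBackend: StateBackend = new FsStateBackend(checkpointDataUri)
// The second argument (TernaryBoolean.TRUE) enables incremental checkpoints
val rocksDBBackend: RocksDBStateBackend = new RocksDBStateBackend(fsStateBackend, TernaryBoolean.TRUE)
// Timers can be kept on the heap (better performance) or in RocksDB (scales better);
// which one is the default depends on the Flink version.
// The setter is used because configure(...) returns a new backend instead of modifying this one.
rocksDBBackend.setPriorityQueueStateType(RocksDBStateBackend.PriorityQueueStateType.ROCKSDB)
rocksDBBackend.setDbStoragePath(tmpDir)
env.setStateBackend(rocksDBBackend.asInstanceOf[StateBackend])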

Applicable scenarios for RocksDBStateBackend:

  • Jobs with large state, long windows, and large key/value state
  • All high-availability setups

Because RocksDBStateBackend stores the working state on the TaskManager's local file system, the amount of state is limited only by local disk capacity. FsStateBackend holds working state in memory, so a long-running Flink job whose state grows sharply can run out of memory; RocksDBStateBackend avoids this, which makes it suitable for production environments.
