Stateful computation, which underpins fault tolerance and data consistency, is one of the essential features of real-time computing today. Popular real-time computing engines, including Google Dataflow, Flink, Spark (Structured) Streaming, and Kafka Streams, all provide built-in support for state. State lets real-time applications store metadata and intermediate data without relying on external databases. In some cases, state can even be used directly to store result data, which has led the industry to ask: what is the relationship between state and a database? Can state replace a database?
The Flink community has been exploring this topic for a long time. Broadly, its efforts follow two lines. The first is the ability to access state through a query interface while the job is running, namely queryable state. The second is the ability to query and modify state offline through its dump file (the savepoint), namely the upcoming savepoint processor API.
Flink 1.2, released in 2017, introduced queryable state, which lets users query the contents of a job's state through a dedicated client. This means a Flink application can serve real-time access to its computation results without relying on any external storage beyond the state storage medium itself.
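As a rough illustration of the idea, and not Flink's actual API, the sketch below models a job that accumulates results in per-key state and an external client that reads them by key while the job is running. The class `ToyKeyedState` and its `update`/`query` methods are hypothetical. In Flink, the lookup goes through the dedicated queryable state client instead.

```python
from typing import Dict, Optional


class ToyKeyedState:
    """Hypothetical stand-in for a job's keyed state (not a Flink API)."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}

    def update(self, key: str, delta: int) -> None:
        # The job accumulates results directly in state; no external sink is involved.
        self._values[key] = self._values.get(key, 0) + delta

    def query(self, key: str) -> Optional[int]:
        # An external client can only do a point lookup by key,
        # and it sees the value as of the moment of the query.
        return self._values.get(key)


state = ToyKeyedState()
for key, delta in [("page_a", 1), ("page_b", 1), ("page_a", 1)]:
    state.update(key, delta)

print(state.query("page_a"))  # 2
print(state.query("page_c"))  # None (no state for this key yet)
```

The client has to know which key it wants. This is the property behind the "only the most basic queries" limitation discussed below.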
Providing real-time data access through queryable state alone
However, appealing as queryable state is, it is still in beta and not usable in production. It depends on parts of the underlying architecture that are still changing, and its functionality is limited. To address this, Yang Hua, an engineer at Tencent, proposed a plan to improve queryable state. On the mailing list, the community discussed whether queryable state could replace a database, and opinions differed. The main advantages and disadvantages of using state as a database are summarized below.
- Lower data latency. Normally, a Flink application's results must be synchronized to an external database, for example window results emitted when a timer fires. This synchronization usually introduces some delay, which leads to the awkward situation where the computation is real-time but the query is not. Querying state directly avoids this problem.
- Stronger data consistency guarantees. The consistency guarantee a Flink connector or custom SinkFunction can provide depends on the characteristics of the external store. For example, HBase does not support multi-row transactions, so Flink can deliver exactly-once results to it only if the business logic makes the writes idempotent. State, by contrast, has exactly-once guarantees built in.
- Resource savings. Because data no longer needs to be synchronized to external storage, we save the cost of serialization and network transfer, as well as the cost of the database itself.
- Insufficient SLA support. Database technology is very mature and has accumulated extensive experience in availability, fault tolerance, and operations. By comparison, state is still at a primitive stage. Moreover, given its role, state cannot match a database's availability for data access, because a Flink job goes down for version upgrades and for automatic restarts after errors.
- May destabilize the job. A careless ad hoc query may scan and return an enormous amount of data. This puts heavy load on the system and may disrupt normal job execution. Even reasonable queries can reduce job efficiency when many of them run concurrently.
- Limited data volume. At runtime, state is stored mainly in the TaskManager's local memory and disk. Oversized state can cause TaskManager OOM errors or exhaust disk space. In addition, large state means large checkpoints, which may cause checkpoint timeouts and significantly lengthen job recovery.
- Only the most basic queries are supported. State can be queried only as the simplest data structures. It offers no functions or other computing capabilities like a relational database, and no optimizations such as predicate pushdown.
- Read-only access. At runtime, state can be modified only by the job itself. Modifying it from outside is possible only through the savepoint processor API described below.
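The HBase point above can be made concrete. When a job fails, Flink restores state from the last checkpoint and reprocesses the records that followed it, so a sink without transactions may receive the same writes twice. The sketch below is a simplified model: plain dicts stand in for the external store, and the record values are hypothetical. It shows why idempotent writes (overwriting a row with an absolute value) survive such a replay while increments do not.

```python
from typing import Dict, List, Tuple

# (row key, value) pairs emitted after the last completed checkpoint.
Record = Tuple[str, int]


def apply_puts(store: Dict[str, int], records: List[Record]) -> None:
    # Idempotent write: overwrite the row with an absolute running total.
    for row_key, total in records:
        store[row_key] = total


def apply_increments(store: Dict[str, int], records: List[Record]) -> None:
    # Non-idempotent write: add a delta to whatever the row already holds.
    for row_key, delta in records:
        store[row_key] = store.get(row_key, 0) + delta


# Hypothetical records: the running total goes 10 -> 25, i.e. deltas of 10 and 15.
totals = [("player_1", 10), ("player_1", 25)]
deltas = [("player_1", 10), ("player_1", 15)]

put_store: Dict[str, int] = {}
inc_store: Dict[str, int] = {}
for _attempt in range(2):  # first run, then a replay after restoring from the checkpoint
    apply_puts(put_store, totals)
    apply_increments(inc_store, deltas)

print(put_store)  # {'player_1': 25}  (correct despite the replay)
print(inc_store)  # {'player_1': 50}  (double-counted)
```

State avoids this problem because it is rolled back to the checkpoint together with the input position, so a replay starts from consistent values.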
Overall, the disadvantages of replacing a database with state currently far outweigh the advantages. Still, for jobs that do not require high data availability, using state as a database is entirely reasonable. Because the two are positioned differently, Flink state is unlikely to fully replace databases any time soon. However, there is little doubt that state will move toward databases in its data access capabilities.
Savepoint Processor API
The savepoint processor API is a new feature recently proposed by the community (see FLIP-43). It is used to analyze and modify savepoints offline, or to build an initial savepoint directly from data. It belongs to the state-management side of Flink state evolution. If queryable state is the DQL, then Flink state evolution is the DML, and the savepoint processor API is the most important part of that DML.
The savepoint processor API grew out of the third-party Bravo project. Its main idea is to make savepoints and DataSets convertible into each other. A typical use is to read a savepoint into a DataSet, modify it there, and then write it out as a new savepoint. This suits the following scenarios:
- Analyzing job state to study its patterns
- Troubleshooting or auditing
- Building initial state for a new application
- Modifying savepoints, for example:
  - Changing a job's maximum parallelism
  - Making major schema changes
  - Fixing problematic state
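The read, modify, write cycle described above can be sketched with ordinary data structures. This is a minimal model and not the Bravo or savepoint processor API. The savepoint is a nested dict (operator uid, then state name, then key), and the uid "score-aggregator", the state name, and the negative-score bug are all hypothetical. Writing to a copy mirrors the idea that the result is a new savepoint rather than an in-place edit.

```python
import copy
from typing import Callable, Dict

# Hypothetical in-memory model of a savepoint: operator uid -> state name -> key -> value.
Savepoint = Dict[str, Dict[str, Dict[str, int]]]


def read_state(savepoint: Savepoint, uid: str, state_name: str) -> Dict[str, int]:
    # "Read the savepoint into a dataset": pull one operator's state out as plain data.
    return dict(savepoint[uid][state_name])


def fix_state(data: Dict[str, int], fix: Callable[[int], int]) -> Dict[str, int]:
    # "Modify it on the dataset": apply a repair function to every entry.
    return {key: fix(value) for key, value in data.items()}


def write_new_savepoint(savepoint: Savepoint, uid: str, state_name: str,
                        data: Dict[str, int]) -> Savepoint:
    # "Write it as a new savepoint": the original is left untouched.
    new_savepoint = copy.deepcopy(savepoint)
    new_savepoint[uid][state_name] = dict(data)
    return new_savepoint


original: Savepoint = {"score-aggregator": {"group score": {"A": 120, "B": -5}}}
scores = read_state(original, "score-aggregator", "group score")
repaired = fix_state(scores, lambda v: max(v, 0))  # hypothetical bug produced negative scores
patched = write_new_savepoint(original, "score-aggregator", "group score", repaired)

print(patched["score-aggregator"]["group score"])   # {'A': 120, 'B': 0}
print(original["score-aggregator"]["group score"])  # {'A': 120, 'B': -5}
```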
As the dump file of state, a savepoint can expose query and modification functions through the savepoint processor API, much like an offline database. However, the concepts of state differ in many ways from those of a typical relational database, and FLIP-43 compares and summarizes these differences.
First, a savepoint is the physical storage of the states of multiple operators. The states of different operators are independent, much like tables in different namespaces of a database. A savepoint therefore corresponds to a database, and a single operator corresponds to a namespace.
Tables, however, map to different things in a savepoint depending on the state type. There are three types of state: operator state, keyed state, and broadcast state. Operator state and broadcast state are non-partitioned state, i.e. state not partitioned by key, while keyed state is partitioned state. For non-partitioned state, each state is a table and each element of the state is a row in that table. For partitioned state, all states under the same operator together correspond to one table. That table has a row key, as in HBase, and each individual state corresponds to a column.
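Here is a minimal sketch of this mapping. It uses plain Python lists and dicts as stand-ins for state contents. The function names, the `row_key`/`value` column names, and the sample states `state_x`/`state_y` are hypothetical and not part of any Flink API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def non_partitioned_to_table(elements: List[Any]) -> List[Row]:
    # Operator or broadcast state: the state is a table and each element is a row.
    return [{"value": element} for element in elements]


def keyed_states_to_table(states: Dict[str, Dict[str, Any]]) -> List[Row]:
    # Keyed state: all states of one operator form a single table.
    # The key plays the role of an HBase-style row key; each named state is a column.
    keys = sorted({key for per_key in states.values() for key in per_key})
    return [
        {"row_key": key, **{name: per_key.get(key) for name, per_key in states.items()}}
        for key in keys
    ]


operator_table = non_partitioned_to_table([3, 8])
keyed_table = keyed_states_to_table(
    {"state_x": {"k1": 30, "k2": 20}, "state_y": {"k1": 3}}
)
print(operator_table)  # [{'value': 3}, {'value': 8}]
print(keyed_table)
# [{'row_key': 'k1', 'state_x': 30, 'state_y': 3}, {'row_key': 'k2', 'state_x': 20, 'state_y': None}]
```

A key with no value for some state simply leaves that column empty (`None` here), much like a sparse column in an HBase row.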
For example, given a stream of players' scores and online hours, we use keyed state to record the score and playing time of each player group, and operator state to record the total score and playing time across all players.
The input stream over a period of time is as follows:
With keyed state, we register two MapStates, group score and group time, to represent each group's total score and total time. After keying the stream by user group with keyBy, we accumulate both metrics into state. The resulting table is as follows:
By contrast, if operator state is used to record the total score and total time (with parallelism set to 1), we register two states, total score and total time, and get two tables:
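The following sketch simulates the player example in plain Python. The input records are invented for illustration and are not the article's original data. For simplicity, the sketch keeps one accumulated number per group for each keyed state and does not reproduce the MapState types mentioned in the text. The dicts stand in for keyed state after keyBy on group, and the two integers stand in for operator state with parallelism 1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input records: (player, group, score, hours).
records: List[Tuple[str, str, int, int]] = [
    ("alice", "red", 10, 2),
    ("bob", "blue", 5, 1),
    ("carol", "red", 7, 3),
]

# Keyed state after keyBy(group): one logical table, row key = group,
# one column per registered state.
group_score: Dict[str, int] = defaultdict(int)
group_time: Dict[str, int] = defaultdict(int)

# Operator state with parallelism 1: each state is its own single-row table.
total_score = 0
total_time = 0

for _player, group, score, hours in records:
    group_score[group] += score
    group_time[group] += hours
    total_score += score
    total_time += hours

keyed_table = [
    {"group": g, "group score": group_score[g], "group time": group_time[g]}
    for g in sorted(group_score)
]
print(keyed_table)
# [{'group': 'blue', 'group score': 5, 'group time': 1}, {'group': 'red', 'group score': 17, 'group time': 5}]
print({"total score": total_score}, {"total time": total_time})
# {'total score': 22} {'total time': 6}
```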
At this point the correspondence between savepoints and databases should be clear. For savepoints, the state backend determines how state is persisted, which clearly corresponds to a database's storage engine. In MySQL, we can change the storage engine with a single command, ALTER TABLE XXX ENGINE = InnoDB;, and MySQL automatically handles the tedious format conversion behind the scenes. For savepoints, however, the storage formats of different state backends are incompatible, so switching state backends is currently not easy. To address this, the community recently created FLIP-41, which unifies the binary format of keyed state snapshots in savepoints to make savepoints easier to operate on.
Treating state as a database is a trend in real-time computing. The goal is not to replace databases but to borrow experience from the database field to extend the state interface, so that working with state feels closer to working with a familiar database. For Flink, external use of state falls into two categories: online real-time access, and offline access and modification. These are supported by queryable state and the savepoint processor API respectively.
- Queryable State in Apache Flink® 1.2.0: An Overview & Demo
- Improve Queryable State and Introduce a QueryServerProxy Component
- FLIP-43: Savepoint Processor API
- Bravo: Utilities for processing Flink checkpoints/savepoints
- FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints
About the author: Lin Xiaobo, senior development engineer at NetEase Games, is responsible for developing and operating the real-time platform of the game data center and currently focuses on the development and application of Apache Flink. He enjoys exploring problems.
Author: Lin Xiaobo
This article is original content from the Yunqi Community and may not be reproduced without permission.