Why is streaming the future?


Author: Stephan Ewen
Compiled by: Qin Jiangjie

This article summarizes the talk “Stream Processing Takes on Everything”, given by Stephan Ewen, co-founder and CTO of Ververica, at Flink Forward China 2018.
The theme of the talk may sound radical: stream processing solves every problem. Many people still think of Flink as only a stream processing engine. In fact, Flink can do much more, including batch processing and running applications. In this talk, Stephan first briefly describes his view of what Flink can do, and then goes deep into one specific area: applications and event handling. At first glance this does not look like a stream processing use case, but in Stephan’s view it is a very interesting one.

The figure above explains why stream processing can handle everything: treating data as a stream is a natural and powerful idea. Most data is produced over time; a petabyte of data does not appear out of nowhere. It is usually the accumulation of events such as payments, items added to a shopping cart, page views, and sensor readings.

Once we accept that data is a stream, we can understand data processing accordingly. Historical data can be regarded as a bounded stream that ends at some point in time, and a real-time application can be regarded as processing the data that arrives from some point in time onward. Such an application may stop at some point in the future, in which case it becomes batch processing over the bounded data between its start and its stop. It may also keep running and process new data as it arrives. This way of looking at data is very powerful, and on this basis Flink can support every scenario in the data processing field.
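The idea that batch processing is just stream processing over a bounded stream can be illustrated with a minimal Python sketch. This is not Flink code; `running_total` and the sample data are hypothetical names chosen for illustration. The same function is applied once to a finite list (the "batch" case) and once to an endless generator (the "streaming" case).

```python
from itertools import count, islice
from typing import Iterable, Iterator

def running_total(events: Iterable[int]) -> Iterator[int]:
    """Emit the running sum after each event; works on finite or infinite input."""
    total = 0
    for amount in events:
        total += amount
        yield total

# Bounded stream (historical data up to some point): the job ends, and the
# last emitted value is the "batch" result.
historical = [5, 10, 20]
batch_result = list(running_total(historical))[-1]

# Unbounded stream (events that keep arriving): the same logic never ends,
# so we only look at the first few results here.
live = count(start=1)  # 1, 2, 3, ... without end
first_live_results = list(islice(running_total(live), 4))
```

The processing logic does not change between the two cases; only the input decides whether the job ever finishes.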

The best-known Flink use cases are streaming analytics and continuous (or incremental) processing, where Flink processes data in real time or near real time, or takes the historical data mentioned above and computes over those events continuously. In his earlier talk, Xiao Wei gave a very good example of how Flink can be optimized specifically for bounded data sets, so that Flink supports batch processing very well and is comparable in performance to the most advanced batch engines. At the other end of this spectrum is the scenario I am going to talk about today: event-driven applications. Such applications are found in every service or microservice architecture. They receive events (for example RPC calls or HTTP requests) and respond to them, for example by putting an item in a shopping cart or joining a group in a social network.

Before I move on to today’s topic, I’d like to introduce the community’s recent work in Flink’s traditional areas (real-time analytics and continuous processing). Flink 1.7 was released on November 30, 2018, and added some very interesting features for typical stream processing scenarios. One that I find particularly interesting is the time-versioned join in streaming SQL. The basic idea is that there are two streams: one is defined as a reference table that changes over time, and the other is an event stream that is joined with the reference table. For example, the event stream is a stream of orders and the reference table holds continuously updated exchange rates. Each order has to be converted using the latest exchange rate, and the converted result is written to the result table. This is not easy to express in standard SQL, but with a small extension to streaming SQL the logic becomes very simple to express, and we have found many use cases for it.
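The orders and exchange rates example can be sketched in plain Python. This is only an illustration of the join semantics, not Flink SQL; the names `rate_history` and `rate_as_of` and all sample values are hypothetical. The sketch assumes that "latest exchange rate" means the rate in effect at the order's own timestamp.

```python
from bisect import bisect_right

# Versioned reference table: (valid_from_time, rate), sorted by time.
rate_history = [(0, 6.80), (10, 6.90), (20, 7.00)]

def rate_as_of(history, ts):
    """Return the most recent rate whose valid-from time is <= ts."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts) - 1
    if idx < 0:
        raise ValueError(f"no rate known at time {ts}")
    return history[idx][1]

# Event stream of orders: (order_time, amount in the foreign currency).
orders = [(5, 100.0), (10, 100.0), (25, 50.0)]

# Each order is joined with the version of the table valid at its own time.
converted = [(ts, amount * rate_as_of(rate_history, ts)) for ts, amount in orders]
```

The order at time 5 uses the first rate, the order at time 10 uses the rate that became valid at exactly that time, and the order at time 25 uses the last one.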

Another powerful new feature for stream processing is the combination of complex event processing (CEP) with SQL. CEP applications watch for patterns of events. For example, a CEP application watching the stock market might trade when a price rises twice and then falls. Another example is an application monitoring thermometers: when a thermometer reports two readings above 90 degrees Celsius and then reports nothing for two minutes, the application may take some action. Combining CEP with SQL makes such logic very simple to express.
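To make the stock market pattern concrete, here is a minimal Python sketch of "two rises followed by a fall" over a sequence of prices. It is a hand-written sliding-window check rather than a CEP engine or SQL, and `find_up_up_down` and the sample ticks are hypothetical.

```python
def find_up_up_down(prices):
    """Return each index i where p[i] < p[i+1] < p[i+2] and p[i+3] < p[i+2]."""
    matches = []
    for i in range(len(prices) - 3):
        a, b, c, d = prices[i:i + 4]
        if a < b < c and d < c:
            matches.append(i)
    return matches

ticks = [10, 11, 12, 11, 12, 13, 14, 13]
signals = find_up_up_down(ticks)
```

A pattern language lets the user state the pattern declaratively instead of writing this loop by hand, which is the convenience the combination with SQL is meant to provide.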

The third feature of Flink 1.7 that took a lot of work is schema upgrades. This feature matters a great deal for stream-based applications. Just as you can upgrade the schema of the data in a database, you can change the type of a column in a Flink table or rewrite a column.

I would also like to point out briefly that stream processing is not only about computing on data; it also involves a lot of transactional interaction with external systems. A stream processing engine needs to move data transactionally between systems that speak different protocols, while keeping the computation and the data consistent. This functionality was also enhanced in Flink 1.7.

That was a brief summary of the new features in Flink 1.7. Now let’s turn to the main part of my talk today: building applications and services with Flink. I’ll explain why stream processing is an interesting technology for building applications and services or microservices.

I’ll start with the highly simplified figure on the left and go into some details in a moment. First, let’s take a simple view of an application. As shown on the left, an application can be a container, a Spring application, a Java application, a Ruby application, and so on. It receives requests through channels such as RPC and HTTP and makes changes to a database based on those requests. It may also call another microservice for further processing. It is natural to view the requests entering the application as a sequence of events, that is, as an event stream. These events may be buffered in a message queue, from which the application consumes and processes them. When the application needs to respond to a request, it writes the result to another message queue, from which the requester can consume the response. We can see some interesting differences in this picture.

The first difference is that the application and the database are no longer separate entities in this picture; they are replaced by a single stateful stream processing application. In a stream processing architecture there is no longer a connection between the application and the database: the two are put together. This approach has advantages and disadvantages, but some of the advantages are very important. First, the performance benefit is obvious, because the application no longer has to talk to a database and can work on variables in memory. Second, this approach provides consistency in a simple and robust way.

This picture is heavily simplified. In practice we usually have many applications rather than one isolated application, and your system is more likely to look like this picture. The system has an interface that receives requests; a request is sent to the first application, may be forwarded to another application, and finally a response is produced. Some applications in the picture consume streams of intermediate results. This picture already shows why stream processing is a better fit for more complex microservice scenarios: in many cases no single service receives a user request and answers it directly, and a microservice usually has to communicate with other microservices. This is just like applications in a stream processing architecture, which create output streams and, in turn, derive new streams from those streams.

So far, what we have seen is more or less intuitive. For a microservice architecture based on stream processing, one of the most frequently asked questions is how to guarantee transactionality. A system that uses a database usually has a very mature and sophisticated data validation and transaction model, which is why databases have been so successful for so many years: begin a transaction, do something with the data, then commit or roll back. This mechanism guarantees data integrity (consistency, durability, and so on).

So how do we do the same thing in stream processing? As a capable stream processing engine, Flink supports exactly-once semantics, which guarantees that every event is processed exactly once. However, this still restricts certain operations, and that has been an obstacle to building applications on stream processing. Let’s look at a very simple stream processing application to see what extensions can solve this problem. As we will see, the solution is surprisingly simple.

Let’s take a textbook transaction as an example of a transactional application. The system maintains accounts and their balances, as a bank or an online payment system would. Suppose we want to process a transaction like this: if the balance of account A is greater than 100, transfer 50 yuan from account A to account B. This is a very simple transfer between two accounts.

Databases already have a core paradigm for such transactions: atomicity, consistency, isolation, and durability (ACID). These are the basic guarantees that let users rely on transactions with confidence. With them, users do not have to worry about losing money or running into other problems during a transfer. Let’s use this example to see how a streaming application can provide the same ACID support as a database:

Atomicity requires that a transfer either completes entirely or does not happen at all. Either the amount is deducted from one account and added to the other, or both balances stay unchanged. It must never happen that only one balance changes; otherwise money would disappear or appear out of nowhere.

Consistency and isolation mean that if many users transfer money at the same time, the transfers must not interfere with each other: each transfer completes independently, and every account balance is correct afterwards. In other words, if two users operate on the same account at the same time, the system must not make mistakes.

Durability means that once an operation has completed, its result is stored reliably and will not be lost.

We assume that durability is already satisfied. A stream processor has state, and that state is checkpointed, so it is recoverable. In other words, once a modification has been made and checkpointed, it is durable.
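The relationship between checkpoints and recoverable state can be sketched as follows. This is a toy in-memory model under the assumption that a checkpoint is a full copy of the state; the class `CheckpointedState` is hypothetical and does not reflect how Flink implements checkpointing.

```python
import copy

class CheckpointedState:
    """Toy keyed state whose last checkpoint survives a simulated failure."""

    def __init__(self):
        self.state = {}
        self._snapshot = {}

    def update(self, key, delta):
        self.state[key] = self.state.get(key, 0) + delta

    def checkpoint(self):
        # Assumption: a checkpoint is a complete copy of the current state.
        self._snapshot = copy.deepcopy(self.state)

    def recover(self):
        # After a failure, processing resumes from the last checkpoint.
        self.state = copy.deepcopy(self._snapshot)

s = CheckpointedState()
s.update("A", 100)
s.checkpoint()
s.update("A", -50)  # applied, but not yet covered by a checkpoint
s.recover()         # simulated failure and restart
```

After recovery only the checkpointed modification is still visible. In a real system the input would be replayed from the checkpoint position, so the later update would be reapplied rather than lost.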

Let’s look at the other three properties. Imagine what happens if we implement such a transfer system as a stream processing application. Let’s simplify the problem first: assume the transfer is unconditional, so we simply transfer 50 yuan from account A to account B. That is, the balance of account A decreases by 50 yuan and the balance of account B increases by 50 yuan. Our system is a distributed, parallel system, not a single machine. For simplicity, assume there are only two machines, which may be different physical machines or different containers on YARN or Kubernetes. Either way, they are two stream processor instances across which the data is distributed. Assume that account A’s data is maintained by one machine and account B’s data by the other.

Now we want to transfer 50 yuan from account A to account B. We put the request into the queue. The transfer request is then split into one operation on account A and one on account B, and the two operations are routed by key to the machine that maintains account A and the machine that maintains account B, which change the balances of A and B as requested. This is not a transactional operation; it is just two independent, unconditional changes. As soon as the transfer request becomes slightly more complex, problems appear.
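A minimal Python sketch of this naive, non-transactional version is shown below. Two dictionaries stand in for the two machines, and a simple deterministic function stands in for key-based routing. The names `worker_for`, `split_transfer`, and `apply_op` and the starting balances are hypothetical.

```python
NUM_WORKERS = 2

def worker_for(key: str) -> int:
    """Deterministic key-to-worker routing (a stand-in for hash partitioning)."""
    return sum(key.encode()) % NUM_WORKERS

# Each worker owns the balances of the keys routed to it.
workers = [dict() for _ in range(NUM_WORKERS)]
for account, balance in {"A": 120, "B": 30}.items():
    workers[worker_for(account)][account] = balance

def split_transfer(src, dst, amount):
    """Decompose one transfer into two independent per-key operations."""
    return [(src, -amount), (dst, +amount)]

def apply_op(op):
    """Apply one operation on the worker that owns its key."""
    key, delta = op
    workers[worker_for(key)][key] += delta

for op in split_transfer("A", "B", 50):
    apply_op(op)
```

Each worker applies its operation without knowing anything about the other one, which is exactly why this only works while the transfer is unconditional.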

Now let’s assume the transfer is conditional: we only want to transfer when the balance of account A is sufficient. The previous approach no longer works. If we proceed as before and send the transfer request to the two machines that maintain accounts A and B, then when A’s balance is insufficient, A’s balance stays unchanged while B’s balance may already have been changed. We have violated the consistency requirement.

So we first need to reach a single decision on whether the balances should change, and only if that decision is positive do we modify the balances. We first send a request to the machine that maintains A’s balance and ask it to look at the balance. We could do the same for B, but in this example we do not care about B’s balance. We then gather all such condition checks and evaluate whether the conditions are met. Because a stream processor such as Flink supports iterations, if the transfer condition is satisfied we can put the balance changes into the feedback stream of the iteration, which tells the corresponding nodes to change the balances. If the condition is not satisfied, no balance changes are put into the feedback stream. In this example, this lets us transfer money correctly. In a sense we have achieved atomicity: based on one condition, either all balance changes are made or none are. This part is still fairly intuitive; the harder problem is isolating concurrent requests.
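The check-first, then-apply flow can be sketched in a few lines of Python. For brevity a single dictionary stands in for the state held on both machines, and a plain function call stands in for the feedback stream of the iteration; the transaction layout (`conditions`, `updates`) and the function names are hypothetical.

```python
balances = {"A": 120, "B": 30}

def check_preconditions(txn):
    """Step 1: each key owner reports whether its condition holds."""
    return [balances[key] > minimum for key, minimum in txn["conditions"]]

def process(txn):
    """Step 2: only if every condition holds are the updates fed back and applied."""
    if all(check_preconditions(txn)):
        for key, delta in txn["updates"]:
            balances[key] += delta
        return True
    return False

# "If the balance of A is greater than 100, transfer 50 from A to B."
transfer = {
    "conditions": [("A", 100)],
    "updates": [("A", -50), ("B", +50)],
}
first = process(transfer)   # A holds 120, so the condition is met
second = process(transfer)  # A now holds 70, so nothing is changed
```

Either both balances change or neither does, which is the atomicity property for a single request. The sketch processes one request at a time and therefore says nothing yet about isolation.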

Suppose the system stays the same, but there are now multiple concurrent requests. As earlier talks have shown, such concurrency can reach billions of requests per second. As shown in the figure, our system may receive requests from two streams at the same time. If two requests arrive at the same time, we split each one into several requests as before, first checking the balance condition and then changing the balances. However, this causes a problem. The machine that manages account A first checks whether A’s balance is greater than 50 and then whether it is greater than 100. Both conditions are met, so both transfers are carried out, but A’s balance may not be enough to cover both transfers; it may only cover one of them, either the 50 yuan or the 100 yuan. We therefore need to think further about how to handle concurrent requests. We cannot simply process them concurrently, because that would violate the transactional guarantees. In a sense, this is the core problem of database transactions. Database experts have spent a good deal of time developing solutions, some simple and some complex, but none of them is easy, especially in distributed systems.
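The problem with interleaved checks can be reproduced with simple arithmetic. The sketch below assumes a starting balance of 120 for account A (a hypothetical value) and the two requests from the text: one transfers 50 if the balance exceeds 50, the other transfers 100 if the balance exceeds 100.

```python
balance_a = 120
requests = [
    {"minimum": 50, "amount": 50},
    {"minimum": 100, "amount": 100},
]

# Interleaving that breaks isolation: both checks run before either update.
checks = [balance_a > r["minimum"] for r in requests]
unsafe_balance = balance_a
for passed, r in zip(checks, requests):
    if passed:
        unsafe_balance -= r["amount"]

# Serial execution: each check sees the effect of the previous transfer.
serial_balance = balance_a
accepted = []
for r in requests:
    if serial_balance > r["minimum"]:
        serial_balance -= r["amount"]
        accepted.append(r["amount"])
```

With interleaved checks the account is overdrawn, while serial execution accepts only the first transfer and leaves a valid balance.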

How do we solve this problem in stream processing? Intuitively, if we could make all transactions happen one after another, the problem would be solved; this property is known as serializability. But of course we do not want to process all requests one at a time, which would defeat the purpose of using a distributed system. What we need is to guarantee that the final effects of the requests appear as if they had occurred sequentially, that is, the effect of each request builds on the effect of the previous one. In other words, a transaction’s changes must be applied after all changes of the previous transaction have been completed. This wish for one thing to happen after another sounds familiar; it looks like a problem we have met in stream processing before. Yes, it sounds like event time. Put very simply, if all requests are generated at different event times, the stream processor will still process them according to their event times, even if for various reasons they arrive at the processor out of order. The stream processor makes the effects of all events appear to occur in order. Processing by event time is something Flink already supports.

So in detail, how do we solve this consistency problem? Suppose we have parallel request inputs that feed in concurrent transaction requests, each of which reads records in some tables and then modifies records in some tables. The first thing to do is to put these transaction requests in event-time order. The transaction times of the requests must not be identical, but they also need to be close to each other: processing by event time introduces some delay, and we need to make sure that the event time being processed keeps moving forward. So the first step is to define the order in which transactions are executed, which means we need a smart algorithm that assigns an event time to each transaction.

In the figure, suppose the event times of the three transactions are T+2, T, and T+1, respectively. The effects of the second transaction then have to take place before those of the first and third transactions. Different transactions make different changes, and each transaction generates its own operations to modify the state. We now need to order the operations that access each row and each piece of state, so that they are applied in event-time order. This also means that unrelated transactions naturally do not affect each other. For example, the third transaction here does not access the same state as the first two, so its event-time ordering is independent of theirs. When the operations of two transactions arrive in an order that does not match their event times, Flink sorts them by event time and then processes them.
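Reordering by event time can be sketched with a small buffer that holds operations until a watermark says no earlier timestamp can still arrive. This is a simplified model, not Flink's implementation; the class `EventTimeBuffer`, the value of `T`, and the transaction labels are hypothetical.

```python
import heapq

class EventTimeBuffer:
    """Buffers out-of-order operations and releases them in timestamp order."""

    def __init__(self):
        self._heap = []

    def add(self, timestamp, op):
        heapq.heappush(self._heap, (timestamp, op))

    def release_up_to(self, watermark):
        """Pop every buffered operation with timestamp <= watermark, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ready.append(heapq.heappop(self._heap))
        return ready

T = 100
buf = EventTimeBuffer()
# Arrival order differs from event-time order (T+2, T, T+1).
buf.add(T + 2, "txn1")
buf.add(T, "txn2")
buf.add(T + 1, "txn3")

early = buf.release_up_to(T + 1)  # txn2 then txn3
late = buf.release_up_to(T + 2)   # txn1
```

The short wait before releasing operations is the delay mentioned in the text, and it is why assigned timestamps should stay close together: a timestamp far in the future would sit in the buffer until the watermark caught up with it.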

Admittedly, this description is still simplified, and more work is needed to make the implementation efficient, but in general this is the whole design. Nothing else is needed.

To implement this design, we introduced a smart distributed mechanism for assigning event times. The event time here is logical time; it does not need any real-world meaning and, for example, does not have to come from a real clock. The design uses Flink’s ability to handle out-of-order events, and it uses Flink’s iterations to check preconditions. These are the building blocks of a stream processor that supports transactions.
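The talk does not describe how the timestamps are actually assigned. Purely as an illustration of what a distributed logical clock could look like, the sketch below gives each parallel source its own counter and interleaves the values so that no two sources ever issue the same timestamp. The class `LogicalClock` and this particular scheme are assumptions, not the mechanism used by the library.

```python
class LogicalClock:
    """Issues unique, increasing logical timestamps for one of several parallel sources."""

    def __init__(self, source_id, num_sources):
        self.source_id = source_id
        self.num_sources = num_sources
        self.counter = 0

    def next_timestamp(self):
        # Source i only ever issues values congruent to i modulo num_sources.
        ts = self.counter * self.num_sources + self.source_id
        self.counter += 1
        return ts

clocks = [LogicalClock(i, 2) for i in range(2)]
issued = [
    clocks[0].next_timestamp(),
    clocks[1].next_timestamp(),
    clocks[0].next_timestamp(),
    clocks[1].next_timestamp(),
]
```

A scheme this simple keeps timestamps close together only if all sources advance at a similar rate; a practical mechanism would also have to deal with idle or slow sources.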

We have actually built this. It is called Streaming Ledger, a very small library on top of Apache Flink. It provides multi-key transactional operations with ACID guarantees on top of the stream processor. I believe this is a very interesting evolution. Stream processors started out with essentially no guarantees, and then systems like Storm added at-least-once guarantees. But at-least-once is obviously still not good enough. Then came exactly-once semantics, which was a big step forward, but it is exactly-once for single-row operations only, much like a key-value store. Supporting exactly-once, and even transactional, multi-row operations raises stream processors to a level where they can address the use cases of relational databases in the traditional sense.

Streaming Ledger works by letting users define some tables and the functions that modify them.

Streaming Ledger runs these functions and tables by compiling them all together into an Apache Flink directed acyclic graph (DAG). It injects all the logic for assigning transaction times, which guarantees the consistency of all transactions.

It is not difficult to build such a library, but it is difficult to make it run with high performance. Let’s take a look at its performance. These performance tests were run a few months ago, and we had not done any special optimization; we just wanted to see what the simplest approach could achieve. The actual performance looks quite good. If you look at the steps formed by the bars in the chart, performance grows roughly linearly with the number of stream processors.

In this transaction design, no coordination or locking is involved. It is just stream processing: push the event streams into the system, buffer them briefly to handle out-of-order events, and then apply local state updates. Nothing in this scheme is particularly expensive. In the figure, performance appears to grow faster than linearly, which I think is mainly due to how garbage collection behaves in the JVM. With 32 nodes, we can process about two million transactions per second. For comparison with database benchmarks: those benchmarks usually state a read/write ratio, such as 10% updates. Our test uses 100% updates, and each write updates at least four rows on different partitions. Our table has about 200 million rows. Even without any optimization, the performance of this scheme is very good.

Another interesting question for transactional performance is what happens when the updates target a relatively small set of rows. If transactions do not conflict, concurrent transaction processing is easy: when all transactions are independent and do not interfere with each other, any system should handle them well.

When all transactions start to operate on the same rows, things get more interesting, because the changes have to be isolated from each other to guarantee consistency. So we compared a read-only program, a read-write program without write conflicts, and a read-write program with moderate write conflicts. As you can see, performance is fairly stable across them. This behaves like optimistic concurrency control, which works well here. To really test the Achilles’ heel of such systems, we have to update keys in the same small set over and over again.

In a traditional database, this situation can lead to repeated retries, with transactions failing and retrying again and again, which is a bad situation that we always want to avoid. Yes, we do pay a performance price here, which is natural: if a few rows in your table are the ones everyone wants to update, the system loses concurrency, and that is a problem in itself. But in this case the system does not break down. It keeps processing requests steadily; some concurrency is lost, but requests are still processed. This is because we have no conflict-and-retry mechanism. You can think of it as a mechanism that avoids conflicts naturally, which is a very stable and powerful technique.

We have also tested performance in a geo-distributed setting. For example, we ran a Flink cluster across the United States, Brazil, Europe, Japan, and Australia, that is, a globally distributed system. With a relational database you would pay a very high performance price in this setting, because communication latency becomes quite high. Communication across continents has much higher latency than communication within the same data center or even the same rack.

Interestingly, however, the stream processing approach is not very sensitive to latency. Latency does affect performance, but far less than it affects many other approaches. So running distributed programs in such a globally distributed environment does give worse performance, partly because bandwidth across continents is not as good as within a single data center, but the performance is still good.

In fact, you can think of it as a geo-distributed database and still get hundreds of thousands of transactions per second on a cluster of about 10 nodes. In this test we used only 10 nodes, two per continent. So 10 nodes deliver 200,000 globally distributed transactions per second. I think this is an interesting result, and it comes from the scheme’s insensitivity to latency.
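For a rough sense of scale, the two throughput figures quoted in the talk (about two million transactions per second on 32 nodes, and 200,000 per second on 10 globally distributed nodes) can be turned into per-node numbers. The sketch below is simple arithmetic; it assumes the two tests are comparable, which the talk does not state, and `per_node` is a hypothetical helper.

```python
def per_node(total_tps, nodes):
    """Average transactions per second handled by one node."""
    return total_tps / nodes

single_cluster = per_node(2_000_000, 32)   # 32-node test
geo_distributed = per_node(200_000, 10)    # 10 nodes, two per continent
ratio = geo_distributed / single_cluster
```

Under that assumption, each node in the global setup handles about a third of what a node handles in the 32-node test.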

I’ve talked a lot about using stream processing to build transactional applications. It may sound like a natural idea, and in some ways it is. But it needs some very sophisticated mechanisms underneath. It needs continuous processing rather than micro-batching, the ability to iterate, and sophisticated handling of out-of-order data based on event time. For good performance, it also needs a flexible state abstraction and an asynchronous checkpointing mechanism. These are the really difficult parts. They are not implemented by the Streaming Ledger library but by Apache Flink, so even for such transactional applications, Apache Flink is the real backbone.

To sum up, stream processing supports not only continuous processing, streaming analytics, batch processing, and event-driven processing, but also transactional processing. Of course, this requires a sufficiently powerful stream processing engine. That is all for my talk.

Author: apache_flink

This article is original content of the Yunqi Community and may not be reproduced without permission.