Hello, I’m yes.
Today, let’s talk about rate limiting: common rate-limiting algorithms, single-machine and distributed rate-limiting scenarios, and some common rate-limiting components.
Of course, before introducing the algorithms and concrete scenarios, we should be clear about what rate limiting is and why we need it.
To understand any technology you have to trace its origin. Technology comes from pain points; only by identifying the pain point can you grasp the crux and prescribe the right remedy.
What is rate limiting?
First, let’s explain what rate limiting is.
Rate limiting is common in daily life. For example, some scenic spots cap the number of tickets sold each day, say at 2,000, so only 2,000 visitors can enter per day.
Aside: I once saw a news story about the scenic spot “Rwanda Volcano Park”, which seems reluctant to sell tickets at all: it sells only 32 tickets a day, at 10,000 yuan each!
Back to the topic: what is it that we limit in our projects? The “flow”. What “flow” means differs by scenario: it can be requests per second, transactions per second, network traffic, and so on.
What we usually mean by rate limiting is limiting the number of concurrent requests that reach the system, so that the system can normally process some of the users’ requests and remain stable.
Rate limiting inevitably slows down or rejects some user requests, which hurts the user experience, so it has to strike a balance between user experience and system stability; it is the trade-off we so often talk about.
By the way, rate limiting is also called flow control.
Why rate limit?
As mentioned above, rate limiting exists to keep the system stable.
Similar problems come up in everyday business. In scenarios such as flash sales, Double 11 (Singles’ Day) promotions, or breaking news, user traffic surges while the processing capacity of back-end services stays limited; if the burst traffic isn’t handled well, the back-end services are easily knocked over.
Then there is abnormal traffic such as crawlers. Any service we expose should assume the worst of its callers. We have no idea how a caller will invoke our service: suppose someone opens dozens of threads and calls it 24 hours a day; with no protection in place, our service is finished. And then there are DDoS attacks.
Also, many open platforms limit traffic not only to fend off abnormal traffic but to share resources fairly. Some interfaces are free to you; you can’t hog the resources forever, because others need to call them too.
Of course, if you pay extra, everything is negotiable.
My company once built a system before its SaaS version existed, so the system had to be deployed on the customer’s premises.
At the time, the boss asked us to ship him a rate-limited, degraded build. Not only was the delivered version a downgraded one whose core interface could be called at most 20 times a day; it also restricted the configuration and number of the servers the system ran on, that is, the CPU core count of the deployment machines and the total number of machines deployed, to stop the customer from boosting performance with a cluster deployment.
Of course, all of this had to be dynamically configurable, because, again, everything is negotiable if you pay extra. The customer never knew.
I suspect the boss was just waiting for the customer to say the system felt a bit slow: “How about a version 2.0? I’ll have our R&D team work overtime to get it out for you.”
To sum up, the essence of rate limiting is that back-end processing capacity is limited, so requests beyond that capacity must be cut off; or it balances clients’ fair use of server resources so that no client gets starved.
Common rate-limiting algorithms
For each rate-limiting algorithm I give a diagram and the corresponding pseudocode; some people like diagrams, others prefer to read code.
Counter rate limiting
The simplest rate-limiting algorithm is counting. Suppose the system can process 100 requests at the same time: keep a counter, increment it when a request starts being processed, and decrement it when the request finishes.
Every time a request arrives, look at the counter; if it would exceed the threshold, reject the request.
It is simple and crude. If the counter’s value lives in memory, this is a single-machine rate limiter; if it lives in central storage such as Redis, accessed by all machines in the cluster, it becomes a distributed rate limiter.
Advantage: simple and crude. On a single machine you can use Java’s atomic classes such as AtomicInteger; in the distributed case, Redis INCR.
Disadvantage: suppose the allowed threshold is 10,000 and the counter is at 0. If 10,000 requests pour in within the first second, the burst may be more than the system can take. For a program, traffic that ramps up gradually and traffic that floods in all at once are very different things.
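The counting idea can be sketched in a few lines of Java using the AtomicInteger mentioned above (the `CounterLimiter` class and its method names are mine, for illustration only):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counter rate limiting: at most `limit` requests in flight at once.
class CounterLimiter {
    private final int limit;
    private final AtomicInteger count = new AtomicInteger(0);

    CounterLimiter(int limit) { this.limit = limit; }

    // Called when a request arrives; returns false if over the threshold.
    boolean tryEnter() {
        if (count.incrementAndGet() > limit) {
            count.decrementAndGet(); // roll back and reject
            return false;
        }
        return true;
    }

    // Called when a request finishes processing.
    void exit() { count.decrementAndGet(); }
}
```

Swapping the AtomicInteger for a counter in Redis (INCR/DECR) turns the same logic into a distributed limiter.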
Moreover, rate limiting usually restricts access within a specified time interval, which brings us to another algorithm: the fixed window.
Fixed window rate limiting
Compared with plain counting, it mainly introduces the concept of time windows: every time a window elapses, the counter is reset.
The rules are as follows:
- If the number of requests is below the threshold, allow access and increment the counter;
- If the number of requests exceeds the threshold, deny access;
- When the time window ends, reset the counter.
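A minimal single-machine sketch of these rules (the class and method names are my own; a production version would also have to think about clock granularity and concurrency across many threads):

```java
// Fixed window: the counter is cleared whenever a new window starts.
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // a new window begins: clear the counter
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false; // over the threshold inside this window
    }
}
```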
It looks perfect, but it’s actually flawed.
The fixed-window boundary problem
Suppose the system allows 100 requests per second and the first window is 0–1s. 100 requests flood in at 0.55s; after 1 second the count is cleared; then another 100 requests flood in at 1.05s.
Neither window’s count exceeds the threshold, but globally, 200 requests poured in within the 0.1 second from 0.55s to 1.05s, which is actually unacceptable for a system sized for 100 requests/s.
Sliding window rate limiting was introduced to solve this problem.
Sliding window rate limiting
The sliding window fixes the fixed window’s boundary problem: it guarantees that the threshold is not exceeded within any window of the given length.
Compared with the fixed window, the sliding window not only needs a counter but also has to record the arrival time of every request inside the window, so it uses more memory.
The rules are as follows, assuming a 1-second window:
- Record the time of each request.
- For each arriving request, count the requests in the window stretching from that moment back 1 second; records older than 1 second can be deleted.
- If the counted number is below the threshold, record this request’s time and let it pass; otherwise, reject it.
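A minimal sketch of these rules, keeping the timestamps in a deque (the class name and the explicit `nowMillis` parameter are my own choices; passing the clock in makes the logic easy to test):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding window: remember each request's timestamp and count only
// those that fall within the last `windowMillis`.
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop records that have slid out of the window; they no longer count.
        while (!timestamps.isEmpty()
                && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(nowMillis); // record this request's time
            return true;
        }
        return false;
    }
}
```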
But neither the sliding window nor the fixed window can cope with a concentrated burst of traffic within a short moment.
The scenario we want from a limit of, say, 100 requests per second is one request every 10ms, so that processing stays perfectly smooth; but in real scenarios the request frequency is hard to control, and the threshold may be used up within 5ms.
Of course there are variants for this case, such as stacking multiple rules: not only at most 100 requests per second, but also at most 2 requests per 10ms.
One more thing: this sliding window is not the same as TCP’s sliding window. In TCP, the sliding window is the receiver telling the sender how much “cargo” it can accept, so the sender can control its sending rate.
Next, let’s talk about the leaky bucket, which solves the time window’s pain point and makes traffic smoother.
Leaky bucket algorithm
As shown in the figure below, water keeps dripping into the leaky bucket and flows out of the bottom at a constant rate. If water drips in faster than it flows out, the stored water eventually exceeds the bucket’s size and overflows.
The rules are as follows:
- A request arrives: put it into the bucket.
- If the bucket is full, reject the request.
- The service takes requests out of the bucket at a constant rate and processes them.
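A minimal sketch of these rules, using a bounded queue as the bucket (the names are mine; the constant-rate draining itself would be driven by a scheduled worker, which is left out here):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Leaky bucket: requests queue up in the bucket; a worker drains them
// at a fixed rate. A full bucket overflows, i.e. rejects new requests.
class LeakyBucket<T> {
    private final int capacity;
    private final Queue<T> bucket = new ArrayDeque<>();

    LeakyBucket(int capacity) { this.capacity = capacity; }

    // A request arrives: drop it into the bucket, or overflow.
    synchronized boolean tryOffer(T request) {
        if (bucket.size() >= capacity) return false; // bucket full, reject
        bucket.add(request);
        return true;
    }

    // Called by the consumer at a constant rate, e.g. from a scheduled task.
    synchronized T poll() { return bucket.poll(); }
}
```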
As you can see, a water drop corresponds to a request. The leaky bucket’s character is lenient in, strict out: no matter how many requests arrive or how fast, they flow out at a fixed rate, which corresponds to the service processing requests at a fixed rate. “Let the traffic be as fierce as it likes; I keep my own pace.”
Does this remind you of message queues and their peak-shaving, valley-filling role? Indeed, leaky buckets are generally implemented with a queue: requests that cannot be processed yet wait in the queue, and when the queue is full, new requests are rejected. And what else does that bring to mind? Isn’t that exactly how a thread pool works?
After the leaky bucket’s filtering, requests flow out smoothly. Looks perfect? Actually, its advantage is also its disadvantage.
Facing a burst of requests, the service processes them at the same pace as usual, which is not always what we want. Under burst traffic, provided the system stays stable, we’d like to improve the user experience by handling requests faster, not plod along as if it were normal traffic. (Look: the sliding window was criticized for not being smooth enough, and now the leaky bucket is too smooth. Hard to please.)
The token bucket can be more “aggressive” in handling burst traffic.
Token bucket algorithm
The principle of the token bucket is similar to the leaky bucket’s, except that the leaky bucket lets requests flow out at a constant rate, while the token bucket puts tokens into the bucket at a constant rate; a request can pass only after it grabs a token, and only then is it handled by the server.
Of course, the token bucket’s size is limited: once the bucket is full, tokens still being generated at the fixed rate are discarded.
- Put tokens into the bucket at a constant rate.
- Tokens beyond the bucket’s capacity are discarded.
- When a request arrives, it first tries to take a token from the bucket; if it succeeds, it gets processed; otherwise, it is rejected.
What does this remind you of? Semaphore: a semaphore controls how many threads can access a resource at the same time, which is the same idea as taking tokens; one takes permits, the other takes tokens. The difference is that a semaphore permit is returned after use, whereas a token is not returned: tokens are refilled at a fixed rate instead.
Looking at the pseudocode of the token bucket, you can see that the difference between the token bucket and the leaky bucket comes down to where the adding and subtracting happen.
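As a stand-in for the pseudocode, here is a minimal Java sketch of the token bucket (the names are mine; refill is computed lazily from elapsed time rather than by a background thread, and the explicit `nowMillis` parameter is a testing convenience):

```java
// Token bucket: tokens accumulate at a fixed rate up to `capacity`;
// a request passes only if it can grab a token.
class TokenBucket {
    private final long capacity;
    private final double tokensPerMilli;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.tokensPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full (real systems may need warm-up)
        this.lastRefill = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Top up tokens for the elapsed time, capped at bucket capacity.
        tokens = Math.min(capacity,
                tokens + (nowMillis - lastRefill) * tokensPerMilli);
        lastRefill = nowMillis;
        if (tokens >= 1) {
            tokens -= 1; // unlike a semaphore permit, the token is not returned
            return true;
        }
        return false;
    }
}
```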
As you can see, when facing burst traffic, if there are 100 tokens in the bucket, all 100 can be taken at once rather than consumed at a uniform pace as with the leaky bucket. So the token bucket performs better under burst traffic.
Summary of rate-limiting algorithms
What’s described above is only the crudest implementation and the essential idea of each algorithm; in engineering practice there are many variants.
From the above, the leaky bucket and token bucket seem far better than the time-window algorithms. So are the time-window algorithms useless? Should we throw them away?
No. Although the leaky bucket and token bucket shape traffic better than time windows do, producing a smoother flow, they have their own shortcomings (some already mentioned above).
Take the token bucket: if you go live without warming it up, the bucket starts empty, and requests are rejected for lack of tokens even though the system is under no load at all. Isn’t that killing requests by mistake?
Another example: request arrivals are actually random. Suppose a token is put into the bucket every 20ms and the bucket starts empty. Two requests happen to arrive in the first 20ms and none in the next 20ms. Viewed over the whole 40ms there were only two requests, and both should have been let through, yet one was rejected outright. This can mistakenly kill many requests, even though on the monitoring curve the traffic looks smooth and the peak well controlled.
Take the leaky bucket: requests sit and wait inside the bucket, which doesn’t really meet the low-latency demands of Internet services.
So the leaky bucket and token bucket actually suit blocking rate-limiting scenarios better: with no token available you simply wait, so nothing is killed by mistake, and in the leaky bucket requests wait too; this fits background tasks better. Time-window rate limiting suits latency-sensitive scenarios better: if my request can’t get through, tell me right away instead of keeping me waiting until the flowers wilt. (Pour auntie a cup of cappuccino. Why did that suddenly come to mind?)
Single-machine vs. distributed rate limiting
Essentially, single-machine and distributed rate limiting differ in where the “threshold” is stored.
The algorithms above can be implemented directly on a single server, but our services are usually deployed in clusters, so multiple machines have to cooperate to provide the rate-limiting function.
For the counter or time-window algorithms, the counter can be kept in distributed K-V storage such as Tair or Redis.
For example, the per-request timestamps of the sliding window can be kept in a Redis sorted set: use ZREMRANGEBYSCORE to delete the records that fall outside the time window, then count what remains to decide whether to let the request through.
Likewise, for the token bucket, the number of tokens can be kept in Redis.
But this way every single request has to go to Redis to decide whether it may pass, which costs some performance. Hence an optimization point called “batching”: instead of taking one token at a time, take a whole batch, and fetch another batch when it runs out. This cuts down the number of requests to Redis.
But note that batched fetching introduces some rate-limiting error. For example, if this machine takes 10 tokens and sits on them into the next second, the cluster’s total throughput at a given moment may exceed the threshold.
Actually, the “batching” optimization is everywhere: MySQL’s batched flushing to disk, Kafka’s batched message sending, and high-performance distributed ID generation all embody the idea of “batching”.
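A toy sketch of the batching idea (everything here is illustrative: the `AtomicLong` merely stands in for a central store such as Redis, where the batch fetch would be a single DECRBY round trip):

```java
import java.util.concurrent.atomic.AtomicLong;

// Batched token fetch: instead of hitting the central store once per
// request, each node grabs `batchSize` tokens at a time and serves
// requests locally until the batch runs out.
class BatchedLimiter {
    private final AtomicLong centralTokens; // stand-in for the shared store
    private final int batchSize;
    private long localTokens = 0;

    BatchedLimiter(AtomicLong centralTokens, int batchSize) {
        this.centralTokens = centralTokens;
        this.batchSize = batchSize;
    }

    synchronized boolean tryAcquire() {
        if (localTokens == 0) {
            // One round trip fetches a whole batch (like DECRBY in Redis).
            long remaining = centralTokens.addAndGet(-batchSize);
            if (remaining < 0) {
                centralTokens.addAndGet(batchSize); // roll back: quota exhausted
                return false;
            }
            localTokens = batchSize;
        }
        localTokens--;
        return true;
    }
}
```

Note how this trades accuracy for fewer round trips: tokens sitting in a node’s local batch are already deducted centrally, which is exactly the rate-limiting error described above.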
Of course, another idea for distributed rate limiting is to split the quota evenly: suppose a single machine used to be limited to 500 and the service is now deployed on 5 machines; then let each machine keep limiting itself to 500, which amounts to an overall limit at the total entrance while each machine enforces its own share.
The hard part of rate limiting
As you can see, every limiter has a threshold, and deciding on that threshold is the hard part.
Set it too high and the servers may not cope; set it too low and requests get “killed by mistake”, resources are not fully utilized, and either way the user experience suffers.
What I can think of is: estimate a rough threshold before going live, but don’t actually enforce the limit at first; just log what would have been limited, analyze the logs to see how the limiter would have behaved, then adjust the threshold, working out the cluster’s total processing capacity and each node’s share.
Then replay production traffic to test the real limiting effect, settle on the final threshold, and roll it out.
I’ve also read an article by Uncle Mouse saying that with auto-scaling it’s hard to adjust rate-limiting thresholds dynamically, so, borrowing ideas from TCP congestion control, one can judge the server’s current health from the P90 or P99 response time over a time window and apply rate limiting dynamically. He implemented this algorithm in his Ease Gateway product; interested readers can search for it.
In reality, business scenarios are very complex: there are many dimensions and resources to limit, and each resource has different limiting requirements. So everything above is just me talking a big game.
Rate-limiting components
Generally speaking, we don’t need to implement rate-limiting algorithms ourselves. Whether for access-layer limiting or fine-grained interface limiting, there are ready-made wheels to use, and their implementations rely on the algorithms above.
For example, Google Guava provides the rate-limiting tool RateLimiter, which is based on the token bucket and extends the algorithm with a warm-up feature.
Alibaba’s open source rate-limiting framework Sentinel uses the leaky bucket algorithm in its “uniform queueing” flow-control strategy.
Nginx’s rate-limiting module limit_req_zone also adopts the leaky bucket algorithm, and OpenResty offers similar limiting libraries.
They are quite simple to use; interested readers can look them up, and those curious about the internals can read the source code to learn how production-grade rate limiting is implemented.
Today I’ve only briefly covered rate limiting. Applying it to a real project involves many more considerations, and rate limiting is just one link in keeping a system stable; it has to work together with degradation, circuit breaking, and other mechanisms, which I’ll cover later.
I’m yes, from a little to a billion. See you next time.