What are the flow control schemes in stand-alone and distributed scenarios?

Time: 2021-11-25

Introduction: Different scenarios call for different flow control algorithms, so how do you choose a suitable flow control scheme? This article shares the ideas and code implementations of several flow control algorithms, including simple window, sliding window, leaky bucket, token bucket and sliding log, in both single-machine and distributed flow control scenarios, and summarizes their complexity and applicable scenarios. It is a long read, so feel free to bookmark it and come back later.

I. Flow control scenarios

The significance of flow control needs little elaboration. In the most common scenario, flow control protects limited downstream resources from being overwhelmed by traffic and keeps the service available. The flow control threshold is usually allowed some flexibility, and occasional excess traffic is acceptable.

Sometimes flow control is part of a charging model: some cloud vendors, for example, charge by API call frequency. Since money is involved, calls beyond the threshold are generally not allowed at all.

The applicable flow control algorithms differ across these scenarios. In most cases the Sentinel middleware copes well, but Sentinel is not omnipotent, and other flow control schemes are worth thinking about.

II. Interface definition

For convenience, all of the sample code below is implemented against the Throttler interface.

The Throttler interface defines a common method for requesting a single quota.

Of course, you could also define a tryAcquire(String key, int permits) method to request multiple quotas at once; the implementation idea is the same.

Some flow control algorithms need to maintain a Throttler instance per key.

public interface Throttler {
    /**
     * Try to acquire a single quota.
     *
     * @param key the key to acquire a quota for
     * @return true if the acquisition succeeds, false otherwise
     */
    boolean tryAcquire(String key);
}

III. Single-machine flow control

1 Simple window

"Simple window" is my own name for it; it is also called a fixed window in some places. The name mainly distinguishes it from the sliding window discussed below.

Flow control limits the number of accesses allowed within a specified time interval, so the most intuitive idea is to maintain a counter over a given time window and apply the following rules:

  • If the number of accesses is below the threshold, access is allowed and the counter is incremented.
  • If the number of accesses exceeds the threshold, access is rejected and the counter stays unchanged.
  • Once the time window has passed, the counter is reset, and the first successfully admitted request after the reset sets the new window's start time. This way the counter always counts accesses in the most recent window.

Code implementation: SimpleWindowThrottler

/**
 * Time window size in milliseconds
 */
private final long windowInMs;
/**
 * Maximum number of requests allowed within the time window
 */
private final int threshold;
/**
 * Time of the last successful request
 */
private long lastReqTime = System.currentTimeMillis();
/**
 * Counter
 */
private long counter;

public boolean tryAcquire(String key) {
    long now = System.currentTimeMillis();
    // If more than one window has passed since the last access,
    // reset the counter and start a new window at the current time
    if (now - lastReqTime > windowInMs) {       #1
        counter = 0;
        lastReqTime = now;                      #2
    }
    if (counter < threshold) {                  #3
        counter++;                              #4
        return true;
    } else {
        return false;
    }
}

Another common scenario is flow control per key, where each key has its own time window and threshold configuration, so a separate throttler instance must be maintained for each key.

Switching to a multithreaded environment
In real applications, multiple threads usually apply for quotas concurrently. To keep the algorithm ideas concise, the sample code contains no concurrency control.

Taking the simple window implementation as an example, the most direct way to make the flow control algorithm thread-safe is to declare the tryAcquire method synchronized.

Of course, a seemingly more efficient way is to change the types of the shared variables:

private volatile long lastReqTime = System.currentTimeMillis();
private LongAdder counter = new LongAdder();

However, this is not truly "safe". Imagine the following scenario: threads A and B both pass the check at position #1, then both reach position #2 and modify lastReqTime; thread B's assignment overwrites thread A's, shifting the starting point of the time window backwards. Likewise, positions #3 and #4 form a race condition. Of course, if high flow control accuracy is not required, this race is acceptable.
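If strict accuracy is required, the most direct fix, as mentioned above, is to serialize the whole check-and-update. A minimal sketch (coarse-grained but correct):

public synchronized boolean tryAcquire(String key) {
    long now = System.currentTimeMillis();
    // The window reset and the counter check-and-increment now execute atomically
    if (now - lastReqTime > windowInMs) {
        counter = 0;
        lastReqTime = now;
    }
    if (counter < threshold) {
        counter++;
        return true;
    }
    return false;
}

A finer-grained alternative is a compare-and-swap loop over atomic variables, trading simplicity for lower contention.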

The critical burst problem

The simple window implementation is very simple. Take allowing 100 accesses per minute as an example: if traffic arrives at an even rate of 200 requests/minute, the system's traffic curve looks roughly like this (resetting every minute):

[Figure: admitted traffic under a simple window with a uniform 200 requests/minute input]

But if the traffic is not uniform, suppose there are only a few sporadic accesses after the window starts at 0:00, and from 0:50 requests arrive at 10 requests/second; the traffic graph then looks like this:

[Figure: traffic bursting across the window boundary]

In the 20 seconds around the boundary (0:50 ~ 1:10), the system actually takes 200 requests. In other words, in the worst case the system bears twice the threshold of traffic around the window boundary. This is the critical burst problem that the simple window cannot solve.

2 Sliding window

How can the simple window's critical burst problem be solved? Since a single coarse window has low accuracy, the large time window can be divided into finer-grained sub-windows, each counted independently, and every time one sub-window's worth of time passes, the window slides one sub-window to the right. This is the idea of the sliding window algorithm.

[Figure: a one-minute window divided into six 10-second sub-windows]

As shown in the figure above, the one-minute time window is divided into six sub-windows, each maintaining an independent counter of the accesses within its 10 seconds. Every 10 seconds, the time window slides one grid to the right.

Back to the simple window's critical burst example: combined with the figure above, we can see how the sliding window eliminates it. If 100 requests come in between 0:50 and 1:00 (the grey grid), the next 100 requests between 1:00 and 1:10 fall into the yellow grid. Since the algorithm counts the total across all six sub-windows, the total now exceeds the threshold of 100, so the following 100 requests are rejected.
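Before looking at Sentinel's implementation, here is a minimal single-threaded sketch of the idea against the Throttler interface above (a circular array of sub-window counters; the class and field names are my own):

public class SlidingWindowThrottler implements Throttler {
    private final long subWindowInMs;      // size of one sub-window
    private final long[] windowStarts;     // start time stored in each slot
    private final int[] counters;          // per-sub-window counters (circular)
    private final int threshold;

    public SlidingWindowThrottler(long windowInMs, int sampleCount, int threshold) {
        this.subWindowInMs = windowInMs / sampleCount;
        this.windowStarts = new long[sampleCount];
        this.counters = new int[sampleCount];
        this.threshold = threshold;
    }

    public boolean tryAcquire(String key) {
        long now = System.currentTimeMillis();
        long curStart = now - now % subWindowInMs;
        int idx = (int) ((now / subWindowInMs) % counters.length);
        // Lazily reset a slot that still holds a sub-window from a previous cycle
        if (windowStarts[idx] != curStart) {
            windowStarts[idx] = curStart;
            counters[idx] = 0;
        }
        // Sum all sub-windows that are still inside the big window
        long windowInMs = subWindowInMs * counters.length;
        int total = 0;
        for (int i = 0; i < counters.length; i++) {
            if (now - windowStarts[i] < windowInMs) {
                total += counters[i];
            }
        }
        if (total < threshold) {
            counters[idx]++;
            return true;
        }
        return false;
    }
}

Each slot is reset lazily when its stored start time no longer matches the sub-window the current time falls into; Sentinel's LeapArray uses the same trick, as described next.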

Code implementation (refer to Sentinel)

Sentinel provides a lightweight, high-performance implementation of the sliding window flow control algorithm. When reading the code, focus on these classes:

1) The functional slot StatisticSlot is responsible for recording and aggregating runtime monitoring metrics in different dimensions, such as RT and QPS.

Internally, Sentinel uses a slot chain built on the chain-of-responsibility pattern. Each functional slot has a different duty (rate limiting, degradation, system protection), and the slots are chained together through a ProcessorSlotChain.

Refer to the official Wiki:
https://github.com/alibaba/Sentinel/wiki/SentinelWork flow

2) StatisticSlot uses StatisticNode#addPassRequest to record the number of admitted requests, in both the per-second and per-minute dimensions.

3) The actual recording goes through the Metric interface, whose implementation class is ArrayMetric; the real sliding window data structure behind it is LeapArray.

4) LeapArray internally maintains the key attributes and structures of the sliding window, including:

a) The total window size intervalInMs, the sub-window size windowLengthInMs, and the sample count sampleCount, related by:

sampleCount = intervalInMs / windowLengthInMs

The current implementation defaults sampleCount to 2 and the total window size to 1s, which means the default sub-window size is 500ms. The statistical accuracy can be tuned by adjusting the sample count.

b) The array of sliding windows, in which each element is represented by a WindowWrap containing:

  • windowStart: the start time of the sliding window.
  • windowLength: the length of the sliding window.
  • value: the content recorded in the sliding window; its type is MetricBucket, which holds a group of LongAdders recording different kinds of data, such as the numbers of admitted requests, blocked requests, exceptions, and so on.

The recording logic is: locate the sliding window bucket for the current time, then add 1 to that window's statistics. But the step of locating the current window actually hides many details; see LeapArray#currentWindow for the full implementation. The source comments there are very detailed, so I won't repeat them here.
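Paraphrasing the core of that lookup as a simplified sketch (the real code in LeapArray additionally handles concurrent creation and reset of buckets; windowLengthInMs and sampleCount are the fields described above):

// Simplified from LeapArray#currentWindow; CAS and locking details omitted
private int calculateTimeIdx(long timeMillis) {
    long timeId = timeMillis / windowLengthInMs;
    return (int) (timeId % sampleCount);          // slot in the circular array
}

private long calculateWindowStart(long timeMillis) {
    return timeMillis - timeMillis % windowLengthInMs;
}

// Three cases when reading the slot: it is empty (create a new bucket via CAS),
// its windowStart equals the computed start (reuse it), or it still holds a
// stale window from a previous cycle (reset it under a lock).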

Here is a diagram drawn by another colleague describing the above process:

[Figure: Sentinel sliding window recording process]

The process above is based on the source code of version 3.9.21. Earlier internal versions of Sentinel were implemented differently, using a data structure called SentinelRollingNumber, but the principle is similar.

Accuracy problem

Now consider this question: can the sliding window algorithm accurately guarantee that the number of accesses in any given time window T is no greater than N?

The answer is no. Take again the example of one minute divided into six 10-second sub-windows. Suppose requests arrive at 20 requests/second starting at 0:05: 100 requests are admitted between 0:05 and 0:10, after which subsequent requests are throttled until the window slides at 1:00, and another 100 requests are admitted between 1:00 and 1:05. Treating 0:05 ~ 1:05 as a one-minute window, the actual number of requests in that window is 200, exceeding the given threshold of 100.

If you want higher accuracy, in theory you only need to slice the sliding window more finely. In Sentinel, for example, the accuracy can be adjusted by modifying the per-unit-time sample count sampleCount; its value is generally decided by the business need, striking a balance between accuracy and memory consumption.

Smoothness problem

When the sliding window algorithm is used for rate limiting, traffic curves like the following are common:

[Figure: bursty traffic exhausting the threshold at the start of each window]

A sudden burst fills the limit right after the window starts, so all remaining requests in the window are rejected. When the time window unit is large (for example, flow control by the minute), the impact of this problem is significant. In practice the limiting effect we want is often not to cut traffic off at a stroke, but to let it flow into the system smoothly.

3 Leaky bucket

The sliding window cannot solve the smoothness problem. Looking back at our smoothness requirement: when traffic exceeds a certain level, the effect we want is not to cut it off at once but to keep it within a rate the system can bear. If the average access rate is v, the flow control we need is really flow rate control, that is, keeping the average rate v ≤ N / T.

The leaky bucket algorithm is often used for traffic shaping in network communication, and its idea is exactly rate-based control. Recall the school math problem of filling a pool while draining it, and replace the pool with a bucket that leaks through a hole in the bottom. Treat each request as pouring one unit of water into the bucket: the water leaking from the bottom represents requests leaving the buffer to be handled by the server, and the water spilling over the rim represents discarded requests. The conceptual analogy:

  • Maximum allowed requests N: the size of the bucket
  • Time window size T: the time for a full bucket to leak out
  • Maximum access rate v: the speed at which a full bucket leaks, i.e. N / T
  • Throttled request: water pours in faster than it leaks out, so the bucket overflows

Assume the bucket starts empty and each access pours one unit of water into it. As long as we pour at a rate no greater than N / T, the bucket never overflows; conversely, once the inflow rate exceeds the leak rate, the bucket accumulates more and more water until it overflows. Meanwhile, the leak rate is always capped at N / T, which achieves the goal of smoothing traffic.

The access rate curve of the leaky bucket algorithm looks like this:
[Figure: access rate smoothed to N / T by the leaky bucket]

Attached is a classic illustration of the leaky bucket algorithm commonly found online:

[Figure: classic leaky bucket diagram]

Code implementation: LeakyBucketThrottler

/**
 * Amount of water currently left in the bucket
 */
private long left;
/**
 * Timestamp of the last water-level update
 */
private long lastInjectTime = System.currentTimeMillis();
/**
 * Bucket capacity
 */
private long capacity;
/**
 * Time for a full bucket to leak out
 */
private long duration;
/**
 * Leak rate of the bucket, i.e. capacity / duration
 */
private double velocity;

public boolean tryAcquire(String key) {
    long now = System.currentTimeMillis();
    // Water leaked since the last update = elapsed time * leak rate;
    // if there has been no update for long enough, the bucket drains to empty
    long leaked = (long) ((now - lastInjectTime) * velocity);
    if (leaked > 0) {
        left = Math.max(0, left - leaked);
        // Advance the timestamp whenever leakage is applied; otherwise rejected
        // requests would cause the same period's leakage to be counted twice
        lastInjectTime = now;
    }
    // Pour in one unit of water; if the bucket does not overflow, admit the request
    if (left + 1 <= capacity) {
        left++;
        return true;
    } else {
        return false;
    }
}

Problems with the leaky bucket

The advantage of the leaky bucket is that it smooths traffic. But if traffic is not uniform, the leaky bucket cannot achieve truly accurate control the way the sliding window can: in the extreme case, the leaky bucket also admits traffic equal to twice the threshold N within a time window T.

Imagine traffic far greater than N arriving right at time 0, the start of a window (0 ~ T): the bucket fills instantly, so N requests are admitted at once. During the rest of the window the bucket keeps leaking at rate N / T, and under sustained pressure every leaked unit frees room for another request, admitting roughly another N. So close to 2N requests can pass within the window T.

Although the admitted volume can be kept within N by shrinking the bucket size, the side effect is that traffic gets rejected before the threshold is actually reached.

Another implicit constraint is that the leak rate should preferably be an integer (that is, the capacity N should be evenly divisible by the time window size T); otherwise the remaining-water calculation accumulates some error.

4 Token bucket

In the leaky bucket model, an arriving request pours water into the bucket. Reverse the direction: let admitting a request draw water out of the bucket instead, and treat the steady inflow as replenishing the capacity the system can bear. The leaky bucket model then becomes the token bucket model.

Once the leaky bucket is understood, the token bucket is easy. Quoting the token bucket principle:

The token bucket algorithm works as follows: the system generates tokens at a constant rate and puts them into a token bucket of fixed capacity; when the bucket is full, extra tokens are discarded. To process a request, a token must be taken from the bucket; if no token is available, the request is rejected.

[Figure: token bucket diagram]

Code implementation: TokenBucketThrottler

The token bucket is essentially the leaky bucket reversed, so a slight change to the leaky bucket code yields a token bucket.

long now = System.currentTimeMillis();
// Tokens refilled since the last update = elapsed time * refill rate
long refill = (long) ((now - lastInjectTime) * velocity);
if (refill > 0) {
    left = Math.min(capacity, left + refill);
    // Advance the timestamp only when whole tokens were actually added,
    // so that fractional refills keep accumulating between calls
    lastInjectTime = now;
}
if (left >= 1) {
    left--;
    return true;
} else {
    return false;
}

If you use a token bucket in production, consider the RateLimiter provided by Guava. Its implementation is thread-safe. When RateLimiter#acquire is called and the remaining tokens are insufficient, the thread blocks for a while until enough tokens become available (rather than rejecting outright, which is very useful in some scenarios). Besides the default SmoothBursty strategy, RateLimiter also provides a SmoothWarmingUp strategy that supports setting a warm-up period, during which the token release rate is raised smoothly up to the maximum. This design serves resource providers that need warm-up time and cannot serve at a stable rate from the very first access (for example, a service backed by a cache that must be refreshed periodically). One drawback of RateLimiter is that it only supports rates at QPS granularity.
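For reference, a brief usage sketch of Guava's RateLimiter (assuming the Guava dependency is available):

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class RateLimiterDemo {
    public static void main(String[] args) {
        // Default SmoothBursty strategy: at most 10 permits per second
        RateLimiter limiter = RateLimiter.create(10.0);
        limiter.acquire();                   // blocks until a permit is available
        boolean ok = limiter.tryAcquire();   // non-blocking variant

        // SmoothWarmingUp: ramps smoothly up to 10 permits/s over a 3s warm-up
        RateLimiter warm = RateLimiter.create(10.0, 3, TimeUnit.SECONDS);
        warm.acquire();
        System.out.println("tryAcquire returned " + ok);
    }
}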

Difference between leaky bucket and token bucket

Although the two are essentially reversed, the applicable scenarios are slightly different in practical use:

1) Leaky bucket: used to control the rate in the network. In this algorithm, the input rate can vary, but the output rate remains constant. It is often used with a FIFO queue.

Imagine that the hole in the leaky bucket is of fixed size, so the rate of leakage can be kept constant.

2) Token bucket: add tokens to the bucket at a fixed rate, allowing the output rate to vary according to the burst size.

For example, suppose a system limits accesses to at most 60 per 60 seconds, i.e. a rate of 1 request/second, and there has been no access for a while, so the leaky bucket is currently empty (and the token bucket full). Now 60 requests arrive in an instant. After traffic shaping, the leaky bucket leaks the 60 requests downstream over a full minute at 1 request per second, whereas the token bucket hands out 60 tokens at once and pushes all the requests downstream immediately.

5 Sliding log

In general, the algorithms above serve most practical application scenarios well; few scenarios need truly complete, accurate control (that is, guaranteeing that the number of requests in any given time window T is no greater than N). Accurate control requires recording every user request log; at each flow control decision, the number of logs within the latest time window is counted and compared against the threshold. This is the idea of the sliding log algorithm.

Suppose a request arrives at time t. To decide whether to admit it, we need to check whether N or more requests were already admitted within the window (t - T, t]. So as long as the system maintains a queue q recording the time of each admitted request, the number of requests since t - T can in theory be computed.

Since only the records within the last T before the current time matter, the length of queue q can change dynamically, and at most N accesses are recorded in the queue, so the maximum queue length is N.

The sliding log is very similar to the sliding window. The difference is that the sliding log slides dynamically according to the times recorded in the log, while the sliding window slides in sub-window steps according to the sub-window size.

Pseudo code implementation

The pseudo code of the algorithm is shown as follows:

# Initialization
counter = 0
q = []

# Request handling flow
# 1. Find the first request in the queue with timestamp >= t - T,
#    i.e. the earliest request within the window of size T ending at the current time t
t = now
start = findWindowStart(q, t)

# 2. Truncate the queue, keeping only the records within the latest window T, and fix up the count
q = q[start, q.length - 1]
counter -= start

# 3. Decide whether to admit; if allowed, append the request to the end of queue q
if counter < threshold
    push(q, t)
    counter++
    # admit
else
    # throttle

The implementation of findWindowStart depends on the data structure backing the queue q. With a simple array, for example, binary search can be used. Other data structures will appear later.

If an array is used, one difficulty is how to truncate the queue. A feasible idea is to keep a pair of head and tail pointers pointing at the latest and earliest valid record indexes in the array; findWindowStart then becomes a search for the corresponding element between tail and head.
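Putting the pieces together, here is a single-threaded Java sketch using an ArrayDeque as the queue q (the deque head plays the role of the earliest valid record, so explicit head/tail pointers are unnecessary):

import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingLogThrottler implements Throttler {
    private final long windowInMs;
    private final int threshold;
    /** Timestamps of admitted requests, oldest first */
    private final Deque<Long> log = new ArrayDeque<>();

    public SlidingLogThrottler(long windowInMs, int threshold) {
        this.windowInMs = windowInMs;
        this.threshold = threshold;
    }

    public boolean tryAcquire(String key) {
        long now = System.currentTimeMillis();
        // Truncation: drop records that fell out of the window (now - T, now]
        while (!log.isEmpty() && log.peekFirst() <= now - windowInMs) {
            log.pollFirst();
        }
        if (log.size() < threshold) {
            log.addLast(now);   // admit and record this request
            return true;
        }
        return false;
    }
}

With the deque, findWindowStart degenerates into popping expired heads, which is amortized O(1) per request instead of a binary search.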

Complexity problem

Although the algorithm solves the problem of accuracy, the cost is obvious.

First, a queue of maximum length N must be kept, so the space complexity is O(N). With per-key flow control, even more space is used; of course, the queues of inactive keys can be reused to reduce memory consumption.

Second, the time window must be located within the queue, i.e. findWindowStart must find the earliest request record no earlier than the current timestamp minus T. With binary search, the time complexity is O(log N).

IV. Distributed flow control

In reality, application services are usually deployed in a distributed way. If shared resources (such as a database) or a downstream dependency impose traffic limits, distributed flow control comes into play.

The flow control quota could be divided evenly among the application servers, reducing the problem to single-machine flow control, but this approach works poorly under uneven traffic, machine failures, or temporary scale-out and scale-in.

The core algorithmic ideas of distributed flow control are the same as those of single-machine flow control; the difference is that a synchronization mechanism is needed to guarantee the global quota. The synchronization mechanism can be implemented in a centralized or a decentralized way:

1) Centralized: the quota is controlled uniformly by a central system, and application processes obtain flow control quota by applying to the central system.

  • State consistency is maintained in the central system, so the implementation is simple.
  • If the central node becomes unavailable, flow control fails, so extra protection is required. For example, centralized flow control often degrades to single-machine flow control when the central storage is unavailable.

2) Decentralized: each application process saves and maintains its own flow control quota state and periodically synchronizes asynchronously across the cluster to keep the state consistent.

  • Compared with the centralized scheme, decentralization avoids the reliability risk of a central single point, but the implementation is complex and state consistency is hard to guarantee.
  • In CAP terms, decentralization leans toward A (availability) while centralization leans toward C (consistency).

I have not seen the decentralized scheme in a production environment, so only the centralized flow control approach is discussed below.

1 Access-layer entry flow control

In a typical access architecture, a layer of LVS or Nginx often sits in front of the application servers as the unified entrance, and flow control can be done there. In essence this is still a single-machine flow control scenario.

Taking Nginx as an example, Nginx provides the ngx_http_limit_req_module module for flow control, which uses the leaky bucket algorithm underneath.

An example Nginx flow control configuration follows; it means that each IP address may request the /login/ endpoint only 10 times per second.

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login/ {
        limit_req zone=mylimit;

        proxy_pass http://my_upstream;
    }
}

Nginx's flow control directives support more configuration, such as adding the burst and nodelay parameters to the limit_req directive to allow a certain degree of burst, or combining the geo and map directives to implement black/white list flow control. For details, refer to the official Nginx documentation:
Rate Limiting with NGINX and NGINX Plus (https://www.nginx.com/blog/rate-limiting-nginx/).

If the built-in modules cannot meet your needs, custom Lua modules can be used; see the Lua rate limiting module lua-resty-limit-traffic provided by OpenResty.

2 TokenServer flow control

The name TokenServer is borrowed from Sentinel. For an introduction to Sentinel cluster flow control, see the official document: Sentinel cluster flow control (https://github.com/alibaba/Sentinel/wiki/Cluster flow control).

The idea of this kind of flow control is to dedicate a TokenServer to managing the flow control quota, including aggregating the total call count and deciding whether each request is admitted. The application servers act as clients that ask the TokenServer for quota. Since the flow control logic is handled centrally in the TokenServer, the algorithms discussed under single-machine flow control apply here as well.

Naturally, this kind of flow control depends heavily on the performance and availability of the TokenServer.
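In the abstract, the client side might look like the sketch below. The TokenServerClient interface is hypothetical, standing in for whatever RPC client is actually used; note the degradation to single-machine flow control mentioned earlier:

interface TokenServerClient {
    boolean requestToken(String key) throws Exception;
}

public class ClusterThrottler implements Throttler {
    private final TokenServerClient server;   // hypothetical client of the central TokenServer
    private final Throttler localFallback;    // e.g. a SimpleWindowThrottler

    public ClusterThrottler(TokenServerClient server, Throttler localFallback) {
        this.server = server;
        this.localFallback = localFallback;
    }

    public boolean tryAcquire(String key) {
        try {
            return server.requestToken(key);  // centralized admission decision
        } catch (Exception e) {
            // Degrade to local flow control when the TokenServer is unreachable
            return localFallback.tryAcquire(key);
        }
    }
}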

In terms of performance, a single-point TokenServer easily becomes a bottleneck. Looking at the Sentinel source, Netty is used for network communication with a custom packet format, and there are few other performance optimizations.

In terms of availability, as Sentinel's own documentation states, using TokenServer cluster rate limiting in a production environment requires solving at least the following problems:

Automatic management and scheduling of the Token Server (assigning/electing the token server)

High availability of the Token Server, with automatic failover to other machines when one becomes unavailable

At present Sentinel's TokenServer does not provide these capabilities by default; they require customization or support from other systems, for example using a distributed consensus protocol for cluster election, or a group of monitors watching server status. The implementation cost is still quite high.

3 Storage-based flow control

The idea of storage-based flow control is to keep the flow control counters and other statistics in a storage system: the application reads the statistics from the storage and writes the latest request information back into it. The storage can be an existing MySQL database or a Redis cache; for performance reasons a cache is usually chosen. Here Tair and Redis serve as examples.

Tair flow control

This one is relatively simple; straight to the code.

public boolean tryAcquire(String key) {
  // Build the key with a second-level timestamp
  String wrappedKey = wrapKey(key);
  // Each request increments the count by 1 (initial value 0), with a 5-second expiry on the key
  Result<Integer> result = tairManager.incr(NAMESPACE, wrappedKey, 1, 0, 5);
  return result.isSuccess() && result.getValue() <= threshold;
}

private String wrapKey(String key) {
  long sec = System.currentTimeMillis() / 1000L;
  return key + ":" + sec;
}

Does it feel almost too simple? Thanks to Tair's high performance, this approach supports heavy traffic quite well.

This Tair scheme actually uses the simple window idea: each key gets QPS control with a one-second time window (QPM/QPD work on the same principle). The crux is Tair's incr API:

incr

Result<Integer> incr(int namespace, Serializable key, int value, int defaultValue, int expireTime)

Description
Increments the count. Note: do not put before incr!

Parameters
  • namespace – the namespace assigned when applying for access
  • key – the key, no longer than 1KB
  • value – the increment
  • defaultValue – the initial count of the key on the first incr call; the first returned value is defaultValue + value
  • expireTime – data expiration time in seconds, either relative or absolute (a UNIX timestamp). 0 means the data never expires; > 0 sets an expiry (a value greater than the current timestamp is treated as absolute time, otherwise as relative time); < 0 means the expiry is left unchanged: if one was set before, it stays in effect, otherwise the data is treated as never expiring (in the current MDB it is treated as never expiring).

Return value
A Result object; the returned value can be negative. When the key does not exist, the first call returns defaultValue + value; subsequent incrs build on that value.

Of course, this approach also has drawbacks:

  • The simple window's critical burst problem.
  • Tair reliability: a degradation plan is needed. As mentioned above, centralized flow control generally needs a single-machine fallback.
  • Cluster clock synchronization: since the key is built from each machine's local time, the machines' clocks must agree.

For example, even a clock difference as small as 10ms between machines produces relatively large errors at window boundaries: at the same instant, one machine's clock reads 0.990 while another's reads 1.000, so the two machines call incr on different keys, and accuracy naturally suffers.

Redis flow control

Redis supports rich data structures with good performance, and its "single-threaded" execution model makes synchronization control easy, so it is very suitable as distributed flow control storage.

1) Simple window implementation

Implementing simple window flow control with Redis follows the same idea as with Tair. Redis also provides the INCR command for counting, and its "single-threaded" model provides good concurrency protection. The official Redis documentation describes how to implement a rate limiter with INCR, translated below:

Redis INCR key (https://redis.io/commands/incr)

Taking a simple window as an example, the simplest and direct implementation is as follows:

FUNCTION LIMIT_API_CALL(ip)
ts = CURRENT_UNIX_TIME()
keyname = ip+":"+ts
current = GET(keyname)
IF current != NULL AND current > 10 THEN
    ERROR "too many requests per second"
ELSE
    MULTI
        INCR(keyname,1)
        EXPIRE(keyname,10)
    EXEC
    PERFORM_API_CALL()
END

The implementation is similar to the Tair one above: a per-second counter is maintained for each key. The difference is that Redis has no atomic INCR+EXPIRE command, so after INCR, EXPIRE has to be called to set the key's lifetime, with MULTI/EXEC wrapped around both to make them a transaction.

If you don't want to call EXPIRE on every request, consider the second variant:

FUNCTION LIMIT_API_CALL(ip):
current = GET(ip)
IF current != NULL AND current > 10 THEN
    ERROR "too many requests per second"
ELSE
    value = INCR(ip)
    IF value == 1 THEN
        EXPIRE(ip,1)
    END
    PERFORM_API_CALL()
END

Here the counter's expiry is set to 1s on the first INCR, so the key needs no extra handling.

Note, however, that this variant has a hidden race condition: if the client crashes (or otherwise fails) after the first INCR without calling EXPIRE, the counter will live forever.

To fix this in the second variant, a Lua script can be used:

local current
current = redis.call("incr",KEYS[1])
if tonumber(current) == 1 then
    redis.call("expire",KEYS[1],1)
end

The third way uses Redis's list structure. It is more complex, but it records every request.

FUNCTION LIMIT_API_CALL(ip)
current = LLEN(ip)
IF current > 10 THEN
    ERROR "too many requests per second"
ELSE
    IF EXISTS(ip) == FALSE              #1
        MULTI
            RPUSH(ip,ip)
            EXPIRE(ip,1)
        EXEC
    ELSE
        RPUSHX(ip,ip)
    END
    PERFORM_API_CALL()
END

There is an implicit race condition here too: when the EXISTS check runs (position #1), both clients' EXISTS commands may return false, so the commands in the MULTI/EXEC block are executed twice. But this happens rarely and does not hurt the counter's accuracy.

Both approaches above can be optimized further: since INCR and RPUSH return the counter value after the operation, the separate read can be avoided by checking the returned value instead.

Turning the simple window into a sliding window follows a similar idea: replace the single key with a hash structure that keeps one count per sub-window, and when taking a reading, sum the counts of all sub-windows in the same hash.
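A rough Jedis-based sketch of this hash scheme follows (field = sub-window start time, value = that sub-window's count). It is not atomic; for accuracy the read-sum-update sequence would need to be wrapped in a Lua script or transaction, as discussed under concurrency control below. The jedis instance and the windowInMs / subWindowInMs / threshold fields are assumed:

public boolean tryAcquire(String key) {
    long now = System.currentTimeMillis();
    String curSubWindow = String.valueOf(now - now % subWindowInMs);
    long windowStart = now - windowInMs;

    long total = 0;
    for (Map.Entry<String, String> e : jedis.hgetAll(key).entrySet()) {
        if (Long.parseLong(e.getKey()) <= windowStart) {
            jedis.hdel(key, e.getKey());   // drop sub-windows that fell out of the window
        } else {
            total += Long.parseLong(e.getValue());
        }
    }
    if (total >= threshold) {
        return false;
    }
    jedis.hincrBy(key, curSubWindow, 1);
    jedis.pexpire(key, windowInMs);        // the whole hash expires with the window
    return true;
}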

2) Token bucket / leaky bucket implementation

Implementing a token bucket or leaky bucket with Redis is also very simple. Take the token bucket as an example: two keys can store each user's remaining token count and last request time respectively, or, perhaps better, both can live in one Redis hash.

The example below shows the flow control quota data currently saved in Redis for user_1: the token bucket has two tokens left, and the timestamp of the last access is 1490868000.

[Figure: Redis hash holding user_1's remaining tokens and last access timestamp]

When a new request arrives, the Redis client performs the same operations as in the single-machine flow control algorithm: first read the current quota data from the hash (HGETALL) and compute the number of tokens to refill from the current timestamp, the last request timestamp, and the token refill rate; then decide whether to admit the request, and write back the new timestamp and token count (HMSET).

An example is as follows:

[Figure: example of a token bucket update in Redis]

Likewise, if higher accuracy is required, the client-side operations need concurrency control.

An example of what can go wrong without synchronization: with only one token left in the bucket, two clients request simultaneously, and the resulting race admits both requests.

[Figure: two concurrent clients both consuming the last token]

The Lua code example is as follows:

local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]

local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local fill_time = capacity/rate
local ttl = math.floor(fill_time*2)

local last_tokens = tonumber(redis.call("get", tokens_key))
if last_tokens == nil then
  last_tokens = capacity
end

local last_refreshed = tonumber(redis.call("get", timestamp_key))
if last_refreshed == nil then
  last_refreshed = 0
end

local delta = math.max(0, now-last_refreshed)
local filled_tokens = math.min(capacity, last_tokens+(delta*rate))
local allowed = filled_tokens >= requested
local new_tokens = filled_tokens
if allowed then
  new_tokens = filled_tokens - requested
end

redis.call("setex", tokens_key, ttl, new_tokens)
redis.call("setex", timestamp_key, ttl, now)

return { allowed, new_tokens }

3) Sliding log implementation

Thanks to redis’s sorted set structure, it is extremely simple to implement sliding logs. The process is roughly as follows:

a) Each user has a corresponding sorted set to record the request log.

  • The key and value of each element can both be the request's timestamp.
  • The sorted set's expiry can be set based on the window size; for example, with a 1s window, set a 5s expiry. When traffic is light, this saves Redis server memory.

b) When a new request from the user arrives, first delete the expired elements from the sorted set with the ZREMRANGEBYSCORE command, where expired means:

request timestamp t < current timestamp now - window size interval

c) Add the current request to the set with ZADD.

d) Get the current size of the set with ZCOUNT and decide whether to throttle.

long now = System.currentTimeMillis();
long windowStartMs = now - windowInSecond * 1000L;

Transaction redis = jedisPool.getResource().multi();
redis.zremrangeByScore(key, 0, windowStartMs);    // drop records outside the window
Response<Long> count = redis.zcard(key);          // size of the log before this request
redis.zadd(key, now, now + "-" + Math.random());  // random suffix so members don't collide
redis.expire(key, windowInSecond);
redis.exec();
// admit the request only if count.get() < threshold

Another example implemented in JS:
https://github.com/peterkhayes/rolling-rate-limiter/blob/master/index.js

Since the sliding log algorithm has higher space complexity than the other algorithms, keep an eye on Redis memory usage when using it.

4) Concurrency control

The race conditions caused by missing concurrency control were mentioned for each algorithm above, but adding concurrency control inevitably degrades performance, so a trade-off between accuracy and performance is usually required. Common kinds of concurrency control for Redis flow control:

  • Use Redis transactions (MULTI/EXEC).
  • Use distributed locks such as Redlock (https://redis.io/topics/distlock), requiring each client to acquire the distributed lock for the corresponding key before operating.
  • Use Lua scripts.

It is best to decide which method to use through performance testing.

4 Some extended thoughts

Distributed flow control introduces overhead such as network communication and lock synchronization, which affects performance to some degree, and the reliability of a distributed environment brings further challenges. How to design a high-performance, highly reliable distributed flow control system is a big topic that touches every part of the system.

Some personal thoughts to share; discussion is welcome:

1) Match multi-level flow control at different layers to the actual needs, and try to block traffic at the outer layers. A common combination is Nginx flow control at the entry layer plus flow control at the application layer.

2) Choose an appropriate cache system to hold the dynamic flow control data; this generally follows the company's standard technical stack.

3) Put the static flow control configuration in a configuration center (such as Diamond).

4) Design for the unavailability of distributed flow control (for example, the cache going down), switching to single-machine flow control when necessary; Sentinel is a mature and reliable option for this.

5) In many cases the accuracy requirement is not that strict, since a certain amount of burst is generally allowed, which leaves room for performance optimization. The biggest performance bottleneck is that every request accesses the cache once. I used a compromise in one design (see the sketch after this list):

  • Pre-allocate part of the available quota to the machines in the cluster by some proportion (e.g. 50%), generally evenly; if each machine's traffic weight is known in advance, allocate by weight. Machines consume quota at different rates, and machines may go down or the cluster may scale in or out, so the pre-allocated proportion should be neither too large nor too small.
  • When a machine runs out of quota, it requests more from the central system. One optimization here: each machine records its own quota consumption rate (equivalent to the traffic rate it bears) and requests batches sized accordingly, asking for more at a time when its consumption rate is high.
  • When the overall remaining quota drops below some proportion (e.g. 10%), limit how much a machine may request at a time: compute the allocation from the remaining window size, and cap each hand-out at a proportion of the remaining quota (e.g. 50%), so the remaining traffic transitions smoothly.
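A highly simplified sketch of the prefetching client described above. The QuotaClient interface is hypothetical (it stands for the RPC client of the central quota system), and the batch-size cap near quota exhaustion is assumed to live on the server side:

interface QuotaClient {
    // Ask the central system for up to n quotas; returns the amount actually granted
    long acquire(String key, long n);
}

public class PrefetchingThrottler implements Throttler {
    private final QuotaClient central;
    private long localQuota;                    // quota prefetched to this machine
    private volatile double consumeRatePerSec;  // observed local consumption rate

    public PrefetchingThrottler(QuotaClient central) {
        this.central = central;
    }

    public synchronized boolean tryAcquire(String key) {
        if (localQuota <= 0) {
            // Batch size follows the observed consumption rate, at least 1
            localQuota = central.acquire(key, Math.max(1, (long) consumeRatePerSec));
        }
        if (localQuota <= 0) {
            return false;    // the central quota itself is exhausted
        }
        localQuota--;
        return true;
    }
}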

V. summary

Distributed flow control algorithms are really extensions of single-machine flow control, and the essence of the algorithms is the same. Here, as I understand them, is a summary of the complexity and applicable scenarios of the flow control algorithms above.
[Table: complexity and applicable scenarios of each flow control algorithm]
