Gateway Rate Limiting: Network Rate Limiting Schemes

Time: 2019-03-22


I. Network Rate Limiting Algorithms

In computing, rate limiting is used to control the rate at which communication data is sent or received on a network interface, in order to optimize performance, reduce latency, and improve usable bandwidth.
    On the Internet, the same concept is used to control the rate of network requests, especially in high-concurrency, high-traffic scenarios such as Double Eleven flash sales, ticket grabbing, and order rushing.
    There are two main rate limiting algorithms for networks: the leaky bucket algorithm and the token bucket algorithm. Let's introduce them one by one:

1. Leaky bucket algorithm

[Figure: leaky bucket algorithm diagram]

Description: The idea of the leaky bucket algorithm is very simple. Water (data or requests) first enters the leaky bucket, and the bucket leaks at a fixed rate. When water flows in faster than it leaks out, the bucket overflows and the excess is discarded. Thus the leaky bucket algorithm forcibly caps the data transmission rate.

Implementation logic: control the rate at which data is injected into the network, smoothing out bursts of traffic. The leaky bucket algorithm provides a mechanism by which bursty traffic can be shaped into a steady stream for the network. A leaky bucket can be viewed as a single-server queue with a constant service time; if the bucket (the packet cache) overflows, packets are discarded.

Advantages and disadvantages: In some cases the leaky bucket algorithm cannot use network resources effectively. Because the leak rate is a fixed parameter, even when the network has no resource contention (no congestion), the leaky bucket cannot let a single flow burst up to the port rate. The algorithm is therefore inefficient for traffic with bursty characteristics, which the token bucket algorithm can accommodate. In general, the two algorithms can be combined to provide finer control over network traffic.
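
To make the contrast with the token bucket below concrete, here is a minimal leaky bucket sketch in Lua (illustrative only: the rate and capacity values are made up, and the caller supplies the current time in seconds, e.g. ngx.now() under OpenResty):

-- Minimal leaky bucket (sketch, not production code).
-- rate: leak (processing) rate in requests per second; capacity: how much water the bucket holds.
local bucket = { rate = 1, capacity = 5, water = 0, last = 0 }

local function allow(b, now)
    -- water drains at a constant rate, no matter how fast it flowed in
    b.water = math.max(0, b.water - (now - b.last) * b.rate)
    b.last = now
    if b.water + 1 > b.capacity then
        return false  -- the bucket overflows: drop the request
    end
    b.water = b.water + 1
    return true       -- accepted: the request drains out at the fixed rate
end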

2. Token bucket algorithm

[Figure: token bucket algorithm diagram]

Implementation logic: The system puts tokens into a bucket at a constant rate. A request must first obtain a token from the bucket before it is processed; when no token is available, service is denied. Another advantage of the token bucket is that the rate is easy to change: once the rate needs to rise, the rate at which tokens are placed in the bucket is increased accordingly. Usually a fixed number of tokens is added to the bucket at a fixed interval (e.g. every 100 milliseconds). Some variant algorithms instead calculate, in real time, the number of tokens that should be added. For example, Huawei's patent "method of using a token leaky bucket to limit the flow of messages" (CN 1536815 A) describes a way of computing the available tokens dynamically: when a message arrives, the number of tokens injected into the bucket during the interval between this message and the previous one is calculated, and from that it is determined whether the bucket holds enough tokens to transmit the message.
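
The "calculate on arrival" variant just described can be sketched the same way (again illustrative: the rate and capacity values are made up, and the caller supplies the current time). Instead of a timer topping the bucket up every 100 ms, each incoming request first credits the tokens accumulated since the previous request and then tries to take one:

-- Minimal token bucket with lazy refill (sketch, not production code).
-- rate: tokens added per second; capacity: maximum tokens the bucket can hold.
local bucket = { rate = 100, capacity = 200, tokens = 200, last = 0 }

local function allow(b, now)
    -- credit the tokens generated between the previous request and this one
    b.tokens = math.min(b.capacity, b.tokens + (now - b.last) * b.rate)
    b.last = now
    if b.tokens < 1 then
        return false  -- no token available: deny service
    end
    b.tokens = b.tokens - 1
    return true       -- token consumed: the request may proceed
end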

II. Common Rate Limiting Implementations

Generally speaking, rate limiting can be divided into three categories:

  • limit_rate limits the response rate
  • limit_conn limits the number of connections
  • limit_req limits the number of requests

1. Nginx module (leaky bucket)

  • Reference address: limit_req_module

The ngx_http_limit_req_module (0.7.21) is used to limit the request processing rate per defined key, in particular the rate of requests arriving from a single IP address.

1.1 Example Configuration

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req zone=one burst=5;
        }
    }
}

1.2 Rules of Use

Syntax: limit_req zone=name [burst=number] [nodelay | delay=number];
Default: -
Scope of action: http, server, location

Parameter description

  • zone sets the shared memory zone (name and size) to use.
  • burst sets the leaky bucket burst size: excessive requests are delayed while their number does not exceed burst, and rejected beyond that.
  • The delay parameter (1.15.7) specifies a limit at which excessive requests become delayed (requests below this limit are processed without delay); nodelay means no excessive request is delayed. The default value is zero, meaning all excessive requests are delayed.

These directives set the shared memory zone and the maximum burst size for requests. If the request rate exceeds the rate configured for the zone, requests are delayed so that they are processed at the defined rate. Excessive requests are delayed until their number exceeds the maximum burst size, at which point the request is terminated with an error. By default, the maximum burst size is zero.

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

server {
    location /search/ {
        limit_req zone=one burst=5;
    }
}

Description: on average, no more than one request per second is allowed, with bursts of no more than five requests.

Notes on parameter usage:

  1. If you do not want excessive requests to be delayed while requests are being limited, use the parameter nodelay:
limit_req zone=one burst=5 nodelay;
  2. There can be several limit_req directives. For example, the following configuration limits the processing rate of requests coming from a single IP address and, at the same time, the request processing rate of the virtual server as a whole:
limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;
limit_req_zone $server_name zone=perserver:10m rate=10r/s;

server {
    ...
    limit_req zone=perip burst=5 nodelay;
    limit_req zone=perserver burst=10;
}

These directives are inherited from the previous configuration level if and only if there are no limit_req directives defined at the current level.

1.3 Related limit_req directives


Syntax: limit_req_log_level info | notice | warn | error;
Default: limit_req_log_level error;
Scope of action: http, server, location

This directive appeared in version 0.8.18.

Sets the desired logging level for cases when the server refuses to process requests because the rate has been exceeded, or delays request processing. Delays are logged one level below rejections; for example, if "limit_req_log_level notice" is specified, delays are logged at the info level.


Error status code

Syntax: limit_req_status code;
Default: limit_req_status 503;
Scope of action: http, server, location

This directive appeared in version 1.3.15.

Sets the status code to return in response to rejected requests.


Syntax: limit_req_zone key zone=name:size rate=rate [sync];
Default: -
Scope of action: http

Sets the parameters for a shared memory zone that keeps the state of various keys. In particular, the state stores the current number of excessive requests. A key can contain text, variables, and their combinations. Requests with an empty key value are not accounted.

Prior to version 1.7.6, a key could contain exactly one variable.
For example:

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

Note: here the state is kept in the 10-megabyte zone "one", and the average request processing rate for this zone cannot exceed one request per second.


Remarks:

  1. The client IP address serves as the key. Note that the $binary_remote_addr variable is used rather than $remote_addr. The size of $binary_remote_addr is always 4 bytes for IPv4 addresses and 16 bytes for IPv6 addresses. The stored state always occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms. One megabyte zone can keep about 16 thousand 64-byte states or about 8 thousand 128-byte states.
  2. If the zone storage is exhausted, the least recently used state is removed. If even then a new state cannot be created, the request is terminated with an error.
  3. The rate is specified in requests per second (r/s). If a rate of less than one request per second is desired, it is specified in requests per minute (r/m). For example, half a request per second is written as 30r/m.

2. OpenResty module

  • Reference address: lua-resty-limit-traffic
  • Reference address: OpenResty common rate limiting examples

2.1 Limit the total number of concurrent requests to an interface

Limit the number of concurrent connections per IP.

lua_shared_dict my_limit_conn_store 100m;
...
location /hello {
   access_by_lua_block {
       local limit_conn = require "resty.limit.conn"
       -- Limit each IP client to at most 1 concurrent request.
       -- burst is set to 0: once the maximum concurrency is exceeded, 503 is returned directly.
       -- To allow bursts of concurrency, raise the burst value (the leaky bucket capacity).
       -- The last parameter is the estimated time (in seconds) needed to process one request,
       -- which is what lets the leaky bucket algorithm be applied to the requests queued in the bucket.
       
       local lim, err = limit_conn.new("my_limit_conn_store", 1, 0, 0.5)              
       if not lim then
           ngx.log(ngx.ERR, "failed to instantiate a resty.limit.conn object: ", err)
           return ngx.exit(500)
       end

       local key = ngx.var.binary_remote_addr
       -- commit = true means the key's value in the shared dict is actually updated;
       -- false would only check the delay the current request faces and the number of pending requests.
       local delay, err = lim:incoming(key, true)
       if not delay then
           if err == "rejected" then
               return ngx.exit(503)
           end
           ngx.log(ngx.ERR, "failed to limit req: ", err)
           return ngx.exit(500)
       end

       -- If the connection count was committed to the shared dict, record the state in ngx.ctx
       -- so that the counter can be decremented later, when this connection finishes and
       -- frees capacity for other connections.
       if lim:is_committed() then
           local ctx = ngx.ctx
           ctx.limit_conn = lim
           ctx.limit_conn_key = key
           ctx.limit_conn_delay = delay
       end

       local conn = err  -- on success, the second return value is the current connection count
       -- The delay is always an integer multiple of the estimated processing time given above.
       -- For example: 100 concurrent slots, a bucket of 200, 500 simultaneous requests, so 200 are rejected.
       -- 100 connections are processed at once and 200 wait in the bucket; of those queued,
       -- connections 1-100 are delayed by 0.5 s and connections 101-200 by 0.5 * 2 = 1 s
       -- (0.5 being the estimated processing time above).
       if delay >= 0.001 then
           ngx.sleep(delay)
       end
   }

   log_by_lua_block {
       local ctx = ngx.ctx
       local lim = ctx.limit_conn
       if lim then
           local key = ctx.limit_conn_key
           -- After the connection has been handled, the shared dict must be told to update its
           -- value so that subsequent connections can be admitted for processing.
           -- The processing-time estimate can be updated dynamically here by passing the actual
           -- latency to leaving(); do not set it via limit_conn.new, or it would be reset
           -- on every incoming request.
           local conn, err = lim:leaving(key, 0.5)
           if not conn then
               ngx.log(ngx.ERR,
                       "failed to record the connection leaving ",
                       "request: ", err)
               return
           end
       end
   }
   proxy_pass http://10.100.157.198:6112;
   proxy_set_header Host $host;
   proxy_redirect off;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_connect_timeout 60;
   proxy_read_timeout 600;
   proxy_send_timeout 600;
}

Explanation: here there is effectively no burst allowance; the configuration simply caps the maximum concurrency. If a burst value is set and the delay is honored (ngx.sleep), the leaky bucket algorithm is being applied to the concurrency count; if the delay is skipped, the behavior is effectively that of a token bucket. See the leaky bucket and token bucket sections for requests below; the concurrency variants are similar, and a burst-enabled variant is sketched next.
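
For reference, a hypothetical burst-enabled variant would change only the constructor in the snippet above (the values here are illustrative): a steady concurrency of 100, a burst allowance of 200, and the same 0.5 s per-request estimate. Keeping the ngx.sleep(delay) branch then delays burst connections leaky-bucket style; removing it lets them through immediately, token-bucket style.

-- hypothetical parameters: 100 concurrent requests allowed, burst capacity 200,
-- estimated 0.5 s to process a single request
local lim, err = limit_conn.new("my_limit_conn_store", 100, 200, 0.5)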

2.2 Limit the number of requests per time window

Limit each IP to 120 calls per minute to the /hello interface (all 120 requests are allowed to be consumed at the very start of the time window).

lua_shared_dict my_limit_count_store 100m;
...

init_by_lua_block {
   require "resty.core"
}
....

location /hello {
   access_by_lua_block {
       local limit_count = require "resty.limit.count"

       -- rate: 120 requests per 60-second window
       local lim, err = limit_count.new("my_limit_count_store", 120, 60)
       if not lim then
           ngx.log(ngx.ERR, "failed to instantiate a resty.limit.count object: ", err)
           return ngx.exit(500)
       end

       local key = ngx.var.binary_remote_addr
       local delay, err = lim:incoming(key, true)
       -- If the request count is within the limit, incoming() returns the delay for the current
       -- request (always 0 in this scenario: a request is either processed or rejected) and the
       -- remaining number of allowed requests as the second value.
       if not delay then
           if err == "rejected" then
               return ngx.exit(503)
           end

           ngx.log(ngx.ERR, "failed to limit count: ", err)
           return ngx.exit(500)
       end
   }

   proxy_pass http://10.100.157.198:6112;
   proxy_set_header Host $host;
   proxy_redirect off;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_connect_timeout 60;
   proxy_read_timeout 600;
   proxy_send_timeout 600;
}

2.3 Smoothly limit the number of interface requests

Limit each IP to 120 calls per minute to the /hello interface (requests are processed smoothly, i.e. 2 requests per second).

lua_shared_dict my_limit_req_store 100m;
....

location /hello {
   access_by_lua_block {
       local limit_req = require "resty.limit.req"
       -- Set rate = 2 req/s with the leaky bucket capacity set to 0.
       -- Because the control granularity in the resty.limit.req code is at the millisecond level,
       -- millisecond-level smoothing can be achieved.
       local lim, err = limit_req.new("my_limit_req_store", 2, 0)
       if not lim then
           ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
           return ngx.exit(500)
       end

       local key = ngx.var.binary_remote_addr
       local delay, err = lim:incoming(key, true)
       if not delay then
           if err == "rejected" then
               return ngx.exit(503)
           end
           ngx.log(ngx.ERR, "failed to limit req: ", err)
           return ngx.exit(500)
       end
   }

   proxy_pass http://10.100.157.198:6112;
   proxy_set_header Host $host;
   proxy_redirect off;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_connect_timeout 60;
   proxy_read_timeout 600;
   proxy_send_timeout 600;
}

2.4 Leaky bucket rate limiting

Limit each IP to 120 calls per minute to the /hello interface (requests processed smoothly, i.e. 2 per second); requests in excess of that enter the bucket and wait (bucket capacity 60), and once the bucket is full the excess is rejected.

lua_shared_dict my_limit_req_store 100m;
....

location /hello {
   access_by_lua_block {
       local limit_req = require "resty.limit.req"
       -- Set rate = 2 req/s with the leaky bucket capacity set to 60.
       -- Because the control granularity in the resty.limit.req code is at the millisecond level,
       -- millisecond-level smoothing can be achieved.
       local lim, err = limit_req.new("my_limit_req_store", 2, 60)
       if not lim then
           ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
           return ngx.exit(500)
       end

       local key = ngx.var.binary_remote_addr
       local delay, err = lim:incoming(key, true)
       if not delay then
           if err == "rejected" then
               return ngx.exit(503)
           end
           ngx.log(ngx.ERR, "failed to limit req: ", err)
           return ngx.exit(500)
       end
       
       -- incoming() returns the delay in seconds the current request must wait before being
       -- processed, as well as the number of requests ahead of it.
       -- Sleeping for that delay applies the leaky bucket algorithm: queued requests wait in line.
       -- This is also the main difference from the token bucket.
       if delay >= 0.001 then
           ngx.sleep(delay)
       end
   }

   proxy_pass http://10.100.157.198:6112;
   proxy_set_header Host $host;
   proxy_redirect off;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_connect_timeout 60;
   proxy_read_timeout 600;
   proxy_send_timeout 600;
}

2.5 Token bucket rate limiting

Limit each IP to 120 calls per minute to the /hello interface (smoothed to 2 requests per second), while allowing a certain amount of burst traffic, namely the bucket capacity of 60; requests beyond the bucket capacity are rejected directly.

lua_shared_dict my_limit_req_store 100m;
....

location /hello {
   access_by_lua_block {
       local limit_req = require "resty.limit.req"

       local lim, err = limit_req.new("my_limit_req_store", 2, 60)
       if not lim then
           ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
           return ngx.exit(500)
       end

       local key = ngx.var.binary_remote_addr
       local delay, err = lim:incoming(key, true)
       if not delay then
           if err == "rejected" then
               return ngx.exit(503)
           end
           ngx.log(ngx.ERR, "failed to limit req: ", err)
           return ngx.exit(500)
       end
       
       -- incoming() returns the delay in seconds the current request would need to wait before
       -- being processed, as well as the number of requests ahead of it.
       -- Here the delay required by bucketed requests is deliberately ignored and they are
       -- forwarded to the backend immediately.
       -- This is exactly how requests in the bucket are let through as burst traffic,
       -- i.e. the token bucket behavior.
       if delay >= 0.001 then
       --    ngx.sleep(delay)
       end
   }

   proxy_pass http://10.100.157.198:6112;
   proxy_set_header Host $host;
   proxy_redirect off;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_connect_timeout 60;
   proxy_read_timeout 600;
   proxy_send_timeout 600;
}

Explanation: the default (delay) and nodelay behaviors of ngx_http_limit_req_module in Nginx are the same two schemes as delaying or not delaying the bucketed requests here; they correspond to the leaky bucket and token bucket algorithms respectively.


Note:
The resty.limit.traffic module's documentation states that the library is already usable but still highly experimental.
That is, although the module is currently usable, it is still at a highly experimental stage, so for now (2019-03-11) we forgo the resty.limit.traffic module.

3. Kong plug-in

  • Reference address: Rate Limiting Advanced (Enterprise Edition)
  • Reference address: request-termination
  • Reference address: rate-limiting request speed limit
  • Reference address: request-size-limiting (officially recommended to enable this plug-in to prevent DoS attacks)
  • Reference address: response-ratelimiting response speed limit
  • Reference address: kong-response-size-limiting (unofficial)

3.1 rate-limiting

Rate-limiting limits how many HTTP requests can be made in a given period of seconds, minutes, hours, days, months, or years. If the underlying Service/Route (or the deprecated API entity) has no authentication layer, the client IP address is used; otherwise, if an authentication plug-in has been configured, the Consumer is used.
  1. Enable the plug-in on a Service
$ curl -X POST http://kong:8001/services/{service}/plugins \
    --data "name=rate-limiting"  \
    --data "config.second=5" \
    --data "config.hour=10000"
  2. Enable the plug-in on a Route
$ curl -X POST http://kong:8001/routes/{route_id}/plugins \
    --data "name=rate-limiting"  \
    --data "config.second=5" \
    --data "config.hour=10000"
  3. Enable the plug-in on a Consumer
$ curl -X POST http://kong:8001/plugins \
    --data "name=rate-limiting" \
    --data "consumer_id={consumer_id}"  \
    --data "config.second=5" \
    --data "config.hour=10000"

Rate-limiting supports three policies, each with its own strengths and weaknesses:

  • cluster: accurate and requires no extra components, but has the biggest performance impact, since every request forces a read and a write on the underlying datastore.
  • redis: accurate and lighter on performance than the cluster policy, but requires an extra Redis installation and has a bigger performance impact than the local policy.
  • local: minimal performance impact, but less accurate; unless a consistent-hashing load balancer sits in front of Kong, the counts diverge as the number of nodes grows.

3.2 response-ratelimiting

This plug-in lets you limit the number of requests a developer can make based on a custom response header returned by the upstream service. You can set any number of rate-limiting objects (or quotas) and instruct Kong to increment or decrement them by any number; the upstream signals usage through a response header (per the Kong documentation, e.g. X-Kong-Limit: limit_name=1). Each custom rate-limiting object can limit inbound requests per second, minute, hour, day, month, or year.
  1. Enable the plug-in on a Service
$ curl -X POST http://kong:8001/services/{service}/plugins \
    --data "name=response-ratelimiting"  \
    --data "config.limits.{limit_name}=" \
    --data "config.limits.{limit_name}.minute=10"
  2. Enable the plug-in on a Route
$ curl -X POST http://kong:8001/routes/{route_id}/plugins \
    --data "name=response-ratelimiting"  \
    --data "config.limits.{limit_name}=" \
    --data "config.limits.{limit_name}.minute=10"
  3. Enable the plug-in on a Consumer
$ curl -X POST http://kong:8001/plugins \
    --data "name=response-ratelimiting" \
    --data "consumer_id={consumer_id}"  \
    --data "config.limits.{limit_name}=" \
    --data "config.limits.{limit_name}.minute=10"
  4. Enable the plug-in on the API
$ curl -X POST http://kong:8001/apis/{api}/plugins \
    --data "name=response-ratelimiting"  \
    --data "config.limits.{limit_name}=" \
    --data "config.limits.{limit_name}.minute=10"

3.3 request-size-limiting

Blocks incoming requests whose body is larger than the specified size in megabytes.
  1. Enable the plug-in on a Service
$ curl -X POST http://kong:8001/services/{service}/plugins \
    --data "name=request-size-limiting"  \
    --data "config.allowed_payload_size=128"
  2. Enable the plug-in on a Route
$ curl -X POST http://kong:8001/routes/{route_id}/plugins \
    --data "name=request-size-limiting"  \
    --data "config.allowed_payload_size=128"
  3. Enable the plug-in on a Consumer
$ curl -X POST http://kong:8001/plugins \
    --data "name=request-size-limiting" \
    --data "consumer_id={consumer_id}"  \
    --data "config.allowed_payload_size=128"

3.4 request-termination

This plug-in terminates incoming requests with a specified status code and message. This allows communication on a Service or Route to be (temporarily) suspended, or even a specific Consumer to be blocked.
  1. Enable the plug-in on a Service
$ curl -X POST http://kong:8001/services/{service}/plugins \
    --data "name=request-termination"  \
    --data "config.status_code=403" \
    --data "config.message=So long and thanks for all the fish!"
  2. Enable the plug-in on a Route
$ curl -X POST http://kong:8001/routes/{route_id}/plugins \
    --data "name=request-termination"  \
    --data "config.status_code=403" \
    --data "config.message=So long and thanks for all the fish!"
  3. Enable the plug-in on a Consumer
$ curl -X POST http://kong:8001/plugins \
    --data "name=request-termination" \
    --data "consumer_id={consumer_id}"  \
    --data "config.status_code=403" \
    --data "config.message=So long and thanks for all the fish!"

4. Based on the Redis INCR key

  • Reference address: pattern-rate-limiter
Redis's INCR increments the number stored at a key by one. If the key does not exist, it is set to 0 before the operation is performed. An error is returned if the key contains a value of the wrong type, or a string that cannot be represented as an integer. This operation is limited to 64-bit signed integers.

Return value:
Integer reply: the value of the key after the increment.

Examples:

redis> SET mykey "10"
"OK"
redis> INCR mykey
(integer) 11
redis> GET mykey
"11"
redis> 

The INCR key pattern has two typical uses:

  • Counters, e.g. total article views, distributed data pagination, game scores, etc.
  • The rate limiter pattern, a special counter used to limit the rate at which an operation can be performed, e.g. limiting the number of requests that may be executed against a public API;

The focus of this scheme is using Redis to implement a rate limiter. We give two implementations of this pattern using INCR, assuming the problem to solve is limiting the number of API calls to at most 10 requests per second per IP address:

In the first approach, there is effectively one counter per IP address per second.

FUNCTION LIMIT_API_CALL(ip)
ts = CURRENT_UNIX_TIME()
keyname = ip+":"+ts
current = GET(keyname)
IF current != NULL AND current > 10 THEN
    ERROR "too many requests per second"
ELSE
    MULTI
        INCR(keyname,1)
        EXPIRE(keyname,10)
    EXEC
    PERFORM_API_CALL()
END

Advantage:

  1. Using IP + ts guarantees a different key for each second, isolating the Redis objects generated second by second; since every key carries an expiration time, Redis cleans them up on its own.

Disadvantages:

  1. A large number of Redis keys is generated; even though they all have expiration times set, cleaning them up is still a burden for Redis and may affect its read performance.

The second approach creates the counter when the first request in the current second arrives, and lets it live for only one second. If there are more than 10 requests within the same second, the counter exceeds 10; otherwise it expires and counting starts again from 0.

FUNCTION LIMIT_API_CALL(ip):
current = GET(ip)
IF current != NULL AND current > 10 THEN
    ERROR "too many requests per second"
ELSE
    value = INCR(ip)
    IF value == 1 THEN
        EXPIRE(ip,1)
    END
    PERFORM_API_CALL()
END

Advantage:

  1. Compared with the first scheme, this one takes less space and executes more efficiently.

Disadvantages:

  1. INCR and EXPIRE are two separate commands, not one atomic operation, so there is a race condition: if for some reason the client executes INCR but not EXPIRE, the key leaks (never expires) until the same IP address is seen again.

Fix: turn the INCR-plus-EXPIRE sequence into a Lua script sent with the EVAL command (available since Redis 2.6). Running the script atomically on the server guarantees that the expiration is set every time:

local current
current = redis.call("incr",KEYS[1])
if tonumber(current) == 1 then
    redis.call("expire",KEYS[1],1)
end
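
Under OpenResty, this script could be invoked through lua-resty-redis roughly as follows (a sketch: the Redis address, the key prefix, and the limit of 10 requests per second are assumptions, and a "return current" line is appended so the caller can read the counter):

local redis = require "resty.redis"

local red = redis:new()
local ok, err = red:connect("127.0.0.1", 6379)  -- assumed Redis address
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return ngx.exit(500)
end

-- INCR and EXPIRE run atomically inside the server-side script
local script = [[
local current = redis.call("incr", KEYS[1])
if tonumber(current) == 1 then
    redis.call("expire", KEYS[1], 1)
end
return current
]]

local count, err = red:eval(script, 1, "rate:" .. ngx.var.remote_addr)
if not count then
    ngx.log(ngx.ERR, "failed to run the rate limiting script: ", err)
    return ngx.exit(500)
end
if tonumber(count) > 10 then  -- assumed limit: 10 requests per second per IP
    return ngx.exit(503)
end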

III. Final Implementation Plan

Weighing the common implementations above, their applicable scenarios, and their pros and cons, the final plan is:

  • Use the Kong rate-limiting plug-in, doing secondary development on it if it does not meet the requirements.
  • Develop a Kong plug-in directly, using token bucket + Redis to limit traffic (see the sketch below).
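
As a rough illustration of the second bullet (a sketch under assumed parameters, not the actual plug-in code), the core of a token bucket + Redis limiter can stay atomic by doing the lazy token refill inside a server-side Redis Lua script:

-- Token bucket as a Redis Lua script (sketch). KEYS[1] holds the bucket state;
-- ARGV[1] = rate (tokens/second), ARGV[2] = bucket capacity, ARGV[3] = current time in seconds.
local rate     = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local state  = redis.call("hmget", KEYS[1], "tokens", "last")
local tokens = tonumber(state[1]) or capacity  -- a new bucket starts out full
local last   = tonumber(state[2]) or now

-- lazily credit the tokens generated since the previous request (cf. section I.2)
tokens = math.min(capacity, tokens + (now - last) * rate)

local allowed = 0
if tokens >= 1 then
    tokens  = tokens - 1
    allowed = 1
end

redis.call("hmset", KEYS[1], "tokens", tokens, "last", now)
-- let idle buckets expire once they would have completely refilled anyway
redis.call("expire", KEYS[1], math.ceil(capacity / rate) + 1)
return allowed

A Kong plug-in would run this via EVAL in its access phase, keyed by Consumer or client IP, and reject the request (e.g. with 429 or 503) whenever the script returns 0.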