Redlock, the Redis-based distributed lock algorithm, and its implementation in RedLock-Hyperf

Time:2021-6-4

Preface

A recent project needed Redis distributed locks wrapped for the Hyperf framework, so I built the RedLock-Hyperf SDK on top of the Redlock algorithm. It supports both plain object calls and AOP annotations in Hyperf.
Implementing a distributed lock on Redis is not hard. Most people use the setnx + expire + del commands to build a simple one, but is such a mutex really safe?
This article looks at the common ways to implement a Redis distributed lock, and at how Redlock, the algorithm recommended by the Redis project itself, guarantees the safety of the lock.

A short digression: besides Redis, ZooKeeper and etcd are also used to implement distributed locks. In terms of cost, however, the latter two add operations and maintenance overhead that is hard to justify for a simple lock-grabbing action. In terms of ecosystem, I believe hardly any project today does not already use Redis.

Common single-instance implementations

Acquiring the lock

  • setnx + expire

    1. setnx key value:
      Returns 0 (failure) if the key already exists, and 1 (success) if it does not.
    2. expire key timeout:
      After the lock is acquired, set an expiration time on it.

    The problem is that the two steps run one after the other rather than atomically. If the Redis server or the client fails after setnx succeeds but before expire runs, the key never expires and the lock is never released.

  • set key value EX seconds NX
    To make the operation atomic, use the extended SET command, supported since Redis 2.6.12. A single command then does the work of setnx + expire, so the lock can never be left without an expiration time. The full syntax is:
SET key value [EX seconds] [PX milliseconds] [NX|XX]
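
As a rough illustration of what the single command buys us, the sketch below models `SET key value NX PX ttl` against an in-memory stand-in, written in Python purely for illustration. `FakeRedis`, its hand-driven `now_ms` clock and the `acquire` helper are hypothetical names invented for this example and are not part of any Redis client; a real client would simply send the SET command shown above.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for one Redis instance (illustration only)."""

    def __init__(self):
        self.now_ms = 0      # fake clock, advanced by hand
        self._data = {}      # key -> (value, expires_at_ms)

    def _purge(self, key):
        # Drop the key if its expiration time has passed.
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self.now_ms:
            del self._data[key]

    def set_nx_px(self, key, value, ttl_ms):
        """Model of `SET key value NX PX ttl_ms`: set and expire in one step."""
        self._purge(key)
        if key in self._data:
            return False
        self._data[key] = (value, self.now_ms + ttl_ms)
        return True


def acquire(redis, resource, ttl_ms):
    """Return a random token on success, or None if the lock is taken."""
    token = uuid.uuid4().hex
    return token if redis.set_nx_px(resource, token, ttl_ms) else None
```

With this model, a second `acquire` on the same resource returns None until `now_ms` has advanced past the TTL, after which the lock can be taken again. Because the value and the expiration are written together, there is no window in which the key exists without a TTL.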

Release the lock

When releasing the lock, verify the value before deleting the key instead of bluntly calling del, because a plain del lets any client unlock. To keep the check and the delete atomic, run them as a Lua script:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

On the uniqueness of value

The value must be unique. If every client uses the same fixed value, the following can happen:

  1. Client 1 acquires the lock.
  2. Client 1 is blocked in some operation for too long.
  3. The key expires and the lock is released automatically.
  4. Client 2 acquires the lock on the same resource.
  5. Client 1 recovers from the block and releases the lock. Because the value is the same, this deletes the lock now held by client 2.

So when acquiring and releasing locks, use a UUID or a similar method to guarantee that the value is unique.
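
The five steps above can be replayed in a few lines. This is only a sketch: a plain Python dict stands in for the Redis keyspace, expiration is simulated by deleting the key by hand, and `release` and `replay` are hypothetical helper names. `release` mirrors the check-and-delete logic of the Lua script shown earlier.

```python
import uuid


def release(store, key, token):
    """Delete the key only if it still holds our token (same logic as the Lua script)."""
    if store.get(key) == token:
        del store[key]
        return 1
    return 0


def replay(token_for):
    """Replay steps 1-5; token_for(client) decides which value each client stores.

    Returns True if client 2 still holds the lock at the end.
    """
    store = {}
    key = "resource"
    t1 = token_for("client1")
    store[key] = t1              # 1. client 1 acquires the lock
    # 2. client 1 blocks for longer than the TTL ...
    del store[key]               # 3. ... so the key expires
    t2 = token_for("client2")
    store[key] = t2              # 4. client 2 acquires the lock
    release(store, key, t1)      # 5. client 1 wakes up and releases
    return store.get(key) == t2
```

With a fixed value, `replay(lambda client: "locked")` returns False: client 1 has deleted client 2's lock. With unique values, `replay(lambda client: uuid.uuid4().hex)` returns True, because client 1's token no longer matches the stored one.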

Redlock algorithm

The mutexes above all rely on a single Redis server or a single Redis cluster. Such a lock has essentially no tolerance for a Redis crash, and its safety guarantee falls short of a DLM (distributed lock manager).
Many ideas for building a DLM on Redis circulate on the Internet. Redlock is the one proposed by antirez, the author of Redis, on his blog (http://antirez.com/news/77), and the community has since implemented it in many languages.

This page is an attempt to provide a more canonical algorithm to implement distributed locks with Redis. We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla single instance approach.

Below we translate the core of that article to help you understand the Redlock algorithm in depth.

Safety and availability guarantees

Safety and Liveness guarantees
We are going to model our design with just three properties that, from our point of view, are the minimum guarantees needed to use distributed locks in an effective way.
Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
Liveness property A: Deadlock free. Eventually it is always possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
Liveness property B: Fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.

A distributed lock should meet at least the following three requirements:

  1. Safety: at any given moment, only one client can hold the lock.
  2. Availability A: deadlock freedom. Even if the client crashes or is partitioned away while holding the lock, other clients should be able to acquire it after some time.
  3. Availability B: fault tolerance. As long as a majority of Redis nodes are up, clients should be able to acquire and release locks.

Redlock algorithm

In the distributed version, we assume N Redis master nodes that are completely independent of one another (they can be N standalone master instances or N separate Redis Cluster deployments, but not one cluster with N master nodes).
We already know how to safely acquire and release the lock on a single instance. Taking N = 5 as an example, the client acquires the lock with the following steps.

  1. Get the current time as a millisecond timestamp.
  2. Try to acquire the lock on the five instances in turn, using the same key and the same unique value (such as a UUID). Each request should carry a connection and response timeout that is much smaller than the lock's expiration time. For example, if the lock expires after 10 s, the timeout should be in the range of 5 to 50 ms. This keeps the client from waiting on an instance that is down; if an instance does not respond in time, the client should move on to the next one as soon as possible.
  3. Subtract the timestamp from step 1 from the current time to get the time spent acquiring the lock. The lock is acquired if and only if it was obtained on a majority of instances (N / 2 + 1, here three) and the time spent is less than the lock's expiration time.
  4. If the lock was acquired, its real validity time is the expiration time minus the time spent acquiring it (the result of step 3).
  5. If acquisition fails for any reason (the lock was obtained on fewer than N / 2 + 1 instances, or the time spent exceeded the expiration time), the client should unlock all instances, even those it believes it did not lock. Otherwise an instance may have set the key while its response never reached the client, and the lock could not be acquired again until that key expires.
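
The five steps can be sketched as follows. This is a simplified Python model under stated assumptions, not the code of any particular library: `Node` is a hypothetical in-memory stand-in for an independent Redis master, a `ConnectionError` stands in for a request that hits the per-instance timeout, and key expiration on the nodes is not modeled.

```python
import time
import uuid


class Node:
    """In-memory stand-in for one independent Redis master (illustration only)."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def set_nx(self, key, value):
        if not self.up:
            raise ConnectionError("node unreachable")
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete_if_equal(self, key, value):
        if self.up and self.data.get(key) == value:
            del self.data[key]


def redlock_acquire(nodes, resource, ttl_ms, clock_ms=lambda: time.monotonic() * 1000):
    token = uuid.uuid4().hex
    start = clock_ms()                          # step 1: record the start time
    acquired = 0
    for node in nodes:                          # step 2: same key and value on every node
        try:
            if node.set_nx(resource, token):
                acquired += 1
        except ConnectionError:
            continue                            # give up on this node, try the next one
    elapsed = clock_ms() - start                # step 3: time spent acquiring
    quorum = len(nodes) // 2 + 1
    validity = ttl_ms - elapsed                 # step 4: remaining validity time
    if acquired >= quorum and validity > 0:
        return {"resource": resource, "token": token, "validity": validity}
    for node in nodes:                          # step 5: undo on all nodes
        node.delete_if_equal(resource, token)
    return None
```

With five nodes of which two are down, the call still succeeds on the remaining three; with only two nodes up it returns None and leaves no key behind.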

Retry on failure

When a client fails to acquire the lock, it should retry after a random delay, to desynchronize it from other clients trying to acquire the same lock at the same time. (For example, three clients competing for the lock may end up holding 2, 2 and 1 of the five instances, so none of them reaches a majority. If they all retry without a random delay, they will very likely fail again in the same way.) The less time a client takes to acquire the lock, the smaller the window in which other clients can interleave with it. Ideally, therefore, the client sends the SET commands to the N instances at the same time.
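
A minimal sketch of such a retry loop is shown below. The function name, its parameters, the default values and the choice of a delay between half and the full `retry_delay_ms` are all assumptions made for this example; the algorithm only asks that the delay be random.

```python
import random
import time


def acquire_with_retry(try_acquire, retry_count=2, retry_delay_ms=200,
                       sleep=time.sleep, rng=random.random):
    """Call try_acquire() up to retry_count + 1 times, sleeping a random delay in between."""
    for attempt in range(retry_count + 1):
        lock = try_acquire()
        if lock:
            return lock
        if attempt < retry_count:
            # A random delay between half and the full retry_delay_ms keeps
            # competing clients from retrying in lockstep.
            delay_ms = retry_delay_ms / 2 + rng() * retry_delay_ms / 2
            sleep(delay_ms / 1000)
    return None
```

`try_acquire` is any callable that returns a lock on success and a falsy value on failure. The `sleep` and `rng` parameters exist only so the loop can be exercised without real waiting.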

Release the lock

Whether or not the client believes it acquired the lock, it should eventually release the lock on all instances (either immediately after a failed acquisition or after the protected work has finished).

Safety argument

Suppose a client has acquired the lock on a majority of instances. The key was set at a different time on each instance, so the keys expire at different times. If the first key was set at T1 and the last at T2, the first key is valid for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT. All other keys expire later, so for at least this long all of the keys are set simultaneously.
During this period the key exists on a majority of instances, so no other client can acquire the lock: SET NX cannot succeed on N / 2 + 1 instances. A lock that has been acquired therefore cannot be acquired again at the same time.
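
To make the bound concrete, here is the formula as a small function with made-up numbers. The function name and the sample values (a 10 s TTL, 30 ms between the first and last key, 100 ms of assumed clock drift) are purely illustrative.

```python
def min_validity_ms(ttl_ms, t1_ms, t2_ms, clock_drift_ms):
    """MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT, all in milliseconds."""
    return ttl_ms - (t2_ms - t1_ms) - clock_drift_ms


# TTL = 10000 ms, first key set at T1 = 0 ms, last key set at T2 = 30 ms,
# assumed clock drift of 100 ms: 10000 - 30 - 100 = 9870 ms.
example = min_validity_ms(10_000, 0, 30, 100)
```

In this example every key is guaranteed to be set for at least 9870 ms, so the client should treat the lock as held for no longer than that.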

Availability argument

The availability of the system is based on the following three main factors:

  1. Locks are released automatically (through key expiration), so a lock can eventually be acquired again.
  2. Clients actively release the lock, both when acquisition fails and when the work is done, so other clients usually do not have to wait for the key to expire.
  3. When a client has to retry, it waits for a random delay longer than the time needed to acquire the lock on a majority of instances, which makes split brain during contention very unlikely.

Performance, crash recovery, and fsync

Most users who run Redis as a lock service expect high performance. To meet that expectation, the client can multiplex its requests to the N Redis servers to reduce overall latency.
If the system must also recover from failures, persistence has to be considered. Suppose Redis runs without persistence and client A acquires the lock on 3 of 5 instances. If one of those three instances restarts, another client can acquire the same lock afterwards, which violates mutual exclusion.
Enabling AOF improves things. On an ordinary shutdown or restart, Redis expiration is implemented so that time keeps elapsing while the server is off, so the expiration time is still honored and all requirements are met. A power failure is different: if Redis is configured to fsync to disk once per second, the key may well be lost. In theory, guaranteeing lock safety across any kind of restart requires fsync = always, at some cost to Redis performance.
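
The restart scenario can be reproduced with a toy model. This is a sketch under simplifying assumptions: `Node` is a hypothetical in-memory stand-in, a "persistent" node keeps every key across a restart (the idealized fsync = always case), a non-persistent node loses everything, and key expiration is not modeled.

```python
class Node:
    """In-memory stand-in for one Redis instance (illustration only)."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.data = {}

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def restart(self):
        # Idealized model: a persistent node keeps every key,
        # a non-persistent one comes back empty.
        if not self.persistent:
            self.data = {}


def quorum_holders(persistent):
    """Return which clients reach a 3-of-5 majority when a locked node restarts."""
    nodes = [Node(persistent) for _ in range(5)]
    quorum = len(nodes) // 2 + 1
    # Client A gets the lock on nodes 0-2 only (say nodes 3 and 4 timed out).
    a_votes = sum(node.set_nx("res", "A") for node in nodes[:3])
    nodes[0].restart()
    # Client B then tries all five nodes while A's lock is still within its TTL.
    b_votes = sum(node.set_nx("res", "B") for node in nodes)
    votes = {"A": a_votes, "B": b_votes}
    return [client for client, count in votes.items() if count >= quorum]
```

Without persistence, `quorum_holders(False)` returns `["A", "B"]`: both clients believe they hold the lock. With the idealized persistence, `quorum_holders(True)` returns `["A"]`, because client B only reaches two instances.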

Making the algorithm more reliable: lock renewal

If the client's work consists of many small steps, the lock can be given a short validity time by default and then extended as needed: the client sends each instance a Lua script that extends the key's TTL if the key exists and still holds the client's unique value.
The renewal counts as successful only if the client extends the lock on a majority of instances within the validity time. A maximum number of renewals should also be set, otherwise Availability A can no longer be guaranteed.
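
A possible shape for renewal is sketched below. Everything here is hypothetical and invented for illustration: `Node.extend_if_equal` stands in for the Lua script, `RenewableLock` and `max_renewals` are made-up names, and the check that the renewal finishes within the remaining validity time is left out for brevity.

```python
class Node:
    """In-memory stand-in for one Redis instance (illustration only)."""

    def __init__(self):
        self.data = {}   # key -> [value, ttl_ms]

    def extend_if_equal(self, key, value, ttl_ms):
        """Stand-in for the Lua script: reset the TTL only if the value matches."""
        entry = self.data.get(key)
        if entry is None or entry[0] != value:
            return False
        entry[1] = ttl_ms
        return True


class RenewableLock:
    def __init__(self, nodes, resource, token, max_renewals=3):
        self.nodes = nodes
        self.resource = resource
        self.token = token
        self.max_renewals = max_renewals
        self.renewals = 0

    def renew(self, ttl_ms):
        """Extend the lock; succeed only with a majority and below the renewal cap."""
        if self.renewals >= self.max_renewals:
            return False   # cap reached: let the lock expire so others can proceed
        extended = sum(
            node.extend_if_equal(self.resource, self.token, ttl_ms)
            for node in self.nodes
        )
        if extended < len(self.nodes) // 2 + 1:
            return False
        self.renewals += 1
        return True
```

The renewal cap is what keeps a misbehaving client from holding the lock forever, so the lock still expires eventually even if the client never stops renewing.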

Design and use of RedLock-Hyperf

RedLock-Hyperf implements the Redlock algorithm and is adapted to Hyperf ~2.1.*. It supports both plain object calls and AOP annotations in the Hyperf framework.

Before using it, configure each independent Redis instance as a connection pool in /config/autoload/redis.php.

Installation

composer require zonghay/redlock-hyperf

Plain object usage

    try {
        $lock = $this->container->get(RedLock::class)->setRedisPoolName()->setRetryCount(1)->lock('redlock-hyperf-test', 60000);
        if ($lock) {
            //do your code
            $this->container->get(RedLock::class)->unlock($lock);
        }
    } catch (\Throwable $throwable) {
        var_dump($throwable->getMessage());
    }
  • The setRedisPoolName method specifies which Redis instances Redlock uses as independent nodes. It takes an indexed array, which defaults to ['default']. Each array value should be the name of a connection pool defined in /config/autoload/redis.php.
    Why independent Redis nodes are needed: see the Redlock algorithm section above.
  • The setRetryCount method sets the number of retries when acquiring the lock. The default is 2.
  • The setRetryDelay method sets the delay before retrying after a failed acquisition, in milliseconds. The default is 200.
  • The lock method acquires the lock.

    • resource: the key of the lock
    • ttl: the lock expiration time, in milliseconds
    • return: array|false
  • The unlock method releases the lock.

    • parameter: the array returned by a successful call to lock
  • If you are worried about the process restarting or exiting while it holds the lock, it is recommended to add the following code:
    // See RedlockHyperf\Aspect\RedLockAspect
    if ($lock) {
        // Release the lock if the worker receives an exit signal while the lock is still valid
        Coroutine::create(function () use ($lock) {
            $exited = CoordinatorManager::until(Constants::WORKER_EXIT)->yield($lock['validity']);
            $exited && $this->redlock->unlock($lock);
        });
        // do your code
        $this->redlock->unlock($lock);
        return $result;
    }

AOP annotation mode

class IndexController extends AbstractController
{
    /**
     * @RedLockAnnotation(resource="redlock-hyperf-test", poolName={"default"})
     */
    public function index() {}
}

The SDK provides the RedlockHyperf\Annotation\RedLockAnnotation annotation, which applies to methods. It accepts resource (required), poolName, clockDriftFactor, ttl and other parameters.

Closing notes

As for the safety of Redlock, Martin Kleppmann, a distributed systems expert, and antirez had a public debate about it. Translations of the debate are available online and are well worth reading; they give a much better understanding of distributed scenarios.
Is the Redis-based distributed lock safe? (Part 1)
Is the Redis-based distributed lock safe? (Part 2)
