Interviewer: do you really know Redis distributed locks?

Time:2021-4-18

What is a distributed lock

When Redis comes up, the first use that comes to mind is caching data. Beyond that, Redis is often used for distributed locking, because it executes commands on a single thread and performs very well.

As we all know, a lock is a synchronization tool that ensures a shared resource can only be accessed by one thread at a time. We are all familiar with locks in Java: `synchronized` and `Lock` are ones we use often. But Java locks are only effective within a single process (one JVM) and are powerless in a distributed cluster environment. That is where a distributed lock comes in.

A distributed lock, as the name suggests, is a lock used in distributed development to control synchronized access to shared resources across distributed systems. Generally speaking, a distributed lock needs the following characteristics:

1. Mutual exclusion: at any moment, only one client can hold the distributed lock for a given piece of data;

2. High availability: in a distributed scenario, the failure of a small number of servers must not affect normal use, so the service that provides the lock needs to be deployed as a cluster;

3. Lock timeout: if the client does not release the lock itself, the server releases it automatically after a period of time, which prevents deadlock when the client goes down or the network becomes unreachable;

4. Ownership: locking and unlocking must be done by the same client, that is, only the lock holder may release the lock. A lock you acquired must never be released by someone else.

Many tools in the industry can provide distributed locking, but the operations always come down to locking, unlocking, and handling lock timeout.

Since this article is about Redis distributed locks, the rest of the discussion builds on Redis itself.

Commands to implement locks

Let’s first introduce a few Redis commands.

1. SETNX, usage: `SETNX key value`

SETNX is short for “SET if Not eXists”: the key is set only if it does not already exist. It returns 1 if the key was set and 0 otherwise.

[Figure: SETNX usage]

It can be seen that once the key `lock` has been set to “Java”, setting it to any other value fails. This looks simple and seems to give exclusive ownership of the lock, but there is a fatal problem: the key has no expiration time. Unless the key is deleted manually or given an expiration time after the lock is acquired, other threads will never be able to get the lock.

To fix this, we can add an expiration time to the key, so that a thread performs two steps when acquiring the lock:

`SETNX Key 1`
`EXPIRE Key Seconds`

This scheme also has a problem: acquiring the lock and setting the expiration time are two separate steps, not one atomic operation. The lock may be acquired successfully while setting the expiration time fails (for example, if the client crashes between the two commands). The key then never expires, and all the work was for nothing.
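
To make the gap between the two commands concrete, here is a minimal Java sketch. It is an in-memory stand-in, not a Redis client: the class `TwoStepLockDemo` and its methods `setnx`, `expire` and `lockInTwoSteps` are hypothetical names that only mimic the behavior described above, and the `crashAfterSetnx` flag is an assumption used to simulate a client dying between the two steps.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis server (illustration only, not a real client). */
class TwoStepLockDemo {
    // key -> value
    private final Map<String, String> values = new HashMap<>();
    // key -> time to live in seconds; a missing entry means "never expires"
    private final Map<String, Long> ttlSeconds = new HashMap<>();

    /** Models SETNX: sets the key only if absent and reports whether it was set. */
    synchronized boolean setnx(String key, String value) {
        return values.putIfAbsent(key, value) == null;
    }

    /** Models EXPIRE: attaches a time to live to an existing key. */
    synchronized boolean expire(String key, long seconds) {
        if (!values.containsKey(key)) {
            return false;
        }
        ttlSeconds.put(key, seconds);
        return true;
    }

    /** True if the key exists but has no time to live, so it would never expire. */
    synchronized boolean isStuckForever(String key) {
        return values.containsKey(key) && !ttlSeconds.containsKey(key);
    }

    /** Two-step acquisition; crashAfterSetnx simulates the client dying in between. */
    boolean lockInTwoSteps(String key, long seconds, boolean crashAfterSetnx) {
        if (!setnx(key, "1")) {
            return false;
        }
        if (crashAfterSetnx) {
            throw new IllegalStateException("client crashed before EXPIRE");
        }
        return expire(key, seconds);
    }

    public static void main(String[] args) {
        TwoStepLockDemo server = new TwoStepLockDemo();
        try {
            server.lockInTwoSteps("lock", 30, true);
        } catch (IllegalStateException e) {
            System.out.println("crash: " + e.getMessage());
        }
        System.out.println("key stuck forever: " + server.isStuckForever("lock"));
        System.out.println("next client can lock: " + server.setnx("lock", "1"));
    }
}
```

In this simulation the key is left behind without a time to live, so the next call to `setnx` fails for every other client.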

But don’t worry. The Redis developers have already thought of this for us, which brings us to the next command.

2. SETEX, usage: `SETEX key seconds value`

Associates the value `value` with `key` and sets the time to live of `key` to `seconds` (in seconds). If `key` already exists, SETEX overwrites the old value.

This command is similar to the following two commands:

`SET key value`
`EXPIRE key seconds  # set the time to live`

The difference is that SETEX performs both actions atomically, as a single operation.

[Figure: SETEX usage]

3. PSETEX, usage: `PSETEX key milliseconds value`

This command is similar to SETEX, but it sets the time to live of `key` in milliseconds rather than in seconds.

However, starting from Redis 2.6.12, the SET command can achieve the same effect as SETNX, SETEX and PSETEX through its options.

For example:

`SET key value NX EX seconds` 

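With the NX and EX options, the command sets the key only if it does not already exist and attaches the expiration time in the same atomic operation. In effect it is SETNX and EXPIRE combined into one step, and it is the most common way to acquire a lock with Redis.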
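
The semantics of this command can be sketched in plain Java. The class below is an in-memory model under stated assumptions, not Redis and not a client library: `AtomicSetModel`, `setNxPx` and `advance` are hypothetical names, the clock is advanced by hand so the example is deterministic, and the TTL is given in milliseconds (the PX variant).

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of "SET key value NX PX ms" (illustration only). */
class AtomicSetModel {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private long nowMillis = 0; // simulated server clock

    /** Advances the simulated server clock. */
    synchronized void advance(long millis) {
        nowMillis += millis;
    }

    /**
     * The existence check, the write and the expiration happen in one
     * synchronized step. Returns "OK" if the key was set, null otherwise.
     */
    synchronized String setNxPx(String key, String value, long ttlMillis) {
        Entry current = store.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return null; // key exists and has not expired yet
        }
        store.put(key, new Entry(value, nowMillis + ttlMillis));
        return "OK";
    }

    public static void main(String[] args) {
        AtomicSetModel redis = new AtomicSetModel();
        System.out.println(redis.setNxPx("lock", "client-1", 5000)); // OK
        System.out.println(redis.setNxPx("lock", "client-2", 5000)); // null
        redis.advance(5000); // the TTL elapses
        System.out.println(redis.setNxPx("lock", "client-2", 5000)); // OK
    }
}
```

Because the key can never exist without an expiration time in this model, a crashed holder blocks other clients for at most one TTL.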

How to release the lock

The command to release the lock is simple: just delete the key. But as we said earlier, a distributed lock must be released by its holder, so we first have to confirm that the thread releasing the lock is the one that holds it, and only then delete the key. That makes two steps, which again breaks atomicity. What should we do?

Don’t panic. We can use a Lua script to combine the two steps, since Redis executes a script atomically:

`if redis.call("get",KEYS[1]) == ARGV[1]`
`then`
 `return redis.call("del",KEYS[1])`
`else`
 `return 0`
`end`

`KEYS[1]` is the name of the lock key, and `ARGV[1]` can be the ID of the current thread (or any other non-fixed value that identifies the owning thread). This prevents a thread whose lock has already expired, or any other thread, from deleting a lock that someone else now holds.
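
The same compare-then-delete idea can be illustrated in plain Java. This is only an analogy, not how Redis is implemented: the class `OwnerCheckedRelease` and its methods are hypothetical, and `ConcurrentMap.remove(key, value)` stands in for the Lua script because it also removes the entry only when the stored value matches, as one atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory analogue of the owner-checked release (illustration only). */
class OwnerCheckedRelease {
    // lock key -> ID of the current holder
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** Acquires the lock if nobody holds it. */
    boolean acquire(String key, String ownerId) {
        return store.putIfAbsent(key, ownerId) == null;
    }

    /**
     * Mirrors the script: delete the key only if its value equals ownerId.
     * Returns 1 if the key was deleted and 0 otherwise.
     */
    int release(String key, String ownerId) {
        return store.remove(key, ownerId) ? 1 : 0;
    }

    public static void main(String[] args) {
        OwnerCheckedRelease locks = new OwnerCheckedRelease();
        locks.acquire("redis_lock", "thread-1");
        System.out.println(locks.release("redis_lock", "thread-2")); // 0, not the holder
        System.out.println(locks.release("redis_lock", "thread-1")); // 1, holder releases
        System.out.println(locks.release("redis_lock", "thread-1")); // 0, already released
    }
}
```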

Code implementation

Now that we know the principle, we can write code to implement a Redis distributed lock. Since the purpose of this article is mainly to explain the principle rather than to teach you how to write a distributed lock, the implementation below is kept deliberately simple.

First comes the utility class for the Redis lock, which contains the basic lock and unlock methods:

`import java.util.Collections;`
`import redis.clients.jedis.Jedis;`
`import redis.clients.jedis.JedisPool;`
`import redis.clients.jedis.params.SetParams;`
`public class RedisLockUtil {`
`    private static final String LOCK_KEY = "redis_lock";`
`    // Time the key is held: 5 ms. This is very short; a real lock needs a TTL longer than the critical section`
`    private static final long EXPIRE_TIME = 5;`
`    // Maximum time to wait for the lock: 1 s`
`    private static final long TIME_OUT = 1000;`
`    // SET options: NX (only if the key is absent) and PX (TTL in milliseconds)`
`    private final SetParams params = SetParams.setParams().nx().px(EXPIRE_TIME);`
`    // Connection pool for a Redis server running on localhost`
`    private final JedisPool jedisPool = new JedisPool("127.0.0.1", 6379);`
`    /**`
`     * Acquires the lock.`
`     *`
`     * @param id the thread ID, or any other unique value that identifies the caller`
`     * @return true if the lock was acquired before the timeout`
`     */`
`    public boolean lock(String id) {`
`        long start = System.currentTimeMillis();`
`        try (Jedis jedis = jedisPool.getResource()) {`
`            for (;;) {`
`                // SET returns "OK" when the key was set, i.e. the lock was acquired`
`                String lock = jedis.set(LOCK_KEY, id, params);`
`                if ("OK".equals(lock)) {`
`                    return true;`
`                }`
`                // Otherwise keep retrying; give up once TIME_OUT has elapsed`
`                long elapsed = System.currentTimeMillis() - start;`
`                if (elapsed >= TIME_OUT) {`
`                    return false;`
`                }`
`                try {`
`                    // Sleep briefly so the loop does not hammer Redis`
`                    Thread.sleep(100);`
`                } catch (InterruptedException e) {`
`                    // Restore the interrupt flag and stop waiting`
`                    Thread.currentThread().interrupt();`
`                    return false;`
`                }`
`            }`
`        }`
`    }`
`    /**`
`     * Releases the lock.`
`     *`
`     * @param id the same value that was passed to lock()`
`     * @return true if the lock was held by id and has been deleted`
`     */`
`    public boolean unlock(String id) {`
`        // Lua script: delete the key only if its value matches the caller's id`
`        String script = "if redis.call('get', KEYS[1]) == ARGV[1] then "`
`                + "return redis.call('del', KEYS[1]) "`
`                + "else return 0 end";`
`        try (Jedis jedis = jedisPool.getResource()) {`
`            Object result = jedis.eval(script, Collections.singletonList(LOCK_KEY),`
`                    Collections.singletonList(id));`
`            return "1".equals(String.valueOf(result));`
`        }`
`    }`
`}`

The comments explain what each part does. Next, we can write a demo class to test the effect:

`public class RedisLockTest {`
`    private static final RedisLockUtil demo = new RedisLockUtil();`
`    // Shared counter protected by the Redis lock`
`    private static int NUM = 101;`
`    public static void main(String[] args) {`
`        for (int i = 0; i < 100; i++) {`
`            new Thread(() -> {`
`                String id = String.valueOf(Thread.currentThread().getId());`
`                boolean isLock = demo.lock(id);`
`                try {`
`                    // If the lock was acquired, subtract one from the shared counter`
`                    if (isLock) {`
`                        NUM--;`
`                        System.out.println(NUM);`
`                    }`
`                } finally {`
`                    // Releasing the lock must happen in finally`
`                    if (isLock) {`
`                        demo.unlock(id);`
`                    }`
`                }`
`            }).start();`
`        }`
`    }`
`}`

We create 100 threads to simulate concurrency. After execution, the result is as follows:

[Figure: Code execution results]

It can be seen that the lock takes effect and thread safety is preserved.

Of course, the code above is only a simple demonstration and is certainly incomplete. A sound distributed lock has to consider many aspects, and designing one in practice is not that easy.

Our purpose here is just to learn and understand the principles. Writing an industrial-grade distributed lock tool ourselves is unrealistic and unnecessary. There are plenty of open source tools of this kind (Redisson, for example). Their principles are much the same, and they have already been tested by peers in the industry, so just use them directly.

The function works, but from a design point of view a distributed lock like this has serious defects, and those defects are the real subject of this article.

Defects of the distributed lock

1. Lock failure caused by a long client pause

Client 1 acquires the lock and is then blocked for a long time by a network problem, a GC pause or some other cause, so the lock expires before its business logic has finished. Client 2 can now acquire the lock normally, and both clients work on the shared resource at once, which can cause thread safety problems.
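
The timeline of this defect can be replayed with a small single-threaded simulation. Everything here is an assumption made for illustration: `PausedClientDemo`, `tryLock`, `currentOwner` and `advance` are hypothetical names, the 5 s TTL and the 8 s pause are made-up numbers, and the clock is moved by hand instead of waiting in real time.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays "client 1 pauses longer than the TTL" on an in-memory lock. */
class PausedClientDemo {
    private final Map<String, String> owner = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private long nowMillis = 0; // simulated server clock

    void advance(long millis) {
        nowMillis += millis;
    }

    /** Sets the lock if it is free or expired; returns true on success. */
    boolean tryLock(String key, String clientId, long ttlMillis) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && deadline > nowMillis) {
            return false;
        }
        owner.put(key, clientId);
        expiresAt.put(key, nowMillis + ttlMillis);
        return true;
    }

    /** Returns the current holder as the server sees it, or null if none. */
    String currentOwner(String key) {
        Long deadline = expiresAt.get(key);
        return (deadline != null && deadline > nowMillis) ? owner.get(key) : null;
    }

    public static void main(String[] args) {
        PausedClientDemo server = new PausedClientDemo();
        boolean client1Locked = server.tryLock("lock", "client-1", 5000);
        server.advance(8000); // client 1 is stuck in a long pause
        boolean client2Locked = server.tryLock("lock", "client-2", 5000);
        // Client 1 wakes up and still believes it holds the lock.
        System.out.println("client-1 acquired: " + client1Locked);
        System.out.println("client-2 acquired: " + client2Locked);
        System.out.println("server-side owner: " + server.currentOwner("lock"));
    }
}
```

Both calls to `tryLock` succeed, yet the server only knows about `client-2`, so client 1 continues its work without any protection.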

[Figure: Client blocked for a long time]

So how can this be prevented? Let’s set solutions aside for now and look at the other defects first.

2. Clock drift of the Redis server

If the machine clock of the Redis server jumps forward, the key times out and fails earlier than intended. For example, after client 1 gets the lock, the key is due to expire at 12:02, but the Redis server’s clock runs 2 minutes ahead of the client’s, so the key actually expires when the client’s clock reads 12:00. If client 1 has not released the lock by then, several clients may end up holding the same lock at the same time.
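
The arithmetic behind this example can be written down as a tiny Java sketch. It assumes the server keeps the expiration of a key as an absolute timestamp on its own clock; the class `ClockJumpDemo` and the method `remainingMillis` are hypothetical names, and the 2 minute figures simply mirror the example above.

```java
/** Shows how a forward clock jump on the server eats the remaining TTL. */
class ClockJumpDemo {
    /** Remaining lifetime of a key whose absolute expiry is in server time. */
    static long remainingMillis(long expiresAtServerMillis, long serverNowMillis) {
        return Math.max(0L, expiresAtServerMillis - serverNowMillis);
    }

    public static void main(String[] args) {
        long serverNow = 0L;                   // server clock when the lock is set
        long expiresAt = serverNow + 120_000L; // TTL of 2 minutes
        System.out.println(remainingMillis(expiresAt, serverNow)); // 120000

        serverNow += 120_000L;                 // server clock jumps 2 minutes ahead
        System.out.println(remainingMillis(expiresAt, serverNow)); // 0
    }
}
```

No real time has passed for the client between the two calls, but from the server’s point of view the key is already expired.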

3. Single-instance safety

If Redis runs in single-master mode, then when that machine goes down no client can obtain the lock. To improve availability, a slave may be added to the master. However, master-slave replication in Redis is asynchronous, so the master may go down right after client 1 sets the lock, and the slave is then promoted to master. Because of asynchronous replication, the lock set by client 1 is lost. Client 2 can now set the lock successfully too, so client 1 and client 2 hold the lock at the same time.

To solve this single point problem, the author of Redis proposed the Redlock algorithm.

Redlock algorithm

The algorithm assumes that Redis is deployed on multiple nodes, which effectively guards against a single point of failure. The steps are:

1. Get the current timestamp (in ms);

2. Choose the TTL of the key, after which the lock is released automatically. The client then tries to set the same key and value on every Redis instance in turn. For each instance it uses a connection timeout much shorter than the TTL, so that it does not wait a long time on an instance that is down and can move on to the next one quickly.

For example, if the TTL (the expiration time) is 5 s, the timeout for each instance can be set to 50 ms. If the lock cannot be set on an instance within 50 ms, the client gives up on that instance and tries the next one;

3. The client computes how long acquisition took: the current time minus the timestamp from step 1. Locking succeeds only if this elapsed time plus the clock drift allowance for the Redis servers is less than the TTL, and the lock was set on at least n / 2 + 1 instances (n is the number of Redis instances);

For example, if the TTL is 5 s, connecting to Redis to set all the locks takes 2 s, and the clock drift allowance is about 1 s, then the lock is really valid for only 5 - 2 - 1 = 2 s;

4. If the client fails to acquire the lock for any reason, it unlocks all Redis instances.
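
The success check in step 3 can be sketched as a few lines of Java. This covers only the arithmetic, not the network calls, and the names `RedlockMath`, `quorum`, `validityMillis` and `acquired` are hypothetical helpers introduced for illustration; the numbers in `main` are the ones from the example above.

```java
/** Arithmetic of the Redlock success check (illustration only). */
class RedlockMath {
    /** Minimum number of instances that must accept the lock: n / 2 + 1. */
    static int quorum(int instances) {
        return instances / 2 + 1;
    }

    /** Time the lock is really valid: TTL minus elapsed time minus drift allowance. */
    static long validityMillis(long ttlMillis, long elapsedMillis, long driftMillis) {
        return ttlMillis - elapsedMillis - driftMillis;
    }

    /** The lock counts as acquired only with a quorum and some validity left. */
    static boolean acquired(int instances, int locked, long ttlMillis,
                            long elapsedMillis, long driftMillis) {
        return locked >= quorum(instances)
                && validityMillis(ttlMillis, elapsedMillis, driftMillis) > 0;
    }

    public static void main(String[] args) {
        System.out.println(quorum(5));                            // 3
        System.out.println(validityMillis(5000, 2000, 1000));     // 2000
        System.out.println(acquired(5, 3, 5000, 2000, 1000));     // true
        System.out.println(acquired(5, 2, 5000, 2000, 1000));     // false, no quorum
        System.out.println(acquired(5, 5, 5000, 4500, 1000));     // false, no time left
    }
}
```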

According to this algorithm, with five Redis instances the client only needs to acquire the lock on at least three of them to succeed. The flow chart looks like this:

[Figure: Key effective duration]

Well, that is the algorithm. From a design point of view, the main idea of Redlock is clearly to prevent a Redis single point of failure. It also accounts for server clock drift when working out the TTL, which improves the safety of the distributed lock.

But does it really deliver? Personally, I find the result only so-so.

First, in Redlock the effective time of the lock is reduced by the time spent connecting to the Redis instances. If that takes too long because of network problems, the effective time shrinks considerably, the client has very little time to work on the shared resource, and the lock may well expire in the middle of processing. The effective time also has to subtract the server clock drift, but how much should that be? If this value is set badly, problems follow easily.

Second, although the algorithm uses multiple nodes to guard against a single point of failure, if a node crashes and restarts it is still possible for several clients to hold the lock at the same time.

Suppose there are five Redis nodes, A, B, C, D and E, and clients 1 and 2 each try to acquire the lock:

  1. Client 1 locks A, B and C and acquires the lock (D and E are not locked).
  2. The master of node C goes down before the lock has been replicated to its slave. The slave is promoted to master, and the lock set by client 1 is lost on C.
  3. Client 2 now tries to acquire the lock, locks C, D and E, and succeeds.

In this way, client 1 and client 2 hold the lock at the same time, so the safety risk is still there. In addition, clock drift on any of these nodes can also cause lock safety problems.
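
The three steps of this scenario can be replayed in a short simulation. It is a sketch under simplifying assumptions: the class `RedlockFailoverDemo` and its methods are hypothetical, each node is just a map entry, TTLs are ignored, and `failoverLosingData` stands for a promotion to a slave that never received the lock key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Replays the failover scenario on five in-memory nodes (illustration only). */
class RedlockFailoverDemo {
    // node name -> client holding the lock key on that node (null if free)
    private final Map<String, String> holderByNode = new LinkedHashMap<>();

    RedlockFailoverDemo(String... nodes) {
        for (String node : nodes) {
            holderByNode.put(node, null);
        }
    }

    /** Tries to set the lock on the given nodes; returns how many accepted it. */
    int lock(String client, String... nodes) {
        int granted = 0;
        for (String node : nodes) {
            if (holderByNode.containsKey(node) && holderByNode.get(node) == null) {
                holderByNode.put(node, client);
                granted++;
            }
        }
        return granted;
    }

    /** Simulates a failover to a slave that never received the lock key. */
    void failoverLosingData(String node) {
        holderByNode.put(node, null);
    }

    static int quorum(int instances) {
        return instances / 2 + 1;
    }

    public static void main(String[] args) {
        RedlockFailoverDemo cluster = new RedlockFailoverDemo("A", "B", "C", "D", "E");
        int client1 = cluster.lock("client-1", "A", "B", "C"); // step 1
        cluster.failoverLosingData("C");                       // step 2
        int client2 = cluster.lock("client-2", "C", "D", "E"); // step 3
        int needed = quorum(5);
        System.out.println("client-1 holds " + client1 + ", client-2 holds " + client2
                + ", quorum is " + needed);
        System.out.println("both think they own the lock: "
                + (client1 >= needed && client2 >= needed));
    }
}
```

Each client reaches the quorum of three, so in this model both of them consider the lock acquired.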

Therefore, although deploying multiple instances improves availability and reliability, Redlock does not completely remove the risk of a Redis single point of failure. Nor does it solve clock drift or the lock expiring while a client is blocked for a long time, so the lock is still not fully safe.

Conclusion

Some people will want to ask: what can we do to make the lock absolutely safe?

I can only say that you can’t have both the fish and the bear’s paw, as the saying goes. The reason we use Redis as a distributed lock is that Redis is highly efficient and single-threaded, so it keeps performing well even under high concurrency. But in many cases performance and safety cannot both be had. If you must guarantee the safety of the lock, you can use other middleware such as a database or ZooKeeper for control. These tools can keep the lock safe, but their performance is less satisfying; otherwise we would have been using them all along.

Generally speaking, if Redis is used to control shared resources and the data safety requirements are high, the ultimate safeguard is to make the business operations idempotent. Then, even if several clients obtain the lock, data consistency is not affected. Of course, not every scenario suits this approach, and you have to weigh the choice yourself. After all, no technology is perfect; the one that fits is the best.
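
As one possible illustration of idempotent control, the sketch below records the ID of every request it has already applied, so repeating the same request changes nothing. It is a simplified in-memory example under stated assumptions: `IdempotentStock`, `deduct` and the request IDs are hypothetical, and a real system would keep the processed IDs in durable storage rather than in a Java set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Applies each request at most once, even if two lock holders submit it. */
class IdempotentStock {
    // IDs of requests that have already been applied
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger stock;

    IdempotentStock(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    /** Deducts one unit for requestId; returns true only if this call applied it. */
    boolean deduct(String requestId) {
        if (!processed.add(requestId)) {
            return false; // already handled, so do nothing
        }
        stock.decrementAndGet();
        return true;
    }

    int stock() {
        return stock.get();
    }

    public static void main(String[] args) {
        IdempotentStock inventory = new IdempotentStock(10);
        System.out.println(inventory.deduct("order-1")); // true
        System.out.println(inventory.deduct("order-1")); // false, duplicate
        System.out.println(inventory.stock());           // 9
    }
}
```

With this kind of check, two clients that both believe they hold the lock can submit the same order without the stock being deducted twice.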


Author: Xue Xue