Distributed lock tutorial for beginners (1): Basic concepts and use

Time:2022-10-7

I first came across distributed locks around the time I graduated, but I wasn't very interested then: it seemed that a lock simply sounded more high-end once the word "distributed" was put in front of it. Up to now I have only stayed at the level of calling the APIs and have never sorted out the related concepts in detail. This time, let's clear them up. This article assumes a basic knowledge of Zookeeper and Redis. If you don't have it, you can read my articles on Juejin:

  • “Zookeeper Study Notes (1) Basic Concepts and Simple Use” (available on this public account)
  • “Redis Study Notes (1) First Encounter” (not yet migrated to the public account)

When we talk about locks, what are we talking about?

The first thing I thought of when writing this was real-world locks.

Locks in the real world are designed to protect resources. The person holding the key is regarded as the owner of the resource and can get at whatever the lock protects. Most real-world locks follow this design: door locks prevent the theft of what is behind the door, and fingerprint locks protect what is on a mobile phone. What about locks in the software world? Are they also there to protect resources? To some extent, yes. Take selling tickets with multiple threads as an example: without a lock, two threads may end up selling the same ticket. The fix is to add synchronized to the operation that reads the ticket count and decrements it. The version below has no lock and shows the problem:

public class TicketSell implements Runnable {
    // Total number of tickets, shared by every thread that runs this Runnable
    private int total = 2000;

    @Override
    public void run() {
        while (total > 0) {
            // total-- is not atomic: it reads the value, subtracts, then writes it back.
            // If thread A's time slice runs out before it writes back, thread B reads
            // the same value of total, and the two threads appear to sell the same ticket.
            System.out.println(Thread.currentThread().getName() + " selling: " + total--);
        }
    }

    public static void main(String[] args) {
        TicketSell ticketSell = new TicketSell();
        Thread a = new Thread(ticketSell, "a");
        Thread b = new Thread(ticketSell, "b");
        a.start();
        b.start();
    }
}
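For comparison, here is a minimal sketch of the same seller with the check and the decrement guarded by synchronized. The class name SafeTicketSell, the sold counter and the sellWithTwoThreads helper are not from the original; they are assumptions added so the effect can be checked.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SafeTicketSell implements Runnable {
    private int total;                                      // tickets left, guarded by "this"
    private final AtomicInteger sold = new AtomicInteger(); // tickets actually handed out

    SafeTicketSell(int total) {
        this.total = total;
    }

    @Override
    public void run() {
        while (sellOne()) {
            // keep selling until no tickets are left
        }
    }

    // The check and the decrement run under one lock, so two threads cannot interleave here.
    private synchronized boolean sellOne() {
        if (total <= 0) {
            return false;
        }
        total--;
        sold.incrementAndGet();
        return true;
    }

    int sold() {
        return sold.get();
    }

    // Hypothetical helper: sells every ticket with two threads and returns the number sold.
    static int sellWithTwoThreads(int tickets) {
        SafeTicketSell seller = new SafeTicketSell(tickets);
        Thread a = new Thread(seller, "a");
        Thread b = new Thread(seller, "b");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seller.sold();
    }
}
```

With the lock in place, the number of tickets sold always equals the number of tickets available, however the two threads are scheduled.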

To avoid this situation, we can use a pessimistic lock such as synchronized or ReentrantLock around the code block, which prevents overselling and duplicate sales. This works because the two threads we started belong to one JVM process and total also lives in that JVM process, so the JVM can use a lock to protect the variable. But if we move the ticket count into a database table, will our lock still work? It will not. A JVM lock is managed by the JVM process, the database is a different process, and a JVM lock cannot lock a variable that belongs to another process. Next, we move the ticket count into the database and rewrite the ticket-selling program. First we prepare a table. To save trouble, I will reuse a Student table I already have and treat its number column as the ticket count. The table creation statement is as follows:

CREATE TABLE `student`  (
  `id` int(11) NOT NULL,
  `name` varchar(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci NULL DEFAULT NULL,
  `number` int(255) NULL DEFAULT NULL COMMENT 'student number',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

The largest number currently in our table is 5, in the row whose id is 5. The simulated ticket-selling code is as follows:

@RestController
public class TicketSellController {

    @Autowired
    private StudentInfoDao studentInfoDao;

    @GetMapping("sell")
    public String sellOne() throws Exception{
        Student student = studentInfoDao.selectById(5);
        Integer number = student.getNumber();
        // Sleep for five seconds to simulate business processing
        TimeUnit.SECONDS.sleep(5);
        student.setNumber(--number);
        studentInfoDao.updateById(student);
        return "Update successful";
    }

}

Send two requests from Postman at the same time and you will find that, although two requests were issued, the ticket count in the database has gone down by only one. TicketSellController could also be deployed on several machines. Let's look at why synchronized keeps TicketSell from overselling:

  • Mutual exclusion: while one thread is inside a block guarded by synchronized and has not finished executing it, any other thread that tries to enter is blocked.
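The lost update seen with the two Postman requests can be reproduced without a database. The following is only a sketch under assumptions: an AtomicInteger stands in for the number column of the row with id 5, and a CountDownLatch replaces the five-second sleep so that both "requests" are guaranteed to read before either writes. LostUpdateDemo and its method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    // Stands in for the "number" column of the row with id 5.
    private final AtomicInteger numberColumn = new AtomicInteger(5);
    // Opens once both requests have read the row.
    private final CountDownLatch bothHaveRead = new CountDownLatch(2);

    // Mirrors sellOne(): read, "do business", then write back the decremented value.
    private void sellOne() {
        int number = numberColumn.get();      // plays the role of selectById
        bothHaveRead.countDown();
        try {
            bothHaveRead.await();             // replaces the five-second sleep
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        numberColumn.set(number - 1);         // plays the role of updateById
    }

    // Runs two concurrent "requests" and returns the value left in the "database".
    static int runTwoRequests() {
        LostUpdateDemo demo = new LostUpdateDemo();
        Thread first = new Thread(demo::sellOne);
        Thread second = new Thread(demo::sellOne);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return demo.numberColumn.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoRequests()); // prints 4, not 3
    }
}
```

Both requests read 5 and both write 4, so one sale is lost. Nothing in the read-then-write sequence is mutually exclusive, which is exactly the property synchronized provided inside a single JVM.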

So how do we get a synchronized-like effect when threads access data in the database? What we want is an enhanced version of synchronized. Some people think of SELECT … FOR UPDATE, but if you try it you will find that it is not a good fit. The reason is that how long the lock can last is limited by MySQL's transaction lock time, and the database gives us no convenient way to ask whether a row is already locked. If two transactions both execute SELECT … FOR UPDATE on the same row, only one of them gets the lock, and the other simply waits until the first transaction has finished. What we want instead is to check, when acquiring a lock, whether someone already holds it. If someone does, then, borrowing from the lock upgrade of synchronized, we have two strategies: the first is to keep retrying the acquisition, and the second is to block after a failed acquisition and wait to be woken up by the lock holder.

Since SELECT … FOR UPDATE doesn't work, what else does the database offer? It has unique indexes. So we can create a table whose unique index is the identifier of the item being sold. When selling a ticket, we first insert a record built from the number and id of the item table, and if the insert fails, the attempt to grab the lock has failed. But how should this table be designed? Creating one lock table for every table that needs protection is not general enough. What fields should a general-purpose table have? The first is id. In modern high-level languages the method is the basic unit of code, so a method field is needed. That is enough for a single application, but once we split the application (not necessarily into microservices), different applications have different project names and may contain methods with the same name, so a project name field is also needed. The same method name in different projects may operate on different resources, so a resource ID is needed as well. And if an application is deployed as a cluster, we also need the machine IP.

There is one more scenario that readers may not have thought of: our lock should also support reentrancy. In other words, the "synchronized" we are designing must allow the lock holder to apply for the lock again and obtain it again. The actual scenario looks like this:

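// This only illustrates why reentrancy is needed; the recursion has no end condition
private final Lock lock = new ReentrantLock();
public void distributedLock() {
    lock.lock(); // the thread that already holds the lock acquires it again on each call
    try {
        distributedLock();
    } finally {
        lock.unlock();
    }
}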

So we also need to record the thread holding the lock and the number of reentries. The final table creation script is as follows:

DROP TABLE IF EXISTS `distributed_Lock`;
CREATE TABLE `distributed_Lock` (
  `id` int NOT NULL,
  `lock_key` varchar(100) NOT NULL,
  `thread_id` int NOT NULL,
  `entry_count` int NOT NULL,
  `host_ip` varchar(30) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `lock_key` (`lock_key`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

lock_key is composed as project name-method name-resource name. The logic for acquiring the lock is therefore as follows. First check whether a lock exists. If it does, check whether it is your own: if it is, re-enter it, and if it is not, retry. If no lock exists, try to insert one, and if the insert fails, retry as well. Some readers may ask at this point: if we have already determined that there is no lock, how can locking still fail? Our operations are as follows:

# If not found, it means no lock
select * from distributed_Lock where lock_key = '' # Statement 1
# then lock
insert into distributed_Lock # Statement 2

Two threads may both execute statement 1 within a short period of time and both find no lock, but because of the unique index only one of them will insert its record successfully. How do we retry then? We can build a retryer here, with two retry strategies:

  • If the attempt fails, try again immediately, until the maximum number of retries is reached.
  • If the attempt fails, wait for a while and then try again.
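The two strategies above can be sketched as one small helper. This is a minimal sketch, not a finished retryer: LockRetryer and its method names are hypothetical, and tryAcquire stands for whatever single attempt the lock makes (for the database lock, the insert into distributed_Lock).

```java
import java.util.function.BooleanSupplier;

class LockRetryer {
    // Strategy 1: retry immediately until maxRetries attempts have been made.
    static boolean retryImmediately(BooleanSupplier tryAcquire, int maxRetries) {
        return retryWithWait(tryAcquire, maxRetries, 0);
    }

    // Strategy 2: wait waitMillis between attempts.
    static boolean retryWithWait(BooleanSupplier tryAcquire, int maxRetries, long waitMillis) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (tryAcquire.getAsBoolean()) {
                return true;                     // lock acquired
            }
            if (waitMillis > 0 && attempt < maxRetries) {
                try {
                    Thread.sleep(waitMillis);    // back off before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;                // give up if the thread is interrupted
                }
            }
        }
        return false;                            // maximum number of retries reached
    }
}
```

Retrying immediately reacts fastest but keeps hammering the database; waiting between attempts trades a little latency for much less load, which usually matters more when the lock lives in a shared table.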

That covers acquiring the lock and retrying, so the design is basically complete. How do we release the lock? Do we simply lock, finish the operation and then release? What if the application goes down right after the lock has been acquired? To keep our system highly available, we need a scheduled task that releases such locks. Introducing a new tool to solve a problem brings new questions of its own: we use the scheduled task to avoid deadlock in this case, but how do we determine that a deadlock has occurred? We need to store a lock holding time in our table, and renew the lock when that time is almost up. So if we use the database for resource control, the corresponding tool class at the programming language level consists of the following components:

  • Lock
  • Retry
  • Lock renewal
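To make the lock and renewal components concrete, here is an in-memory sketch of the lock table. It rests on several assumptions: a HashMap keyed by lock_key stands in for the table and its unique index, synchronized methods stand in for the atomicity the database would provide, owner stands in for host_ip plus thread_id, and expiresAt is the hypothetical "lock holding time" column. The current time is passed in as a parameter only to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

class InMemoryLockTable {
    // One row of the lock table.
    private static final class Row {
        String owner;      // stands in for host_ip + thread_id
        int entryCount;    // number of (re)entries by the owner
        long expiresAt;    // hypothetical column: lease deadline in milliseconds
    }

    // The map key plays the role of the unique index on lock_key.
    private final Map<String, Row> rows = new HashMap<>();

    // Lock: insert a row, re-enter our own row, or take over an expired one.
    synchronized boolean tryLock(String lockKey, String owner, long leaseMillis, long now) {
        Row row = rows.get(lockKey);
        if (row != null && row.expiresAt > now && !row.owner.equals(owner)) {
            return false;                      // held by someone else: the caller should retry
        }
        if (row == null || row.expiresAt <= now) {
            row = new Row();                   // free, or the previous holder's lease ran out
            row.owner = owner;
            rows.put(lockKey, row);
        }
        row.entryCount++;                      // 1 on first acquisition, +1 on each reentry
        row.expiresAt = now + leaseMillis;
        return true;
    }

    // Lock renewal: push the deadline out, but only for the current holder.
    synchronized boolean renew(String lockKey, String owner, long leaseMillis, long now) {
        Row row = rows.get(lockKey);
        if (row == null || row.expiresAt <= now || !row.owner.equals(owner)) {
            return false;
        }
        row.expiresAt = now + leaseMillis;
        return true;
    }

    // Unlock: the row is deleted only when the outermost entry is released.
    synchronized void unlock(String lockKey, String owner) {
        Row row = rows.get(lockKey);
        if (row != null && row.owner.equals(owner) && --row.entryCount == 0) {
            rows.remove(lockKey);
        }
    }
}
```

In this sketch an expired row is simply overwritten by the next caller, which takes the place of the scheduled cleanup task. A real implementation against MySQL would have to express each of these methods as statements that are atomic on the database side.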

Distributed lock

What we discussed above is in fact a distributed lock implemented on top of a database, using a unique index to protect the resource. So what is a distributed lock?

A distributed lock is a way to control synchronized access to shared resources across distributed systems. In distributed systems, it is often necessary to coordinate the actions of different participants. If different systems, or different hosts of the same system, share one resource or a set of resources, mutual exclusion is often required when accessing them, to prevent interference and ensure consistency. In this case a distributed lock is needed.

Implementing a distributed lock on a database, as we did above, takes many steps and is costly. The scenario we used, deducting inventory in a flash sale (seckill) and similar businesses where overselling must be prevented, is one use of distributed locks. We also use distributed locks for the following:

  • Prevent cache breakdown

For example, when a key in the cache has just expired, we do not want every request for that key to hit the database. We can use a distributed lock as the gate: one request gets the lock and queries the database, the other requests fail to get the lock and wait, and the lock is released only after the data has been loaded from the database into the cache.

  • Guarantee interface idempotency

Distributed locks can also handle the repeated submission of forms. Add a distributed lock to the interface; the second request finds the lock already held and is told that the form has already been submitted.

  • Task scheduling

Above we discussed using a scheduled task to prevent deadlocks, but the application that hosts the scheduled task may itself go down. For high availability we can deploy several instances, but only one of them may run the task at a time, and a distributed lock decides which one.
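As an illustration of the first scenario, cache breakdown, here is a minimal sketch. It makes two loud assumptions: a local ReentrantLock stands in for the distributed lock, and loadFromDatabase is a fake that only counts how often it is called. The variant shown lets the losing requests wait for the lock and then re-check the cache; CacheBreakdownGuard and its members are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class CacheBreakdownGuard {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: a local lock stands in for the distributed lock.
    private final Lock lock = new ReentrantLock();
    // Counts how many requests really reached the "database".
    final AtomicInteger databaseHits = new AtomicInteger();

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                    // cache hit, no lock needed
        }
        lock.lock();                         // only one request may rebuild the entry
        try {
            value = cache.get(key);          // re-check: another request may have loaded it
            if (value == null) {
                value = loadFromDatabase(key);
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();                   // released only after the cache has been filled
        }
    }

    // Fake database access that records each call.
    private String loadFromDatabase(String key) {
        databaseHits.incrementAndGet();
        return "value-of-" + key;
    }

    // Hypothetical helper: fires concurrent requests for one expired key
    // and returns how many of them reached the database.
    static int hitsForConcurrentRequests(int requests) {
        CacheBreakdownGuard guard = new CacheBreakdownGuard();
        Thread[] threads = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            threads[i] = new Thread(() -> guard.get("hot-key"));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return guard.databaseHits.get();
    }
}
```

The second cache lookup inside the lock is what keeps the waiting requests away from the database: by the time they get the lock, the first request has already refilled the cache.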

In practice we usually implement distributed locks with Redis or Zookeeper. The reason is that, thanks to their in-memory performance and rich features, we can build a distributed lock on them without much effort.

Implementing distributed locks with Redis

When we discussed implementing distributed locks with a database, we mentioned that an application may crash after locking successfully but before releasing the lock, which results in a deadlock. For high availability we introduced a scheduled task that scans for such abnormal lock holders and releases their locks. Redis happens to have a command that sets a key together with its expiration time, and the command is atomic, so we don't have to worry about deadlocks. How long the expiration time should be is another question. Ideally it is exactly as long as the work takes, so we introduce lock renewal: if the lock has still not been released after one third of its expiration time has passed, the expiration time is extended. The other problem we face is how to deploy Redis:

  • stand-alone

The disadvantage is obvious: if this single Redis instance crashes, the whole distributed lock becomes unavailable.

  • sentinel

For high availability, I deploy a few more Redis instances. When the master node becomes unavailable, a slave node is automatically chosen and promoted to master. But there is still a problem: if I have just written the lock to the master and the master goes down before replication has completed, the newly promoted master knows nothing about the lock, and another client can acquire it.

  • RedLock

Then there is the famous RedLock. If one master node is not enough, add several more master nodes and lock them one by one. As long as more than half of them are locked, locking has succeeded; on release, they are unlocked one by one. This way, even if one master node fails, the others are still there as backups. The problem seems to be solved perfectly, but in fact there are still loopholes. They cannot be explained within this article, so we will discuss them later; here a basic understanding is enough.
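The "lock more than half of the masters" idea can be sketched with in-memory stand-ins. This is only an illustration of the majority rule described above, not the real RedLock algorithm: expiration times and the timing checks are left out entirely, each Node is a map pretending to be an independent Redis master, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class MajorityLockSketch {
    // Stand-in for one independent master: putIfAbsent plays the role of "set if not exists".
    static final class Node {
        private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();
        volatile boolean down;               // simulates a crashed master

        boolean tryLock(String key, String owner) {
            return !down && keys.putIfAbsent(key, owner) == null;
        }

        void unlock(String key, String owner) {
            keys.remove(key, owner);         // removes the key only if we still own it
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    // Hypothetical factory: builds a sketch backed by the given number of nodes.
    static MajorityLockSketch withNodes(int count) {
        MajorityLockSketch sketch = new MajorityLockSketch();
        for (int i = 0; i < count; i++) {
            sketch.nodes.add(new Node());
        }
        return sketch;
    }

    Node node(int index) {
        return nodes.get(index);
    }

    // Lock the nodes one by one; succeed only if more than half of them accepted.
    boolean tryLock(String key, String owner) {
        int acquired = 0;
        for (Node node : nodes) {
            if (node.tryLock(key, owner)) {
                acquired++;
            }
        }
        if (acquired > nodes.size() / 2) {
            return true;
        }
        unlock(key, owner);                  // no majority: give back whatever we got
        return false;
    }

    // Release on every node, one by one.
    void unlock(String key, String owner) {
        for (Node node : nodes) {
            node.unlock(key, owner);
        }
    }
}
```

With five nodes the lock survives two failed masters, because three acceptances are still a majority; with three failed masters no client can obtain it.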

Are we going to implement a Redis-based distributed lock from scratch? Of course not. Basically every mainstream high-level language has a well-packaged implementation. Here we introduce Redisson from the Java world.

Standard implementation – java

The first step, as usual, is to add the Maven dependency:

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.17.7</version>
</dependency>

Simple example:

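import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

Config config = new Config();
config.useClusterServers()
        // more node addresses can be added here
        .addNodeAddress("redis://127.0.0.1:7181");
RedissonClient redisson = Redisson.create(config);

// Non-fair lock
RLock lock = redisson.getLock("myLock");
// Read-write lock
RReadWriteLock readWriteLock = redisson.getReadWriteLock("myLock");
// Fair lock
RLock fairLock = redisson.getFairLock("myLock");
// Spin lock
RLock spinLock = redisson.getSpinLock("myLock");

// tryLock() returns immediately; unlock only if the lock was actually acquired
boolean lockResult = lock.tryLock();
if (lockResult) {
    try {
        // business logic protected by the lock
    } finally {
        lock.unlock(); // release the lock
    }
}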

Implementing distributed locks with Zookeeper

In “Zookeeper Study Notes (1) Basic Concepts and Simple Use” we introduced using Zookeeper's ephemeral sequential nodes to implement distributed locks. With them there is no need to worry about deadlocks caused by a client going down after locking, because once the client is down its ephemeral node disappears. Zookeeper keeps nodes alive through sessions: there is a heartbeat between client and server, and if the Zookeeper server receives no heartbeat for a session for a long time, it considers the client dead and deletes the corresponding nodes.

So implementing a distributed lock with Zookeeper is relatively simple. Each client asks Zookeeper to create a node and then checks whether its own node has the smallest sequence number. If it does, the client has won the lock. Every other client sets a watcher on the node just before its own; when that node disappears, the client can acquire the lock. This is how a fair lock is implemented. Of course, we will not implement this distributed lock from scratch. Zookeeper has a very powerful client that supports distributed locks, called Curator. Let's briefly look at its basic use:

<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>5.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-client</artifactId>
    <version>5.1.0</version>
</dependency>

// If the connection cannot be established, retry up to four times, 400 ms apart
RetryNTimes retryNTimes = new RetryNTimes(4, 400);
// "127.0.0.1:2181" and the "/locks/..." paths are placeholders; use your own address and paths
CuratorFramework client = CuratorFrameworkFactory.newClient("127.0.0.1:2181", retryNTimes);
client.start();
InterProcessMutex lock = new InterProcessMutex(client, "/locks/mutex"); // reentrant fair lock
InterProcessSemaphoreMutex interProcessSemaphoreMutex = new InterProcessSemaphoreMutex(client, "/locks/semaphore"); // non-reentrant lock
InterProcessReadWriteLock interProcessReadWriteLock = new InterProcessReadWriteLock(client, "/locks/read-write"); // reentrant read-write lock
lock.acquire(); // lock; blocks until the lock is acquired
try {
    // business logic protected by the lock
} finally {
    lock.release(); // always release, even if the business logic throws
}
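The ephemeral sequential node scheme that Curator wraps can be simulated without a Zookeeper server. The sketch below is an in-memory stand-in under stated assumptions: a sorted map replaces the children of the lock path, deleteNode covers both an explicit release and a session expiry, and real watchers are reduced to a method that tells a client which node it should watch. None of these names come from Zookeeper or Curator.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

class SequentialNodeLockSketch {
    // Stand-in for the children of the lock path: sequence number -> client id.
    private final ConcurrentSkipListMap<Long, String> children = new ConcurrentSkipListMap<>();
    private final AtomicLong nextSequence = new AtomicLong();

    // "Create an ephemeral sequential node": returns the sequence number assigned.
    long createNode(String clientId) {
        long sequence = nextSequence.getAndIncrement();
        children.put(sequence, clientId);
        return sequence;
    }

    // The client whose node has the smallest sequence number holds the lock.
    boolean holdsLock(long sequence) {
        Map.Entry<Long, String> first = children.firstEntry();
        return first != null && first.getKey() == sequence;
    }

    // The node a waiting client should watch: the one just before its own, or null if none.
    Long nodeToWatch(long sequence) {
        return children.lowerKey(sequence);
    }

    // Releasing the lock, or the session expiring, deletes the node.
    void deleteNode(long sequence) {
        children.remove(sequence);
    }
}
```

Because each client watches only its immediate predecessor, a release wakes up exactly one waiter, and the lock is handed over in creation order, which is what makes it fair.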

Conclusion

In a distributed system, applications often need to coordinate their actions when accessing shared resources. The application that wins the lock operates on the shared resource, and those that do not win it wait or enter some other state. That is a distributed lock: exclusive access to a shared resource. The distributed lock is an idea, and a well-designed distributed lock should have the following characteristics:

  • High availability
  • Reentrant
  • No deadlock
  • Mutually exclusive

There are three mainstream implementations:

  • Distributed lock based on database
  • Distributed lock based on Redis
  • Distributed lock based on Zookeeper

In actual development we generally use Redis or Zookeeper to implement distributed locks. Both have open source distributed lock frameworks that can be integrated directly into a project.

References

  • “Hello, Distributed Lock” https://juejin.cn/book/701839…
  • Redisson https://github.com/redisson/r…