Operating system concurrency lock

Time: 2021-04-28

Concept

The introduction to concurrency exposed the most basic problem of concurrent programming: because of interrupts on a single processor (or because multiple threads execute simultaneously on a multiprocessor), sequences of instructions that we would like to execute atomically may not run correctly. Locks are the most basic way to solve this problem: the programmer places locks around critical sections in the source code, ensuring that each critical section executes as if it were a single atomic instruction.

The basic idea of locks

Here is a simple example of using locks:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Pthread_mutex_lock(&lock);    // wrapper for pthread_mutex_lock()
balance = balance + 1;
Pthread_mutex_unlock(&lock);

A lock is a variable that holds the state of the lock at any given time. It is either available (unlocked, or free), meaning no thread holds it, or acquired (locked, or held), meaning exactly one thread holds the lock and is presumably in the critical section. The lock could also record other information, such as which thread holds it or a queue of threads requesting it, but such information is hidden from lock users.

Generally, locks support just two operations: lock() and unlock(). Calling lock() tries to acquire the lock. If no other thread holds it, the calling thread acquires the lock and enters the critical section; this thread is called the lock owner. If another thread then calls lock() on the same lock variable, the call will not return while the lock is held. This way, while the owner is in the critical section, no other thread can enter it.

Once the owner calls unlock(), the lock becomes available again. If no other thread is waiting (that is, no other thread has called lock() and is stuck there), the lock simply transitions to the available state. If there are waiting threads, one of them will (eventually) notice (or be notified of) the change of state, acquire the lock, and enter the critical section.

How to implement a lock

Obviously, we need help from the hardware and the operating system to build a usable lock. Over the years, various hardware primitives have been added to the instruction sets of different architectures; we can use them to build mutual-exclusion primitives such as locks.

Before implementing locks, we should establish some criteria for judging whether a lock works well. The first is whether it accomplishes its basic task: does it provide mutual exclusion, preventing multiple threads from entering a critical section?

The second is fairness. When the lock becomes available, does each competing thread get a fair chance to acquire it? Can a thread competing for the lock starve, never obtaining it?

Finally, there is performance: the time overhead added by using the lock. Several cases are worth considering: the cost of acquiring and releasing a lock when a single thread uses it with no contention; the cost when multiple threads contend for the lock on one CPU; and the cost when multiple threads on multiple CPUs contend for it.

Approach 1: controlling interrupts

One of the earliest solutions for mutual exclusion was to disable interrupts in critical sections. This solution was invented for single-processor systems. By turning off interrupts before entering the critical section (using a special hardware instruction), we ensure that the code inside the critical section will not be interrupted and thus executes as if atomically. Afterwards, we re-enable interrupts and the program proceeds as usual.
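
A minimal sketch of the idea, assuming hypothetical DisableInterrupts()/EnableInterrupts() privileged instructions:

void lock() {
    DisableInterrupts(); // nothing can preempt us now (single CPU only)
}

void unlock() {
    EnableInterrupts();  // allow interrupts (and thus preemption) again
}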

The main advantage of this method is its simplicity, but it has many drawbacks. First, it requires us to allow every calling thread to perform a privileged operation (turning interrupts on and off), and a malicious program could abuse this trust. For example, a malicious program could call lock() at the start and monopolize the processor; the system would never regain control, and the only recourse would be to reboot.

Second, this scheme does not work on multiprocessors. If multiple threads running on different CPUs each try to enter the same critical section, disabling interrupts has no effect: threads on other CPUs can still enter it.

Third, keeping interrupts off for long stretches can cause interrupts to be lost, which may lead to serious system problems. For example, if the disk finishes a read request but the CPU misses the signal because interrupts were off, how will the operating system know to wake the waiting process?

The last, less important reason is inefficiency: compared with normal instruction execution, code that disables and enables interrupts tends to run slowly on modern CPUs.

For these reasons, disabling interrupts is used to implement mutual-exclusion primitives only in very limited situations.

Approach 2: the test-and-set instruction

Because disabling interrupts does not work on multiprocessors, system designers began to add hardware support for locking. The simplest form of hardware support is the test-and-set instruction, also known as atomic exchange. What test-and-set does can be roughly described by the following C code:

int TestAndSet(int *old_ptr, int new) {
    int old = *old_ptr; // fetch old value at old_ptr
    *old_ptr = new;     // store 'new' into old_ptr
    return old;         // return the old value
}

It returns the old value pointed to by old_ptr and simultaneously updates that value to new. The key, of course, is that this sequence is performed atomically. Because it both tests the old value and sets a new one, we call this instruction "test-and-set".
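
For reference, a portable version of this primitive could be written with C11 atomics (a sketch; real kernels and libraries use the hardware instruction directly):

#include <stdatomic.h>

int TestAndSet(atomic_int *old_ptr, int new_val) {
    // atomic_exchange swaps in new_val and returns the previous
    // value as one indivisible operation
    return atomic_exchange(old_ptr, new_val);
}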

To understand how this instruction helps build a usable lock, let us first try to implement a lock without it.

A failed attempt

In this first attempt, the idea is simple: use a variable to mark whether the lock is held by some thread. The first thread to enter the critical section calls lock(), which tests whether the flag is 1 (here it is not), then sets the flag to 1 to indicate that the thread now holds the lock. When it finishes the critical section, the thread calls unlock() to clear the flag, indicating the lock is no longer held.

If another thread calls lock() while the first thread is in the critical section, it will spin-wait in the while loop until the first thread calls unlock() and clears the flag. At that point the waiting thread exits the while loop, sets the flag, and runs the critical section code.

typedef struct lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1) // TEST the flag
        ; // spin-wait (do nothing)
    mutex->flag = 1;         // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}

Unfortunately, this code doesn’t work properly. Suppose that the code is executed according to the following table, with flag = 0 at the beginning.

Thread 1                                Thread 2
call lock()
while (flag == 1)  // flag is 0, fall through
[interrupt: switch to Thread 2]
                                        call lock()
                                        while (flag == 1)  // flag is 0, fall through
                                        flag = 1;
                                        [interrupt: switch to Thread 1]
flag = 1;  // sets flag to 1 (too!)

From this interleaving we can see that, with an untimely interrupt, we can easily construct a case where both threads set the flag to 1 and both enter the critical section.

Improving the lock with test-and-set

The improved code is as follows:

typedef struct lock_t {
    int flag;
} lock_t;

void init(lock_t *lock) {
    // 0 indicates that lock is available, 1 that it is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin-wait (do nothing)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

Let's see how this lock works. First, imagine a thread calls lock() when no other thread holds the lock, so flag is 0. The call to TestAndSet(flag, 1) returns 0, so the thread breaks out of the while loop and has acquired the lock; at the same time, flag is atomically set to 1, marking the lock as held. When the thread leaves the critical section, it calls unlock() to clear flag back to 0.

Now the second case: some thread already holds the lock. Another thread calls lock() and then TestAndSet(flag, 1), which this time returns 1. As long as the other thread keeps holding the lock, TestAndSet() keeps returning 1, and this thread spins. When the flag is finally set back to 0, this thread's next TestAndSet() returns 0 while atomically setting the flag to 1, so it acquires the lock and enters the critical section.

This kind of lock is called a spin lock. It is the simplest kind of lock: it just spins, burning CPU cycles, until the lock becomes available. On a single processor, a preemptive scheduler (one that interrupts threads via the timer) is required; otherwise a spin lock cannot work on a single CPU, because a spinning thread will never give up the CPU.

Evaluating spin locks

Let's evaluate our spin lock against the earlier criteria. First, correctness: the spin lock allows only one thread into the critical section at a time, so it works correctly.

The next criterion is fairness. The answer: spin locks provide no fairness guarantees at all. In fact, under contention, a spinning thread may spin forever. Spin locks are unfair and can lead to starvation.

The last criterion is performance. On a single CPU, the overhead of a spin lock can be considerable. Suppose the thread holding the lock is preempted inside the critical section. The scheduler may then run every other thread (imagine N − 1 of them), and each of them, while contending for the lock, will spin for an entire time slice before giving up the CPU, wasting those cycles.

On multiple CPUs, however, spin locks perform reasonably well (when the number of threads roughly equals the number of CPUs). Suppose thread A holds the lock on CPU 1 while thread B contends for the same lock on CPU 2. Since critical sections are usually short, the lock quickly becomes available and thread B acquires it. Spinning to wait for a lock held on another processor wastes few cycles in this case, so the approach works well.

Approach 3: compare-and-swap

Some systems provide another hardware primitive: compare-and-swap. Here is C pseudocode for this instruction.

int CompareAndSwap(int *ptr, int expected, int new) {
    int actual = *ptr;
    if (actual == expected)
        *ptr = new;
    return actual;
}

The basic idea of compare-and-swap is to test whether the value pointed to by ptr equals expected; if so, update the value that ptr points to with the new value, and otherwise do nothing. In either case, return the actual value at that memory address, so the caller can tell whether the swap succeeded.
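
As with test-and-set, a portable C11 sketch of this primitive is possible (an illustration, not the hardware definition):

#include <stdatomic.h>

int CompareAndSwap(atomic_int *ptr, int expected, int new_val) {
    // on failure, atomic_compare_exchange_strong writes the actual
    // value into 'expected'; on success 'expected' already equals it,
    // so returning 'expected' yields the old value either way
    atomic_compare_exchange_strong(ptr, &expected, new_val);
    return expected;
}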

With compare-and-swap, you can build a lock much as with test-and-set. For example, we just replace the lock() function in the example above with the following code:

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin
}

Its behavior, and our evaluation of it, are the same as for the spin lock analyzed above.

Approach 4: fetch-and-add

The last hardware primitive is the fetch-and-add instruction, which atomically returns the old value at a given address and increments that value by one. Here is C pseudocode for fetch-and-add:

int FetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}

With fetch-and-add we can build a more interesting ticket lock:

typedef struct lock_t {
    int ticket;
    int turn;
} lock_t;

void init(lock_t *lock) {
    lock->ticket = 0;
    lock->turn   = 0;
}

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);
    while (lock->turn != myturn)
        ; // spin
}

void unlock(lock_t *lock) {
    FetchAndAdd(&lock->turn);
}

Instead of a single flag, this lock uses a pair of variables, ticket and turn. The basic operation is simple: when a thread wants to acquire the lock, it first performs an atomic fetch-and-add on the ticket value; the result is that thread's "turn" (myturn). The globally shared lock->turn then determines whose turn it is: when myturn == turn for some thread, it is that thread's turn to enter the critical section. Unlocking simply increments turn, so that the next waiting thread can enter the critical section.

Note the important difference from the previous methods: this approach guarantees progress for every thread. Once a thread is assigned its ticket value, it will eventually be scheduled to acquire the lock. By contrast, with the test-and-set approach a thread could spin forever, even as other threads acquire and release the lock.

How to avoid too much spin

Hardware-based locks are simple and effective, but in some scenarios they are quite inefficient. Consider two threads running on a single processor. One thread (thread 1) holds the lock and is then interrupted. The second thread (thread 2) tries to acquire the lock, finds it held, and spins. Eventually a timer interrupt fires, thread 1 runs again and releases the lock, and finally thread 2 stops spinning and acquires it.

In scenarios like this, a thread spins checking a value that cannot change during its slice, wasting the entire time slice. With N threads contending for a lock the situation is even worse: N − 1 time slices can be wasted in the same way, spinning while waiting for one thread to release the lock. So our next key question is: how do we avoid needless spinning and stop wasting CPU time?

A simple approach: just yield the CPU

The first, simple approach is to give up the CPU instead of spinning. The following code shows this approach.

void init() {
    flag = 0;
}

void lock() {
    while (TestAndSet(&flag, 1) == 1)
        yield(); // give up the CPU
}

void unlock() {
    flag = 0;
}

Here we assume the operating system provides a primitive, yield(), which a thread can call to voluntarily give up the CPU and let another thread run. A call to yield() moves the caller from the running state to the ready state, allowing another thread to run; in essence, the yielding thread deschedules itself.
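
On a POSIX system, for instance, one plausible realization of this primitive is sched_yield() (a platform assumption; the text leaves yield() abstract):

#include <sched.h>

void yield(void) {
    sched_yield(); // ask the kernel to reschedule; we move from running to ready
}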

Consider two threads on one CPU: with the yield-based approach, things work quite well. If a thread calls lock() and finds the lock held, it simply yields the CPU, and the other thread runs and completes its critical section. In this simple case, the yield approach works very well.

Now consider many threads (say, 100) contending repeatedly for one lock. If one thread holds the lock and is preempted before releasing it, the other 99 will each call lock(), find the lock held, and yield the CPU. With a round-robin style scheduler, all 99 will go through this run-then-yield pattern before the holder runs again. Although this is better than the spinning approach, which would waste 99 whole time slices, it is still costly: the cost of all those context switches is real, and thus wasteful.

Worse, we have not addressed starvation at all. A thread could get stuck in an endless yield loop while other threads repeatedly enter and exit the critical section. We clearly need an approach that tackles this problem directly.

Using queues: sleeping instead of spinning

The real problem with the previous approaches is that they leave too much to chance: the scheduler decides which thread runs next. If it makes a bad choice, the chosen thread either spins (the first approach) or yields immediately (the second). Either way there is potential waste, and neither prevents starvation.

Therefore, we must explicitly exert some control over which thread gets the lock next after it is released. To do this, we need a little more support from the operating system, plus a queue to track the threads waiting for the lock.

For simplicity, we will use the support provided by Solaris, which offers two calls: park() puts the calling thread to sleep, and unpark(threadID) wakes the thread identified by threadID. These two calls can be used together to build a lock that puts the caller to sleep when the lock is unavailable and wakes it when the lock becomes free.

typedef struct lock_t {
    int flag;
    int guard;
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag = 0;
    m->guard = 0;
    queue_init(m->q);
}

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        m->guard = 0;
        park();
    }
}

void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (queue_empty(m->q))
        m->flag = 0; // let go of lock; no one wants it
    else
        unpark(queue_remove(m->q)); // hold lock (for next thread!)
    m->guard = 0;
}

Here we do two things. First, we combine the earlier test-and-set idea with an explicit queue of lock waiters to build a more efficient lock. Second, we use the queue to control who gets the lock next, thereby avoiding starvation.

You might notice that guard essentially acts as a spin lock around the flag and queue manipulations. This approach therefore does not avoid spin-waiting entirely: a thread may be interrupted while acquiring or releasing the lock, causing other threads to spin. However, the time spent spinning is quite limited (just the few instructions inside lock() and unlock(), rather than a user-defined critical section).

Why is flag not set to 0 when another thread is woken? When a thread is woken up, it returns as if from its call to park(); at that point it does not hold guard, so it cannot even try to set the flag to 1. We therefore pass the lock directly from the releasing thread to the next acquiring thread; flag is never set to 0 in between.

There is, however, a race in this code. Suppose a thread is about to call park() to sleep, but just before it does, the system switches to the thread holding the lock. If that holder then releases the lock (calling unpark() before the first thread actually parks), the wakeup is lost and the first thread may sleep forever. Avoiding this wakeup/waiting race requires extra work.

Solaris solves this problem with a third system call, setpark(). By calling setpark(), a thread declares that it is about to park. If another thread is then scheduled and calls unpark() before park() is actually called, the subsequent park() returns immediately instead of sleeping. The lock() code in the example only needs a small change:

queue_add(m->q, gettid());
setpark(); // new code
m->guard = 0;
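
For clarity, here is the full lock() with this change folded in (the same code as above, with setpark() added):

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        setpark(); // declare we are about to park
        m->guard = 0;
        park();    // returns immediately if unpark() already ran
    }
}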

A different solution would be to pass guard into the kernel. The kernel could then take precautions, atomically releasing the lock and dequeueing the running thread.

Two-phase locks

The two-phase lock is an old scheme that has been in use for many years. It recognizes that spinning can be useful, particularly when the lock is about to be released soon. So in the first phase, the lock spins for a while, hoping to acquire the lock.

If the lock is not acquired during the first spinning phase, a second phase begins, in which the caller sleeps until the lock becomes free. A common variant spins a fixed number of times in a loop before going to sleep.
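
A minimal sketch of the idea, reusing TestAndSet() and yield() from above (MAX_SPINS is an assumed tuning parameter; a real implementation would sleep via the park()/queue machinery rather than merely yield):

#define MAX_SPINS 1000 // assumed spin budget before giving up the CPU

void lock_two_phase(lock_t *lock) {
    // phase one: spin a bounded number of times, hoping for a quick release
    for (int i = 0; i < MAX_SPINS; i++)
        if (TestAndSet(&lock->flag, 1) == 0)
            return; // acquired while spinning
    // phase two: stop burning cycles; block until the lock is free
    while (TestAndSet(&lock->flag, 1) == 1)
        yield(); // stand-in for a real sleep (park())
}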

Lock-based concurrent data structures

Let's now discuss how to use locks in common data structures. The challenge: how do we add locks to a given data structure so that it works correctly? And how do we do so while keeping performance high?

Concurrent counter

Simple version

Implementing a simple concurrent counter is easy. The code is as follows:

typedef struct counter_t {
    int             value;
    pthread_mutex_t lock;
} counter_t;

void init(counter_t *c) {
    c->value = 0;
    Pthread_mutex_init(&c->lock, NULL);
}

void increment(counter_t *c) {
    Pthread_mutex_lock(&c->lock);
    c->value++;
    Pthread_mutex_unlock(&c->lock);
}

void decrement(counter_t *c) {
    Pthread_mutex_lock(&c->lock);
    c->value--;
    Pthread_mutex_unlock(&c->lock);
}

int get(counter_t *c) {
    Pthread_mutex_lock(&c->lock);
    int rc = c->value;
    Pthread_mutex_unlock(&c->lock);
    return rc;
}

This counter follows a pattern common to the simplest and most basic concurrent data structures: it adds a single lock, which is acquired at the start of each function that manipulates the data structure and released on return.

Now let's look at performance. If the simple approach works and does not noticeably slow things down, there is no need for a more elaborate design.

We run a benchmark in which each thread updates the same shared counter a fixed number of times, varying the number of threads. The figure below shows the total time taken with one to four threads, each thread updating the counter one million times. As we add CPUs, we would hope to get more work done per unit time. But as the curve shows, the synchronized counter scales poorly: a single thread completes its million updates in a tiny amount of time (about 0.03 s), whereas two threads running concurrently take far longer (over 5 s!), and things only get worse with more threads.

[Figure: total time for threads to each perform one million counter updates, for one to four threads. The synchronized counter's time rises steeply with thread count, while the lower line, the sloppy counter with threshold 1024 discussed below, stays nearly flat.]
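
A minimal harness along these lines (a sketch, assumed rather than the authors' actual benchmark code; it uses the counter_t and capitalized Pthread wrappers defined above):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static counter_t counter; // the counter defined above

void *worker(void *arg) {
    for (int i = 0; i < 1000000; i++)
        increment(&counter);
    return NULL;
}

int main(int argc, char *argv[]) {
    int nthreads = (argc > 1) ? atoi(argv[1]) : 1;
    pthread_t threads[nthreads];
    struct timeval start, end;

    init(&counter);
    gettimeofday(&start, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_usec - start.tv_usec) / 1e6;
    printf("%d thread(s): %.3f seconds\n", nthreads, secs);
    return 0;
}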

Scalable version

We now introduce a more scalable approach, known as the sloppy counter.

The sloppy counter represents one logical counter via several local counters, one per CPU core, plus a single global counter. Specifically, on a machine with four CPUs there are four local counters and one global counter. There are also locks: one for each local counter, and one for the global counter.

The basic idea of the sloppy counter is this: when a thread on a given core wants to increment the counter, it increments its local counter; access to that local counter is synchronized by the corresponding local lock. Because each CPU has its own local counter, threads on different CPUs do not contend, so counter updates scale well. To keep the global counter up to date, the local values are periodically transferred to it: acquire the global lock, add the local counter's value to the global counter, then reset the local counter to zero.

How often this local-to-global transfer happens is determined by a threshold S (for sloppiness). The smaller S is, the more the sloppy counter behaves like the synchronized counter above; the larger S is, the more scalable the counter, but the further the global value may lag the actual count (by up to roughly the number of CPUs times S).

The lower line in the benchmark figure above shows the performance of the sloppy counter with threshold S = 1024: the time for four processors to perform four million total updates is almost the same as for one processor to perform one million. The figure below shows how the sloppy counter's performance varies with the threshold S. The sloppy counter is thus a trade-off between accuracy and performance.

[Figure: sloppy counter performance as the threshold S varies; larger S yields better performance at the cost of a global count that lags further behind.]

Here is the basic implementation of the sloppy counter:

typedef struct counter_t {
    int             global;            // global count
    pthread_mutex_t glock;             // global lock
    int             local[NUMCPUS];    // local count (per cpu)
    pthread_mutex_t llock[NUMCPUS];    // ... and locks
    int             threshold;         // update frequency
} counter_t;

// init: record threshold, init locks, init values
//       of all local counts and global count
void init(counter_t *c, int threshold) {
    c->threshold = threshold;

    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);

    int i;
    for (i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

// update: usually, just grab local lock and update local amount
//         once local count has risen by 'threshold', grab global
//         lock and transfer local values to it
void update(counter_t *c, int threadID, int amt) {
    pthread_mutex_lock(&c->llock[threadID]);
    c->local[threadID] += amt;                // assumes amt > 0
    if (c->local[threadID] >= c->threshold) { // transfer to global
        pthread_mutex_lock(&c->glock);
        c->global += c->local[threadID];
        pthread_mutex_unlock(&c->glock);
        c->local[threadID] = 0;
    }
    pthread_mutex_unlock(&c->llock[threadID]);
}

// get: just return global amount (which may not be perfect)
int get(counter_t *c) {
    pthread_mutex_lock(&c->glock);
    int val = c->global;
    pthread_mutex_unlock(&c->glock);
    return val; // only approximate!
}

Concurrent linked list

Next, let's look at a more complicated data structure: the linked list. For simplicity, we focus only on insertion (and a simple lookup), omitting other operations.

Simple version

The basic implementation code is shown below

// basic node structure
typedef struct node_t {
    int             key;
    struct node_t  *next;
} node_t;

// basic list structure (one used per list)
typedef struct list_t {
    node_t          *head;
    pthread_mutex_t  lock;
} list_t;

void List_Init(list_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}

int List_Insert(list_t *L, int key) {
    pthread_mutex_lock(&L->lock);
    node_t *new = malloc(sizeof(node_t));
    if (new == NULL) {
        perror("malloc");
        pthread_mutex_unlock(&L->lock);
        return -1; // fail
    }
    new->key = key;
    new->next = L->head;
    L->head = new;
    pthread_mutex_unlock(&L->lock);
    return 0; // success
}

int List_Lookup(list_t *L, int key) {
    pthread_mutex_lock(&L->lock);
    node_t *curr = L->head;
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&L->lock);
            return 0; // success
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&L->lock);
    return -1; // failure
}

How to scale it

Although we now have a basic concurrent linked list, it again scales poorly. One technique researchers have explored to add more concurrency to lists is called hand-over-hand locking, also known as lock coupling.

The principle is simple: instead of one lock for the whole list, each node has its own lock. When traversing the list, a thread first grabs the next node's lock and then releases the current node's lock (hence the name hand-over-hand); a sketch follows.
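
Here is a sketch of a hand-over-hand lookup, under the assumption that node_t gains a per-node lock field (the name List_Lookup_HoH is ours, not from the text):

// variant of node_t with one lock per node
typedef struct node_t {
    int              key;
    pthread_mutex_t  lock;
    struct node_t   *next;
} node_t;

int List_Lookup_HoH(list_t *L, int key) {
    pthread_mutex_lock(&L->lock);        // the list lock now only protects head
    node_t *curr = L->head;
    if (curr == NULL) {
        pthread_mutex_unlock(&L->lock);
        return -1; // empty list
    }
    pthread_mutex_lock(&curr->lock);     // grab the first node's lock...
    pthread_mutex_unlock(&L->lock);      // ...then let go of the list lock
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&curr->lock);
            return 0; // found
        }
        node_t *next = curr->next;
        if (next)
            pthread_mutex_lock(&next->lock); // lock next before releasing curr
        pthread_mutex_unlock(&curr->lock);
        curr = next;
    }
    return -1; // not found
}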

Conceptually, hand-over-hand locking makes sense: it increases the parallelism of list operations. In practice, however, the overhead of acquiring and releasing a lock at every node during a traversal is so large that this approach is hard to make faster than the single-lock version. Even with many threads and a long list, the more concurrent scheme is not necessarily quicker. A hybrid (one lock per some fixed number of nodes) might be worth investigating.

If a scheme adds a great deal of overhead, its extra concurrency is of little use. Simple schemes that rarely invoke expensive operations tend to work well; adding more locks and complexity can easily backfire.

One more general word of advice about the example code: be wary of control-flow changes and error paths that cause a function to return early. Because many functions begin by acquiring locks, allocating memory, or performing other state-changing operations, an error path must undo all of that state before returning, which is error-prone. It is better to structure the code to minimize this pattern.
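
One common way to reduce this risk (a generic C idiom, not something the text prescribes) is to funnel every path through a single unlock-and-return point:

int List_Insert_safe(list_t *L, int key) {
    int rc = 0;
    pthread_mutex_lock(&L->lock);
    node_t *new = malloc(sizeof(node_t));
    if (new == NULL) {
        rc = -1;  // record the failure...
        goto out; // ...and leave through the one exit path
    }
    new->key = key;
    new->next = L->head;
    L->head = new;
out:
    pthread_mutex_unlock(&L->lock); // every path unlocks exactly once
    return rc;
}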

Concurrent queue

The following is the implementation code of a concurrent queue:

typedef struct node_t {
    int             value;
    struct node_t  *next;
} node_t;

typedef struct queue_t {
    node_t          *head;
    node_t          *tail;
    pthread_mutex_t  headLock;
    pthread_mutex_t  tailLock;
} queue_t;

void Queue_Init(queue_t *q) {
    node_t *tmp = malloc(sizeof(node_t));
    tmp->next = NULL;
    q->head = q->tail = tmp;
    pthread_mutex_init(&q->headLock, NULL);
    pthread_mutex_init(&q->tailLock, NULL);
}

void Queue_Enqueue(queue_t *q, int value) {
    node_t *tmp = malloc(sizeof(node_t));
    assert(tmp != NULL);
    tmp->value = value;
    tmp->next = NULL;

    pthread_mutex_lock(&q->tailLock);
    q->tail->next = tmp;
    q->tail = tmp;
    pthread_mutex_unlock(&q->tailLock);
}

int Queue_Dequeue(queue_t *q, int *value) {
    pthread_mutex_lock(&q->headLock);
    node_t *tmp = q->head;
    node_t *newHead = tmp->next;
    if (newHead == NULL) {
        pthread_mutex_unlock(&q->headLock);
        return -1; // queue was empty
    }
    *value = newHead->value;
    q->head = newHead;
    pthread_mutex_unlock(&q->headLock);
    free(tmp);
    return 0;
}

This code uses two locks, one for the head of the queue and one for the tail, so that enqueue and dequeue can proceed concurrently: enqueue touches only the tail lock, and dequeue only the head lock. The other trick is the dummy node (allocated in the queue-initialization code), which separates the head operations from the tail operations.
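
Usage is straightforward; a small (assumed) example:

queue_t q;
Queue_Init(&q);
Queue_Enqueue(&q, 10);

int v;
if (Queue_Dequeue(&q, &v) == 0)
    printf("dequeued %d\n", v); // prints: dequeued 10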

Queues are widely used in multi-threaded programs. However, a queue like this one often does not fully meet a program's needs; a more complete, bounded queue would let threads wait when the queue is empty or full.

Concurrent hash table

Our example is a simple hash table that doesn’t need to be resized.

#define BUCKETS (101)

typedef struct hash_t {
    list_t lists[BUCKETS];
} hash_t;

void Hash_Init(hash_t *H) {
    int i;
    for (i = 0; i < BUCKETS; i++) {
        List_Init(&H->lists[i]);
    }
}

int Hash_Insert(hash_t *H, int key) {
    int bucket = key % BUCKETS;
    return List_Insert(&H->lists[bucket], key);
}

int Hash_Lookup(hash_t *H, int key) {
    int bucket = key % BUCKETS;
    return List_Lookup(&H->lists[bucket], key);
}

The hash table in this example uses the simple concurrent linked list implemented earlier. Each hash bucket (each bucket being one list) has its own lock, instead of a single lock for the whole table, which allows many operations to proceed concurrently.

Advice

When building a concurrent data structure, start with the simplest scheme: add a single big lock for synchronization. If you then find a performance problem, refine the approach, optimizing only as much as your needs require.

Many operating systems, including Sun's and Linux, used a single big kernel lock in their first transition to multiprocessors. That solution worked well for years, until multi-CPU systems became common and allowing only one active thread in the kernel at a time became a performance bottleneck. Linux took the simple route of replacing the one lock with many; Sun made the more radical decision of building a new system, Solaris, that was fundamentally concurrent from the start.