Atomic operations and lock-free programming in C++ multithreaded concurrency under Linux


1. What is an atomic operation

Atomic operation: as the name suggests, an operation that cannot be divided. It has only two observable states, not started and completed, with no intermediate state.

Atomic type: the data types defined in the <atomic> library. All operations on these types are atomic, including types instantiated through the class template std::atomic<T>, which likewise support atomic operations.

2. How to use atomic types

2.1 Atomic operations supported by the <atomic> library

The atomic library <atomic> provides some basic atomic types, and you can also instantiate an atomic object through the std::atomic class template. Some basic atomic types and their corresponding specializations are listed below:

  • atomic_bool — std::atomic<bool>
  • atomic_char — std::atomic<char>
  • atomic_int — std::atomic<int>
  • atomic_uint — std::atomic<unsigned int>
  • atomic_long — std::atomic<long>
  • atomic_llong — std::atomic<long long>
  • (and similar typedefs for the remaining integral and character types)

The most important accesses to atomic types are reads and writes; the corresponding atomic operations provided by the atomic library are load() and store(val). The atomic operations supported by atomic types are as follows:

  • load() — atomically read the value
  • store(val) — atomically write the value
  • exchange(val) — atomically replace the value and return the old one
  • compare_exchange_weak(expected, desired) / compare_exchange_strong(expected, desired) — atomically compare the value with expected and, on a match, replace it with desired
  • fetch_add(val) / fetch_sub(val) / fetch_and(val) / fetch_or(val) / fetch_xor(val) — atomic arithmetic and bitwise operations (on integral specializations)
  • is_lock_free() — query whether operations on this object are lock-free

2.2 Memory access models in atomic operations

An atomic operation guarantees that access to the data is only ever in one of two states, not started or completed, and never exposes an intermediate state. However, we usually also need a specific ordering when accessing data; for example, we may want a read that follows a write to see the latest value. The atomic operation functions therefore accept a memory model parameter, std::memory_order, which controls how concurrent reads and writes are ordered relative to each other. C++11 defines six orderings; the main models are as follows:

  • memory_order_relaxed: a relaxed operation with no synchronization or ordering constraints; only atomicity is guaranteed;
  • memory_order_release & memory_order_acquire: with two threads A and B, after thread A performs a release store, a subsequent acquire load in thread B is guaranteed to read the newly written value; this model is quite strong, because all writes in A before the release become visible to B after the acquire;
  • memory_order_release & memory_order_consume: the previous model synchronizes all preceding writes, while this model only synchronizes writes that the operation carries a dependency on. For example, if the operation is on variable a and s = a + b, then s depends on a but b does not; dependencies are also transitive, e.g. with t = s + 1, since s depends on a, t in fact also depends on a;
  • memory_order_seq_cst: the sequential consistency model, the default for atomic operations in C++11; it behaves like a release/acquire operation on every variable, and of course it is also the slowest synchronization model;

The memory access model is a low-level control interface; without a good grasp of compiler optimizations and CPU instruction execution, it is easy to introduce bugs. The memory model is not the focus of this chapter and is not covered further here; the code that follows uses the default sequential consistency model or the safer release/acquire model.



2.3 Using atomic types instead of a mutex

For easy comparison, we modify the example program from the previous article on thread synchronization with a mutex. The code with the mutex library replaced by the atomic library is as follows:

// atomic1.cpp: use the atomic library instead of the mutex library for thread synchronization
#include <chrono>
#include <atomic>
#include <thread>
#include <iostream>
#include <cstdio>

std::chrono::milliseconds interval(100);

std::atomic<bool> readyFlag(false); // atomic boolean type, replacing the mutex
std::atomic<int> job_shared(0);     // both threads can modify job_shared, so it is an atomic type
int job_exclusive = 0;              // only one thread modifies job_exclusive, no protection required

// this thread can only modify job_shared
void job_1()
{
    std::this_thread::sleep_for(5 * interval); // give job_2 a head start
    std::cout << "job_1 shared (" << job_shared.load() << ")\n";
    readyFlag.store(true); // change the boolean flag to true
}

// this thread can modify both job_shared and job_exclusive
void job_2()
{
    while (true) { // loop until job_shared can be accessed and modified
        if (readyFlag.load()) { // flag is true: modify job_shared and exit
            ++job_shared;
            std::cout << "job_2 shared (" << job_shared.load() << ")\n";
            return;
        } else { // flag is still false: modify job_exclusive instead
            ++job_exclusive;
            std::cout << "job_2 exclusive (" << job_exclusive << ")\n";
            std::this_thread::sleep_for(interval);
        }
    }
}

int main()
{
    std::thread thread_1(job_1);
    std::thread thread_2(job_2);
    thread_1.join();
    thread_2.join();
    getchar();
    return 0;
}

As the example shows, an atomic boolean can take over some of the functions of a mutex. However, when using a condition variable, a mutex is still required to protect the consumption of the condition variable, even if the data guarded by the condition variable is an atomic object.

2.4 Using atomic types to implement a spinlock

A spinlock is similar to a mutex: there is at most one holder at any time. However, when the resource is already occupied, a mutex puts the requester to sleep, while a spinlock keeps the caller awake, repeatedly checking whether the lock has been acquired. The spinlock is a kind of lock introduced specifically for multiprocessor concurrency, and it is widely used in interrupt handling and other parts of the kernel (on a uniprocessor, concurrency during interrupt handling can be prevented simply by disabling interrupts, i.e. clearing/setting the interrupt flag in the flags register, so no spinlock is needed).

On a multi-core processor, testing whether the lock is available and setting the lock state must be implemented as a single atomic operation. If they were two separate atomic operations, a thread could be preempted between observing the lock as free and setting it, leading to incorrect execution. This requires the atomic library to provide read-modify-write (RMW) atomic operations on atomic variables. Among the operations listed above, a.exchange(val) and a.compare_exchange_strong(expected, desired) are such RMW atomic operations.

The standard library also provides an atomic boolean type, std::atomic_flag, which, unlike the std::atomic specializations, is guaranteed to be lock-free. It does not provide load() and store(val); instead it provides test_and_set() and clear(), where test_and_set() is an RMW atomic operation. std::atomic_flag can be used to implement a spinlock, as follows:

// atomic2.cpp: use the atomic boolean type to implement a spinlock
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>

std::atomic_flag lock = ATOMIC_FLAG_INIT; // initialize the atomic flag to clear

void f(int n)
{
    for (int cnt = 0; cnt < 100; ++cnt) {
        while (lock.test_and_set(std::memory_order_acquire)) // acquire the lock
            ; // spin until the lock is released
        std::cout << n << " thread Output: " << cnt << '\n';
        lock.clear(std::memory_order_release); // release the lock
    }
}

int main()
{
    std::vector<std::thread> v; // a vector whose element type is std::thread
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f, n); // construct a thread running f(n) in place at the end of the vector
    }
    for (auto& t : v) { // range-based for over the threads; auto& deduces a reference type
        t.join(); // block the main thread until each child thread finishes
    }
    return 0;
}

Besides the TAS (test-and-set) implementation based on atomic_flag, a spinlock can also be built on the ordinary atomic type std::atomic: a.exchange(val) likewise supports a TAS implementation, and a.compare_exchange_strong(expected, desired) (CAS) can be used to implement one as well. The CAS atomic operation is the main tool for lock-free programming, which we introduce next.

3. How to do lock-free programming

3.1 What is lock-free programming

Before atomic operations existed, concurrent reads and writes of shared data could yield indeterminate results, so a lock mechanism had to protect access to shared data in multithreaded programs. However, acquiring and releasing locks adds overhead to every access to the shared resource, and locks can cause thread blocking, lock contention, deadlock, priority inversion, and debugging difficulties.

With atomic operations, reads and writes of a single basic data type can be protected without locks. For complex data types such as linked lists, however, multiple cores may add or delete nodes at the same position at the same time, which can make operations fail or execute in the wrong order. Therefore, before operating on a node we must check whether its value still matches the expected value: if it matches, perform the operation; if not, update the expected value and retry. This check-and-act must itself be a single RMW (read-modify-write) atomic operation, namely the CAS (compare-and-swap) operation mentioned above, the most commonly used operation in lock-free programming.

Since the purpose of lock-free programming is to avoid the problems caused by locking, it can be understood as programming that guarantees synchronized access to shared variables between threads without using a lock mechanism. A lock-free implementation is essentially a multithreaded programming model that combines several instructions into one logically complete, indivisible minimal unit, in a way that is compatible with how the CPU executes instructions.

Lock-free programming is built on atomic operations. load() and store(val) are enough to synchronize concurrent shared access to basic atomic types, while the more complex CAS operation is needed to synchronize concurrent shared access to more elaborate data structures. Such access avoids the locking mechanism, but it can still be understood as lock-like behavior at a much finer granularity, with higher performance. For concurrent accesses that cannot be expressed as a single atomic operation, a lock mechanism is still required.

3.2 Implementing lock-free programming with CAS atomic operations

The CAS atomic operation is mainly provided by the function a.compare_exchange_strong(expected, desired). Its semantics are: "I believe the value of V should be A; if it is, update V to B; otherwise, do not modify V and tell me its actual value." The pseudocode of the CAS algorithm is as follows:

// the whole function executes as one indivisible atomic operation
bool compare_exchange_strong(T& expected, T desired)
{
    if (this->load() == expected) {
        this->store(desired);    // value matches: write the desired value
        return true;
    } else {
        expected = this->load(); // mismatch: report the actual value back
        return false;
    }
}

The following is an attempt to implement a lock-free stack. The code is as follows:

// atomic3.cpp: implement a lock-free stack using the CAS operation
#include <atomic>
#include <iostream>
#include <cstdio>

template<typename T>
class lock_free_stack
{
private:
    struct node
    {
        T data;
        node* next;
        node(const T& data) : data(data), next(nullptr) {}
    };
    std::atomic<node*> head;
public:
    lock_free_stack() : head(nullptr) {}

    void push(const T& data)
    {
        node* new_node = new node(data);
        do {
            new_node->next = head.load(); // put the current head into new_node->next
        } while (!head.compare_exchange_strong(new_node->next, new_node));
        // If new_node->next still equals head, no other thread intervened:
        // the top of the stack is replaced by the new element and the loop exits.
        // If they differ, another thread changed the top of the stack; the failed
        // CAS automatically updates new_node->next to the changed head, and the
        // loop retries until the CAS succeeds.
    }

    T pop()
    {
        node* old_head;
        do {
            old_head = head.load();
        } while (old_head && !head.compare_exchange_strong(old_head, old_head->next));
        // note: the popped node is deliberately never deleted here; freeing it
        // safely is exactly the ABA/reclamation problem discussed below
        return old_head->data; // undefined behavior if the stack is empty
    }
};

int main()
{
    lock_free_stack<int> s;
    s.push(1);
    s.push(2);
    std::cout << s.pop() << std::endl;
    std::cout << s.pop() << std::endl;
    getchar();
    return 0;
}

As explained in the program comments, before pushing a new element onto the stack we compare the atomic head with the object that the new element's next points to, to determine whether head has been modified by another thread. Depending on the result, we either complete the operation or update the expectation and retry. All of this happens within a single atomic operation, which guarantees concurrent synchronization of shared data without using locks.

CAS looks powerful, but it has shortcomings, the best known being the ABA problem. Suppose a variable's value is changed from A to B and then back to A: CAS cannot perceive this, yet the variable has been modified. For basic types this is usually harmless, but what about reference types? An object behind a pointer may contain several fields; how do we know whether they were changed? The usual answer is to add a version number and check it on every modification: if the version number has changed, the value has been modified, even if it still reads as A.

In the example above, the node pointer is a reference type, so the ABA problem naturally arises. For example, suppose thread 2 pops A and B, deletes them both, and then pushes a new element; because the operating system's memory allocator reuses previously freed memory, the new node can get exactly the same address as A, so we label it A'. Execution then switches back to thread 1, whose CAS sees that the head is still "A" (unchanged) and sets B as the new top of the stack, but B is a memory block that has already been freed. The solution is to make A and A' distinguishable as different pointers; readers can try to work out the concrete implementation.