Concurrent programming: a better multithreaded counter than AtomicInteger (an analysis of how LongAdder works)

Time:2021-5-2

Preface

While recently studying the source code of ConcurrentHashMap, we found that it counts the elements in the map in a rather unusual way. Naturally, we should study the principle and ideas behind it, which will also help us understand ConcurrentHashMap itself better.

This article is divided into the following five parts

1. The effect of counting

2. Visual illustration of principle

3. Detailed analysis of source code

4. Comparison with AtomicInteger

5. Abstraction of thought

The natural entry point is the map's put method

public V put(K key, V value) {
    return putVal(key, value, false);
}

Now look at the putVal method

We won't discuss how ConcurrentHashMap itself works here, so let's skip straight to the counting part

final V putVal(K key, V value, boolean onlyIfAbsent) {
    ...
    addCount(1L, binCount);
    return null;
}

Every time an element is successfully added, the addCount method is called to increase the count by 1. This method is what we want to study

Because ConcurrentHashMap was designed to support map operations under multithreaded concurrency, the counting naturally has to be thread safe as well

Thread-safe accumulation is usually the first lesson in concurrent programming and is not very complicated. We can solve it with AtomicInteger or with a lock
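For reference, the two textbook approaches mentioned above can be sketched as follows. This is only a minimal illustration; the class name SafeCounters and its methods are made up for this example and do not come from the JDK or from ConcurrentHashMap.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SafeCounters {
    private final AtomicInteger atomic = new AtomicInteger(); // CAS-based counter
    private int locked = 0;                                   // guarded by "this"

    void incrementAtomic() { atomic.incrementAndGet(); }
    synchronized void incrementLocked() { locked++; }
    synchronized int lockedValue() { return locked; }

    // Starts `threads` threads that each bump both counters `perThread` times,
    // waits for them, and returns {atomic total, locked total}.
    static int[] run(int threads, int perThread) {
        SafeCounters c = new SafeCounters();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.incrementAtomic();
                    c.incrementLocked();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { c.atomic.get(), c.lockedValue() };
    }

    public static void main(String[] args) {
        int[] totals = run(10, 10_000);
        System.out.println("atomic: " + totals[0] + ", locked: " + totals[1]);
    }
}
```

Both totals come out as 100000, because every increment is either a single atomic operation or runs under the object's monitor.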

However, when we look at this method, the accumulation logic, which ought to be fairly simple, turns out to look rather complicated

Here I post only the core part of the accumulation algorithm

private final void addCount(long x, int check) {
    CounterCell[] as; long b, s;
    if ((as = counterCells) != null ||
            !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        CounterCell a; long v; int m;
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
                (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
                !(uncontended =
                        U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
            fullAddCount(x, uncontended);
            return;
        }
        if (check <= 1)
            return;
        s = sumCount();
    }
    ...
}

Let's study how this logic is implemented. The idea is essentially a copy of the logic of the LongAdder class, so let's look directly at the class the algorithm comes from
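Before leaving ConcurrentHashMap, here is a small sketch of what this counting gives the caller: the element count stays exact even when several threads insert at once. The class MapCountDemo and the method fill are hypothetical names for this example; mappingCount() is the map's public method that reports the count as a long.

```java
import java.util.concurrent.ConcurrentHashMap;

class MapCountDemo {
    // Inserts threads * perThread distinct keys from several threads
    // and returns the element count the map reports afterwards.
    static long fill(int threads, int perThread) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int first = t * perThread; // each thread gets its own key range
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(first + i, i);
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return map.mappingCount();
    }

    public static void main(String[] args) {
        System.out.println("count: " + fill(10, 10_000));
    }
}
```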

1. Using the LongAdder class

Let's see what LongAdder does in practice

LongAdder adder = new LongAdder();
int num = 0;

@Test
public void test5() throws InterruptedException {
    Thread[] threads = new Thread[10];
    for (int i = 0; i < 10; i++) {
        threads[i] = new Thread(() -> {
            for (int j = 0; j < 10000; j++) {
                adder.add(1);   // thread-safe accumulation
                num += 1;       // plain int: unsynchronized read-modify-write, updates can be lost
            }
        });
        threads[i].start();
    }
    for (int i = 0; i < 10; i++) {
        threads[i].join();
    }
    System.out.println("adder:" + adder);
    System.out.println("num:" + num);
}

Output

adder:100000
num:40982

As the output shows, adder keeps the accumulation thread safe, while the plain int num loses updates

2. An intuitive understanding of how LongAdder works

To analyze the source code well, we first need an intuitive picture of how it works. Otherwise, reading the code directly will be confusing

LongAdder keeps its count in two places

A long field: base

An array of Cell objects, each of which holds a long field, value, used for counting

/**
 * Table of cells. When non-null, size is a power of 2.
 */
transient volatile Cell[] cells;

/**
 * Base value, used mainly when there is no contention, but also as
 * a fallback during table initialization races. Updated via CAS.
 */
transient volatile long base;

When there is no thread contention, accumulation happens on the base field. This is like a single thread adding twice in a row, except that each update to base is a CAS operation

When contention occurs, some thread's CAS on base must fail. That thread then checks whether the cells array has been initialized. If not, it creates an array of length 2, uses the thread's hash value to pick an array index, and accumulates into the value field of the Cell at that index (this update is also a CAS)

If three threads compete, the first thread's CAS on base succeeds, and the remaining two must accumulate into elements of the cells array. Because updating a Cell's value is also a CAS, the second and third threads will contend with each other if their hash values map to the same index. If the second thread wins, the third thread rehashes. If the new hash value maps to an index whose element is still null, the thread creates a new Cell there to hold the value

If a fourth thread joins the competition at the same time, then even after rehashing, its CAS may fail against thread 3. At that point, if the array's current length is less than the number of CPUs available to the system, the array is expanded, and the thread rehashes and retries on some element of the cells array
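The idea described so far can be condensed into a deliberately simplified sketch. It is not how LongAdder is implemented: the cell array here has a fixed size instead of growing, there is no rehash, and the per-thread hash is a random int held in a ThreadLocal. Unlike LongAdder, it also tries base first on every call. All names (StripedCounterSketch, PROBE, run) are invented for this example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class StripedCounterSketch {
    // Hypothetical stand-in for the per-thread hash: one random int per thread.
    private static final ThreadLocal<Integer> PROBE =
            ThreadLocal.withInitial(() -> ThreadLocalRandom.current().nextInt());

    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells; // fixed length, must be a power of two

    StripedCounterSketch(int cellCount) {
        if (cellCount <= 0 || Integer.bitCount(cellCount) != 1) {
            throw new IllegalArgumentException("cellCount must be a power of two");
        }
        this.cells = new AtomicLongArray(cellCount);
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return; // no contention: base absorbs the update
        }
        // The CAS on base lost a race: spread the update to this thread's cell.
        int index = PROBE.get() & (cells.length() - 1);
        cells.addAndGet(index, x);
    }

    // The total is base plus every cell.
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }

    static long run(int threads, int perThread, int cellCount) {
        StripedCounterSketch counter = new StripedCounterSketch(cellCount);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("sum: " + run(10, 10_000, 4));
    }
}
```

No update is ever lost, because each add lands either in base or in exactly one cell, and sum() adds them all up.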

The above is the overall intuition, but the code contains many details worth studying, so let's move on to the source code

3. Source code analysis

The entry method is add

public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    /**
     * The cells array is checked first, and only then is a CAS on base attempted
     * So if there has never been contention and cells is still null, every add goes to base
     * Once contention has made cells non-null, every add goes to the cells in the array first
     */
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        /**
         * Records whether the CAS on a cell in the array met contention
         * If it did, longAccumulate performs one extra rehash-and-spin pass
         * (explained in detail below; for now just keep it in mind)
         * true means there was no contention
         */
        boolean uncontended = true;
        /**
         * If the cells array is null or has length 0, go straight to the main logic method
         */
        if (as == null || (m = as.length - 1) < 0 ||
                /**
                 * getProbe() can be regarded as returning the thread's hash value
                 * The hash value ANDed with (array length - 1) gives the array index
                 * If the element at that index is null, enter the main logic method
                 * Otherwise, try to accumulate into it
                 */
                (a = as[getProbe() & m]) == null ||
                /**
                 * CAS the value of the element at that index. On success, return directly
                 * Otherwise, enter the main logic method
                 */
                !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended);
    }
}

When there is no contention, casBase in the first if performs the accumulation, which is the first case described earlier

When contention occurs, the cells array takes over the accumulation, which is the second case described earlier (the array itself is initialized inside longAccumulate)
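The index computation getProbe() & m relies on the array length being a power of two, so that ANDing with (length - 1) keeps only the low bits and always yields a valid index, even for a negative hash value. A small sketch (ProbeMaskDemo and index are names made up for this example):

```java
class ProbeMaskDemo {
    // Maps a hash value to a slot of a table whose length is a power of two.
    static int index(int probe, int length) {
        return probe & (length - 1);
    }

    public static void main(String[] args) {
        int[] probes = { 13, -1, 0x12345678 };
        for (int p : probes) {
            // For power-of-two lengths, the mask agrees with a non-negative modulo.
            System.out.println(p + " -> slot " + index(p, 8)
                    + " (floorMod: " + Math.floorMod(p, 8) + ")");
        }
    }
}
```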

Next, let's look at the main logic method. Because it is fairly long, I will go through it piece by piece

The longAccumulate method

Parameters in the signature

x is the value to be accumulated

fn specifies how to accumulate. LongAdder passes null, which means plain addition, so it can be ignored here

wasUncontended indicates whether the caller's CAS on a cell got through without contention. It is false only if that CAS was attempted and failed. Because the caller's condition is a chain of short-circuiting ORs (as == null || (m = as.length - 1) < 0 || (a = as[getProbe() & m]) == null), the CAS is never attempted when the array is empty or the slot has not been initialized, so in those cases the flag keeps its initial value of true

final void longAccumulate(long x, LongBinaryOperator fn,
                          boolean wasUncontended) {
  ...
}
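The fn parameter matters for LongAdder's sibling class LongAccumulator, which supplies its own function instead of null. As a small illustration, here is a running maximum (MaxAccumulatorDemo and maxOf are names invented for this example; LongAccumulator, accumulate and get are the JDK's own):

```java
import java.util.concurrent.atomic.LongAccumulator;

class MaxAccumulatorDemo {
    // Folds the values with Long::max, starting from Long.MIN_VALUE.
    static long maxOf(long... values) {
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        for (long v : values) {
            max.accumulate(v);
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxOf(3, 9, 4)); // prints 9
    }
}
```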

First, check whether the thread's hash value is 0. If it is, the hash value has not been initialized yet, so it is initialized now, which also acts as a rehash

wasUncontended is then set to true: even if there was a conflict before, after the rehash we assume the thread can find an array slot that does not conflict

int h; // The thread's hash value, used in the logic below
if ((h = getProbe()) == 0) {
    ThreadLocalRandom.current(); // force initialization
    h = getProbe();
    wasUncontended = true;
}

Next comes an infinite loop with three large if branches. The second and third branches only apply while the array is uninitialized. Once the array has been initialized, everything goes through the first branch, the main logic. I will therefore pull the main logic out and cover it afterwards, so that the outer branches do not get in the way

/**
 * Records whether the slot this thread found in the previous iteration already held a Cell
 * true means the slot was non-empty
 * Used in the loop of the main logic
 */
boolean collide = false;
/**
 * An infinite loop provides the spinning
 */
for (; ; ) {
    Cell[] as;
    Cell a;
    int n; // Length of the cells array
    long v; // Current value read from base or from a cell, before x is applied
    /**
     * The cells array is non-null and has been initialized by some thread: enter the main logic (explained in detail later)
     */
    if ((as = cells) != null && (n = as.length) > 0) {
        ...
        /**
         * The array is null, so a cells array must be initialized
         * cellsBusy marks whether the cells array may be modified; it acts as a lock
         * cells == as checks that no other thread has initialized the array since this thread read it
         * casCellsBusy uses a CAS to set cellsBusy to 1. If it succeeds, this thread holds the lock on the cells array
         */
    } else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
        /**
         * Initialize the array: length 2, with a Cell holding x placed in the slot chosen by h & 1
         */
        boolean init = false;
        try {
            if (cells == as) {
                Cell[] rs = new Cell[2];
                rs[h & 1] = new Cell(x);
                cells = rs;
                init = true;
            }
        } finally {
            cellsBusy = 0;
        }
        if (init)
            break;
        /**
         * The array is not usable yet and this thread could not take the initialization path
         * (typically because another thread holds the lock and is initializing the array)
         * So try once more to CAS the value into base
         * If that fails as well, spin
         * The LongBinaryOperator from the method signature is used here; it does not affect the logic
         */
    } else if (casBase(v = base, ((fn == null) ? v + x :
            fn.applyAsLong(v, x))))
        break;                          // Fall back on using base
}
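The cellsBusy pattern used above (check the flag, CAS it from 0 to 1, recheck the state under the lock, release in finally) can be tried out in isolation. The sketch below uses an AtomicInteger as the flag instead of Unsafe. BusyFlagInit, tryInit and countCreations are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BusyFlagInit {
    private final AtomicInteger busy = new AtomicInteger(); // 0 = free, 1 = held
    private volatile long[] table;                          // created lazily, once

    // Tries to create the table. Returns true only for the call that created it.
    boolean tryInit() {
        long[] seen = table;
        if (seen == null && busy.get() == 0 && busy.compareAndSet(0, 1)) {
            boolean init = false;
            try {
                if (table == seen) { // recheck under the lock
                    table = new long[2];
                    init = true;
                }
            } finally {
                busy.set(0);         // always release the flag
            }
            return init;
        }
        return false;                // table exists already, or the flag was held
    }

    // Lets several threads race on tryInit and counts how many created the table.
    static int countCreations(int threads) {
        BusyFlagInit target = new BusyFlagInit();
        AtomicInteger created = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                if (target.tryInit()) {
                    created.incrementAndGet();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("creations: " + countCreations(8)); // prints creations: 1
    }
}
```

The recheck inside the try block is what prevents a second creation: a thread that read table as null before another thread finished initializing will find table changed once it holds the flag.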

Next, let's look at the main logic that accumulates into elements of the cells array

/**
 * The cells array is non-null and has been initialized by some thread: enter the main logic
 */
if ((as = cells) != null && (n = as.length) > 0) {
    /**
     * The array slot for the current thread's hash value is empty
     */
    if ((a = as[(n - 1) & h]) == null) {
        /**
         * No other thread is currently modifying the cells array
         */
        if (cellsBusy == 0) {
            /**
             * The JDK comment here is "Optimistically create": the Cell is allocated before
             * the lock is taken, presumably so that the lock is held as briefly as possible
             */
            Cell r = new Cell(x);
            /**
             * Recheck the lock state and try to acquire the lock
             */
            if (cellsBusy == 0 && casCellsBusy()) {
                boolean created = false;
                try {
                    /**
                     * Recheck under the lock that the array exists and the slot is still empty
                     * If so, store the newly created Cell in that slot
                     */
                    Cell[] rs;
                    int m, j;
                    if ((rs = cells) != null &&
                            (m = rs.length) > 0 &&
                            rs[j = (m - 1) & h] == null) {
                        rs[j] = r;
                        created = true;
                    }
                } finally {
                    cellsBusy = 0;
                }
                /**
                 * If the Cell was installed, x has been accumulated, so exit the loop
                 */
                if (created)
                    break;
                /**
                 * Otherwise another thread installed a Cell in this slot between the null check and taking the lock
                 * Continue without rehashing: the slot is now non-empty, so the next iteration will not enter this branch
                 */
                continue;
            }
        }
        /**
         * We get here inside the if ((a = as[(n - 1) & h]) == null) branch,
         * meaning the slot was empty when it was first checked, so collide is set to false
         * collide records whether the slot found in the previous iteration already held a Cell
         * (JDK comment: True if last slot nonempty)
         */
        collide = false;
    /**
     * If wasUncontended is false, the caller's CAS on this cell has already lost to another thread
     * A CAS could be attempted right away, but under high concurrency
     * the same threads would probably collide again, and retrying in place each time wastes CPU
     * So skip this iteration and rehash at the end of the for loop,
     * so that the thread finds a slot of its own as soon as possible
     */
    } else if (!wasUncontended)
        wasUncontended = true;
    /**
     * Try to accumulate into the cell for the current hash value. If this succeeds, we are done
     * If it still fails, overall contention is fierce
     * and the array may need to be expanded
     * (the array starts with a capacity of 2, so with 10 threads running concurrently contention is hard to avoid)
     */
    else if (a.cas(v = a.value, ((fn == null) ? v + x :
            fn.applyAsLong(v, x))))
        break;
    /**
     * The number of CPU cores is checked here: even with 100 threads,
     * the number that can truly run in parallel is at most the number of CPUs
     * So once the array length is greater than or equal to the number of CPUs, it should not be expanded
     * (cells != as means another thread has already replaced the array, so this thread's view is stale)
     */
    else if (n >= NCPU || cells != as)
        collide = false;
    /**
     * Reaching here means the slot found in this iteration already holds a Cell and the CAS on it failed
     * If collide is still false, set it to true,
     * then spin once more and rehash
     * If we get here again with collide already true, contention is fierce and the array should be expanded
     */
    else if (!collide)
        collide = true;
    /**
     * Reaching here means the array needs to be expanded
     * Check the lock state and try to acquire the lock
     */
    else if (cellsBusy == 0 && casCellsBusy()) {
        /**
         * Expand the array: allocate one of twice the length
         * and copy the existing Cell references across
         */
        try {
            if (cells == as) {
                Cell[] rs = new Cell[n << 1];
                for (int i = 0; i < n; ++i)
                    rs[i] = as[i];
                cells = rs;
            }
        } finally {
            cellsBusy = 0;
        }
        collide = false;
        /**
         * Continue directly: the array has just been expanded, so no rehash is needed
         */
        continue;
    }
    /**
     * Rehash so that the thread may find a slot of its own in the next iteration
     */
    h = advanceProbe(h);
}
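The rehash at the end of the loop, advanceProbe, is not shown above. If I read the JDK 8 source correctly, it is a xorshift step with the shift amounts 13, 17 and 5, after which the new value is stored back into the thread. The sketch below reproduces only the arithmetic and should be treated as an assumption about the constants; XorShiftProbe and advance are names made up for this example.

```java
class XorShiftProbe {
    // One xorshift step. It maps 0 to 0 and any non-zero value to another
    // non-zero value, which fits the use of 0 as the "not initialized" marker.
    static int advance(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    public static void main(String[] args) {
        int h = 1;
        for (int i = 0; i < 4; i++) {
            h = advance(h);
            // With a table of length 4, the slot is the low two bits of the hash.
            System.out.println("hash=" + h + " slot=" + (h & 3));
        }
    }
}
```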

That concludes the source code analysis of LongAdder. There is not much code, but the ideas in it are well worth studying.

4. Comparison with AtomicInteger

Source code analysis alone still leaves something missing: we have not yet explained why the author designed such a complex class when AtomicInteger already exists.

First, let's look at how AtomicInteger ensures thread safety

Look at the most basic method, getAndIncrement

public final int getAndIncrement() {
    return unsafe.getAndAddInt(this, valueOffset, 1);
}

It calls the getAndAddInt method of the Unsafe class, so let's keep going

public final int getAndAddInt(Object var1, long var2, int var4) {
    int var5;
    do {
        var5 = this.getIntVolatile(var1, var2);
    } while(!this.compareAndSwapInt(var1, var2, var5, var5 + var4));

    return var5;
}

We will not go into the concrete implementation of getIntVolatile and compareAndSwapInt here, because they are native methods

As we can see, AtomicInteger uses CAS plus spinning underneath to achieve atomicity: if one update attempt fails, it spins until an update succeeds

From this we can infer that when a large number of threads run concurrently and contention is fierce, AtomicInteger may leave some threads failing and spinning over and over, which hurts throughput
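The same retry loop can be written against the public API with compareAndSet, which also makes it easy to count how often an attempt fails. This is only a sketch: CasLoopDemo, failedAttempts and run are invented names, and counting failures in a shared AtomicLong adds some contention of its own.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class CasLoopDemo {
    static final AtomicLong failedAttempts = new AtomicLong();

    // Hand-written equivalent of getAndIncrement: read, try to CAS, retry on failure.
    static int getAndIncrement(AtomicInteger value) {
        for (;;) {
            int current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current;
            }
            failedAttempts.incrementAndGet(); // another thread won; spin again
        }
    }

    static int run(int threads, int perThread) {
        AtomicInteger value = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    getAndIncrement(value);
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println("value: " + run(10, 100_000));
        System.out.println("failed CAS attempts: " + failedAttempts.get());
    }
}
```

The final value is always exact. The number of failed attempts varies from run to run and depends on the machine.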

To solve this spinning problem under high concurrency, the author of LongAdder added an array, so that threads compete for several values instead of one. This lowers the frequency of contention and therefore reduces spinning. The cost, of course, is extra storage space.

Finally, I ran a simple test to compare the time taken by the two counting approaches

In principle, LongAdder's advantage only becomes obvious when thread contention is fierce. So I used 100 threads, each adding to the same counter 1000000 times. The results are below. The gap is huge, about 15 times!

LongAdder time: 104292242 nanos

AtomicInteger time: 1583294474 nanos

Of course, this is only a simple test with a lot of randomness in it. Interested readers can repeat it several times at different levels of contention
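The test code itself is not included above. A comparison along those lines might look like the sketch below. It is not the original benchmark: CounterBenchSketch and its methods are hypothetical, there is no JIT warm-up, and the timings include thread start-up, so the numbers it prints are only a rough indication.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class CounterBenchSketch {
    // Runs `increment` perThread times on each thread; returns elapsed nanoseconds.
    static long timeNanos(int threads, int perThread, Runnable increment) {
        Thread[] ts = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment.run();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    // Returns {elapsed nanos, final count} for LongAdder.
    static long[] benchAdder(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        long nanos = timeNanos(threads, perThread, adder::increment);
        return new long[] { nanos, adder.sum() };
    }

    // Returns {elapsed nanos, final count} for AtomicInteger.
    static long[] benchAtomic(int threads, int perThread) {
        AtomicInteger atomic = new AtomicInteger();
        long nanos = timeNanos(threads, perThread, atomic::incrementAndGet);
        return new long[] { nanos, atomic.get() };
    }

    public static void main(String[] args) {
        long[] adder = benchAdder(100, 1_000_000);
        long[] atomic = benchAtomic(100, 1_000_000);
        System.out.println("LongAdder time: " + adder[0] + " nanos, count " + adder[1]);
        System.out.println("AtomicInteger time: " + atomic[0] + " nanos, count " + atomic[1]);
    }
}
```

For measurements that can be trusted, a harness such as JMH would be the better tool; this sketch only mirrors the shape of the simple test described above.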

5. Abstracting the ideas

Finally, we need to abstract away from the author's specific code and implementation logic and lay out the underlying line of thought

1) The problem with AtomicInteger: contention for a single resource leads to spinning

2) The solution: spread contention for a single object across multiple objects (a kind of divide and conquer)

3) Controlled expansion: multiple contended objects cost extra storage space, so they cannot be added without limit (the extreme case, one counter object per thread, is clearly unreasonable)

4) Layering: because the usage scenario of the class cannot be predicted, the extra storage has to grow dynamically with the intensity of concurrency (similar to lock inflation in synchronized)

5) Three layered strategies: when there is no contention, a single value is accumulated; when some contention appears, an array of capacity 2 is created, raising the number of contended resources to 3; when contention becomes more intense, the array keeps expanding (this corresponds to the progression from 1 thread to 4 threads described earlier)

6) Strategy detail: rehash while spinning. Computing the hash and checking array slots takes some time, but it lets concurrent threads find slots of their own as soon as possible, after which they no longer contend
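The growth limit from points 3) and 5) can be stated in a few lines: the array starts at length 2 and doubles only while its length is below the CPU count, which follows from the n >= NCPU check in the source. The helper below is hypothetical (CellTableGrowth and maxTableSize are not JDK names) and just computes the largest length the array can reach.

```java
class CellTableGrowth {
    // Largest cell array length reachable for a given CPU count:
    // start at 2 and double while the length is still below ncpu.
    static int maxTableSize(int ncpu) {
        int n = 2;
        while (n < ncpu) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        int ncpu = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + ncpu + ", max cells: " + maxTableSize(ncpu));
    }
}
```

For example, a machine with 6 CPUs ends up with at most 8 cells, and a machine with 8 CPUs also stops at 8.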
