A detailed explanation of the underlying implementation of CAS in Java


This article walks through the underlying implementation of CAS in Java, using example code along the way. It should be a useful reference for study or work.

1、 The concept of CAS (compare and swap)

CAS, short for compare and swap, is a mechanism for avoiding the performance cost of locks when multiple threads run in parallel.

CAS(V, A, B) takes three operands: V is the memory address, A is the expected original value, and B is the new value. If the value at the memory address matches the expected original value, the processor updates the location to the new value. Otherwise, the location has already been updated by another thread, and the processor does nothing. In either case, the instruction returns the value the location held before the CAS. When a CAS fails, the caller can spin, that is, loop: re-read the variable and try the modification again. Alternatively, it can abandon the operation.
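
The retry loop described above can be sketched in Java with AtomicInteger. This is a minimal illustration, not code from the JDK: the class name CasRetryDemo and the method doubleValue are made up for this example. Note that AtomicInteger.compareAndSet returns a boolean success flag rather than the old value.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasRetryDemo {
    // Atomically doubles the value with a CAS retry loop and returns the new value.
    // (Hypothetical helper, for illustration only.)
    static int doubleValue(AtomicInteger v) {
        while (true) {
            int expected = v.get();        // A: the value we believe is in memory
            int updated = expected * 2;    // B: the value we want to store
            if (v.compareAndSet(expected, updated)) {
                return updated;            // V still held A, so it now holds B
            }
            // Another thread changed V first: re-read and try again.
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5);
        System.out.println(v.compareAndSet(5, 6)); // expected value matches
        System.out.println(v.compareAndSet(5, 7)); // expected value is stale
        System.out.println(doubleValue(v));
    }
}
```

A caller that prefers to abandon the operation would simply check the boolean result once instead of looping.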

2、 The origin of CAS (compare and swap)

Why is CAS needed? Let's start with a common mistake. We often mark a variable with the volatile keyword to show that it is a globally shared variable with visibility and ordering guarantees. But operations on it are not atomic. Take the common operation a++. It can be broken down into three steps:

(1) Read a from memory

(2) Add 1 to a

(3) Write the value of a back to memory

In a single thread there is no problem with this operation, but with multiple threads things can go wrong. For example, thread B may read a from memory before thread A has written its incremented value back. Both threads then write the same result, and one increment is lost. The operation is not thread-safe.

The volatile keyword guarantees the visibility and ordering of a shared variable between threads and prevents instruction reordering (as relied on by the DCL singleton), but it cannot guarantee that an operation is atomic. For this reason, CAS was introduced in JDK 1.5, using a CPU primitive to guarantee the atomicity of the operation.
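
The difference can be observed with a small experiment. The sketch below is an illustration under assumptions: the class LostUpdateDemo and its method run are invented for this example, and the amount of lost updates on the volatile counter varies from run to run (it can even be zero on some machines).

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    static volatile int volatileCounter = 0;   // visible to all threads, but ++ is not atomic
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Starts `threads` threads that each increment both counters `perThread` times.
    // Returns { volatile result, atomic result }.
    static int[] run(int threads, int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCounter++;                // read, add 1, write back: three steps
                    atomicCounter.incrementAndGet();  // one atomic, CAS-based step
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile a++ : " + r[0]);  // may be less than 400000
        System.out.println("AtomicInteger: " + r[1]);  // always 400000
    }
}
```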

The CAS operation is supported by the processor and is a primitive. "Primitive" is a term from operating systems: a procedure made up of several instructions that performs a specific function. It is indivisible, that is, the execution of a primitive must be continuous and cannot be interrupted. On Intel processors, for example, compare-and-exchange is performed by the cmpxchg family of instructions.

3、 The principle of CAS (compare and swap)

CAS is mainly used in the atomic package of JUC (java.util.concurrent.atomic). We take the AtomicInteger class as an example.

Tracing through the code, we can see that CAS operations in Java go through the Unsafe class in the sun package. The CAS methods in Unsafe are native methods implemented inside the JVM, so the final implementation is C and C++ code that operates on the underlying platform.

The Unsafe class, in the sun.misc package, is not part of the Java standard. It provides a series of low-level operations that extend what the Java language can do, such as memory management, manipulating classes, objects and variables, and multi-thread synchronization.

// var1 is the object the CAS operates on, var2 is the memory offset of the target field within var1,
// var4 is the expected value, and var5 (var6 for long) is the new value to set.
// A native call is used to reach the CPU instruction
public final native boolean compareAndSwapObject(Object var1, long var2, Object var4, Object var5);
public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);
public final native boolean compareAndSwapLong(Object var1, long var2, long var4, long var6);
public native Object getObjectVolatile(Object var1, long var2);
public native void putObjectVolatile(Object var1, long var2, Object var4);
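
To see how these signatures are used, here is a hedged sketch that calls compareAndSwapInt directly. The class UnsafeCasDemo, its field value and the helper loadUnsafe are invented for this example. It assumes a JDK on which sun.misc.Unsafe still exposes compareAndSwapInt: recent JDKs deprecate these methods and may print warnings, and application code should prefer the atomic classes.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeCasDemo {
    volatile int value;   // the field that the CAS will update

    // Unsafe.getUnsafe() rejects ordinary application code, so the singleton
    // is read reflectively from the private static field "theUnsafe".
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static final Unsafe U = loadUnsafe();
    static final long VALUE_OFFSET;   // plays the role of var2 in the signatures above
    static {
        try {
            VALUE_OFFSET = U.objectFieldOffset(UnsafeCasDemo.class.getDeclaredField("value"));
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // var1 = this, var2 = VALUE_OFFSET, var4 = expected, var5 = update
    boolean cas(int expected, int update) {
        return U.compareAndSwapInt(this, VALUE_OFFSET, expected, update);
    }

    public static void main(String[] args) {
        UnsafeCasDemo d = new UnsafeCasDemo();
        System.out.println(d.cas(0, 1));  // field held 0, so the swap happens
        System.out.println(d.cas(0, 2));  // field now holds 1, so nothing changes
        System.out.println(d.value);
    }
}
```
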
The implementation of Unsafe in the HotSpot source code, in unsafe.cpp:
UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  oop p = JNIHandles::resolve(obj);
  // Calculate the address of value from the offset. The offset here is the valueOffset in AtomicInteger
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

\hotspot\src\share\vm\runtime\atomic.cpp
unsigned Atomic::cmpxchg(unsigned int exchange_value,
                         volatile unsigned int* dest, unsigned int compare_value) {
  assert(sizeof(unsigned int) == sizeof(jint), "more work to do");
  return (unsigned int)Atomic::cmpxchg((jint)exchange_value, (volatile jint*)dest,
                                       (jint)compare_value);
}

Depending on the operating system and CPU, the preprocessor decides at compile time which platform's overloaded function is called.

You can see that Atomic::cmpxchg is called. Its implementations for linux_x86 and windows_x86 are as follows.

The underlying implementation for linux_x86, in linux_x86\vm\atomic_linux_x86.inline.hpp:
inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

The underlying implementation for windows_x86:
inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

Summary: the source code shows that the underlying implementation of CAS has a different overload for each platform, and that CAS cannot work without support from the processor.

The core of the code is a cmpxchg instruction with a lock prefix, namely lock cmpxchg dword ptr [edx], ecx

Analysis of the Atomic::cmpxchg method:

mp is the value returned by os::is_MP(). os::is_MP() is an inline function that determines whether the current system is multiprocessor.

If the current system is multiprocessor, this function returns 1.

Otherwise, it returns 0.

LOCK_IF_MP(mp) decides, based on the value of mp, whether to add the lock prefix to the cmpxchg instruction.

If mp indicates that the current system is multiprocessor (that is, mp is 1), the lock prefix is added to the cmpxchg instruction.

Otherwise, the lock prefix is not added.

This is an optimization. The lock prefix is considered unnecessary on a single processor and is added only in a multiprocessor environment, because lock degrades performance. cmpxchg is the assembly instruction that compares and exchanges operands.

4、 Advantages and disadvantages of CAS mechanism

4.1 Advantages

CAS is a form of optimistic locking: a non-blocking, lightweight optimistic lock. What does non-blocking mean? When a thread attempts the operation, it immediately gets a response saying whether it succeeded, instead of being suspended to wait. By contrast, synchronized is a heavyweight lock that performs more complex lock, unlock and wake-up operations.

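4.2 Disadvantages

1) Long spin times mean high overhead, because a spinning thread keeps consuming CPU resources

2) Atomicity is guaranteed only for an operation on a single shared variable

3) The ABA problem: if a value changes from A to B and back to A, a CAS that expects A still succeeds even though the value was modified in between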
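
For the second limitation, one common workaround (not discussed in the original text, so treat it as an aside) is to pack several values into one immutable object and CAS the reference to it with AtomicReference. The class RangeHolder and its members below are hypothetical names for this sketch.

```java
import java.util.concurrent.atomic.AtomicReference;

class RangeHolder {
    // Immutable snapshot of two values that must always change together.
    static final class Range {
        final int lower;
        final int upper;
        Range(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    private final AtomicReference<Range> ref = new AtomicReference<>(new Range(0, 10));

    // Shifts both bounds by delta with a single CAS on the reference.
    Range shift(int delta) {
        while (true) {
            Range old = ref.get();
            Range next = new Range(old.lower + delta, old.upper + delta);
            if (ref.compareAndSet(old, next)) {
                return next;
            }
            // Another thread replaced the snapshot first: retry with the new one.
        }
    }

    Range get() {
        return ref.get();
    }
}
```

Because readers only ever see a complete Range object, they can never observe a new lower bound paired with an old upper bound.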

4.3 Solving the ABA problem

1) Add a version number

To solve this problem, the JDK provides a stamped atomic reference class, AtomicStampedReference. It attaches a version (stamp) to the variable's value so that the CAS remains correct. Before using CAS, therefore, consider whether the ABA problem affects the correctness of the program's concurrency. If the ABA problem does need to be solved, traditional mutual-exclusion synchronization may be more efficient than the atomic classes.
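
A minimal sketch of the version-number idea with AtomicStampedReference follows. The class AbaDemo is invented for this example, and the interleaving of two threads is simulated sequentially in one thread to keep the outcome deterministic.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class AbaDemo {
    // Returns { result of the stale CAS, result of the CAS with a fresh stamp }.
    static boolean[] demo() {
        String a = "A";
        String b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);

        // A "slow" thread remembers the value and the stamp it saw.
        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);
        int seenStamp = stampHolder[0];

        // Meanwhile another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet(a, b, 0, 1);
        ref.compareAndSet(b, a, 1, 2);

        // The reference is A again, but the stamp is now 2, so the stale CAS fails.
        boolean stale = ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);

        // Retrying with the current stamp succeeds.
        int current = ref.getStamp();
        boolean fresh = ref.compareAndSet(a, "C", current, current + 1);
        return new boolean[] { stale, fresh };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("stale CAS: " + r[0] + ", fresh CAS: " + r[1]);
    }
}
```

A plain AtomicReference would have accepted the stale CAS here, because it compares only the reference.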

5、 When to use CAS

5.1 With few threads and short waits, spinning on CAS to try to acquire the lock is more efficient than synchronized

5.2 With many threads and long waits, spin locks are not recommended, because they consume a lot of CPU
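
A spin lock built on CAS can be sketched as follows. This is a simplified illustration, not a production lock: the class SpinLock and the helper countWithLock are invented for this example, the lock is not reentrant, and waiting threads burn CPU exactly as described in 5.2.

```java
import java.util.concurrent.atomic.AtomicReference;

class SpinLock {
    // null means unlocked; otherwise holds the owning thread.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread me = Thread.currentThread();
        // Spin until the CAS from "no owner" to "me" succeeds.
        while (!owner.compareAndSet(null, me)) {
            // busy-wait: this is the CPU cost of spinning
        }
    }

    void unlock() {
        // Only the owning thread can release the lock.
        owner.compareAndSet(Thread.currentThread(), null);
    }

    // Demo helper: several threads increment a plain int under the lock.
    static int countWithLock(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = { 0 };
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;   // short critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 10_000));
    }
}
```

With a critical section this short and only a handful of threads, spinning is usually cheap. As the thread count or the hold time grows, more cycles are wasted in the while loop.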

