Starting from 10 questions, I will show you all aspects of the JVM

Time:2020-9-29

Every Java developer runs into JDK, JVM and GC questions sooner or later, whether in daily work or in interviews. This article uses the following 10 questions as entry points to give you a comprehensive understanding of the JVM.

  1. The difference and relation of JVM, JRE and JDK
  2. What is a JVM, and what is its main role
  3. What are the core functions of the JVM
  4. Class loading mechanism and process
  5. Logical structure of runtime data area
  6. Memory model of JVM
  7. How to determine whether an object is garbage
  8. What are the garbage collection algorithms
  9. Various garbage collectors
  10. Parameter configuration for JVM tuning

At the end of the previous article, we talked about the design specification of the JVM. From the perspective of usage, JVM memory is generally divided into a thread-private memory area and a thread-shared memory area.

The thread-private memory area holds the program counter and the virtual machine stack with its stack frames. The space each stack frame needs is already determined when the source code is compiled into a class file. This area is created together with its thread and is released when the thread ends, so the thread-private memory area does not need to consider memory management or garbage collection. The thread-shared memory area is created when the virtual machine starts and is shared by all threads. It is the largest piece of memory managed by the Java virtual machine and the one that deserves the most attention. First of all, let's take a look at the memory model of the thread-shared memory area.

6. Memory model of JVM

As shown in the figure, the memory structure of the JVM is divided into heap and non heap areas.

  • The “non-heap” is the method area (also called the metadata area) mentioned in the previous article, which stores class information.
  • The “heap” stores the object instances and arrays created while the JVM's threads execute. The heap is divided into two large blocks, the old area and the young area. The young area is further divided into the Eden area and the survivor area (S0 + S1). S0 and S1 are the same size and are also called “from” and “to”.
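
The heap and non-heap split can be observed on a running JVM. The following is a minimal sketch using the standard java.lang.management API. The class name MemoryAreas and the helper countPools are made up for this illustration, and the pool names that get printed depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

class MemoryAreas {
    // Counts the memory pools of the given type (HEAP or NON_HEAP) reported by the running JVM.
    static long countPools(MemoryType type) {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == type)
                .count();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Pool names vary between JVMs and collectors, so none are hard-coded here.
            System.out.println(pool.getType() + " : " + pool.getName());
        }
    }
}
```

On a typical HotSpot JVM the heap pools correspond to Eden, survivor and old areas, while the non-heap pools include the metadata area.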

The purpose of this division is memory management, which is what we call garbage collection. So what kind of object is garbage? What are the garbage collection algorithms? What are the commonly used garbage collectors? This article clarifies these questions one by one.

7. How to determine whether an object is garbage?

To do garbage collection, you first need to know what kind of object is garbage. There are two algorithms for deciding whether an object is garbage: reference counting and reachability analysis.

  • 1. Reference counting: a reference counter is added to each object. Whenever something references the object, the counter is increased by 1. When a reference becomes invalid, the counter is decreased by 1. When the counter reaches 0, the object can be recycled.

For an object, as long as the application holds a reference to it, it is not garbage. If nothing references an object, it is garbage. Although this method is simple and efficient, the JVM generally does not use it because of one disadvantage: when two objects point to each other, each object's counter is increased by 1. Even after the application drops all of its own references to them, the two objects still reference each other, so their counters never reach 0 and they can never be recycled.
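
The circular reference problem can be shown with a toy counter. This is only a sketch of the idea: the Node class and the retain/release helpers are hypothetical and are not how any real JVM tracks references.

```java
class RefCountDemo {
    // A toy object that carries its own reference counter (illustration only).
    static class Node {
        int refCount = 0; // number of references currently pointing at this node
        Node other;       // a single outgoing reference field
    }

    static void retain(Node n) { n.refCount++; }
    static void release(Node n) { n.refCount--; }

    // Returns the two counters after the program has dropped its own references.
    static int[] countsAfterDroppingLocals() {
        Node a = new Node();
        Node b = new Node();
        retain(a);               // local variable a
        retain(b);               // local variable b
        a.other = b; retain(b);  // a.other -> b
        b.other = a; retain(a);  // b.other -> a
        release(a);              // local variable a goes out of scope
        release(b);              // local variable b goes out of scope
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // prints a=1, b=1
    }
}
```

Both counters stay at 1 even though the program can no longer reach either node, so a pure reference counting collector would never reclaim them.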

  • 2. Reachability analysis: to address the disadvantage of reference counting, the JVM adopts another algorithm, which starts from a set of “GC roots” objects and searches downward along their references. The path traveled by the search is called a reference chain. When no reference chain connects an object to the GC roots, the object is unusable and can be garbage collected. Otherwise, the object is still in use and is not garbage.

Although obj7 and obj8 in the figure above refer to each other, they are not reachable from the GC roots, so they will be marked as garbage. The JVM uses the following types of objects as GC roots:

  • (1) Objects referenced from the virtual machine stack (the local variable table in a stack frame);
  • (2) Objects referenced by static fields of classes in the method area;
  • (3) Objects referenced by constants in the method area;
  • (4) Objects referenced by JNI (native methods) in the native method stack.
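
Reachability analysis is essentially a graph traversal that starts at the GC roots. The sketch below models objects as strings and references as a map. The graph itself is a hypothetical example (only obj7 and obj8 come from the text), and a real collector walks actual object references rather than a map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // Breadth-first walk of the reference graph, starting from the GC roots.
    static Set<String> reachable(List<String> roots, Map<String, List<String>> refs) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.poll();
            if (marked.add(obj)) { // true only the first time an object is visited
                pending.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("obj1"),
                "obj1", List.of("obj2"),
                "obj7", List.of("obj8"),  // obj7 and obj8 only reference each other
                "obj8", List.of("obj7"));
        Set<String> live = reachable(List.of("root"), refs);
        System.out.println(live.contains("obj2")); // prints true
        System.out.println(live.contains("obj7")); // prints false
    }
}
```

Because the walk never reaches obj7 or obj8, the cycle between them does not keep them alive, which is exactly the case reference counting cannot handle.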

Note: in the reachability analysis algorithm, unreachable objects are not recycled immediately. They are in a probation state, and an object must be marked twice before it is actually recycled.

First mark: if reachability analysis finds that an object has no reference chain connecting it to the GC roots, it is marked for the first time.

Second mark: after the first mark, a filter is applied. The filter criterion is whether the object needs to execute its finalize() method. An object that does need to run finalize() gets one chance to re-link itself to a reference chain inside that method. Objects that do not re-establish such a link are marked for the second time.

Objects marked for the second time are recycled. If an object re-associates itself with a reference chain in its finalize() method, it escapes this collection and continues to survive.

8. What are the garbage collection algorithms

After knowing how the JVM determines which objects are garbage, let’s take a look at the garbage collection algorithms of the JVM.

1. Mark-sweep algorithm

  • The first step is “mark”: scan all the objects in the heap, find out which objects need to be recycled, and mark them.

  • The second step is “sweep”: clear the objects marked as unreferenced in the first step to free up memory space.

This algorithm has two disadvantages:

(1) Both marking and sweeping are time-consuming, so efficiency is low.

(2) After sweeping, a large number of discontinuous memory fragments are left behind. If the program then needs to create a large object, it may not find enough contiguous memory and has to trigger another garbage collection.
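
The fragmentation problem can be seen in a small simulation. This sketch models the heap as an array of equally sized slots, which is a simplifying assumption, and the names MarkSweepDemo, sweep and largestFreeRun are invented for the example.

```java
import java.util.Arrays;
import java.util.Set;

class MarkSweepDemo {
    // Sweep phase: every slot that was not marked becomes free (null). Survivors do not move.
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        String[] after = sweep(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(after)); // prints [A, null, C, null, E, null]
        System.out.println(largestFreeRun(after));  // prints 1
    }
}
```

Three slots are free after the sweep, yet an object that needs two adjacent slots cannot be placed, which is the situation described in disadvantage (2).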

2. Mark-copy algorithm

The memory is divided into two areas, only one of which is used at a time. When the area in use is full and garbage collection is triggered, the surviving objects are copied to the other area, and then the previously used area is cleared in a single pass.

(Figures: the two areas before clearing and after clearing.)

The disadvantage of the mark-copy algorithm is obvious: memory utilization is low, because only one of the two areas holds objects at any time.
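
A minimal sketch of the copying step, again modeling each area as an array of slots. The class and method names are hypothetical, and the set of live objects is assumed to come from a previous marking pass.

```java
import java.util.Arrays;
import java.util.Set;

class CopyingDemo {
    // Copies the live objects from the "from" area to the start of a fresh "to" area,
    // then wipes the "from" area in a single pass.
    static String[] copySurvivors(String[] from, Set<String> live) {
        String[] to = new String[from.length];
        int next = 0;
        for (String obj : from) {
            if (obj != null && live.contains(obj)) {
                to[next++] = obj; // survivors end up packed together
            }
        }
        Arrays.fill(from, null);
        return to;
    }

    public static void main(String[] args) {
        String[] from = { "A", "B", "C", "D" };
        String[] to = copySurvivors(from, Set.of("A", "C"));
        System.out.println(Arrays.toString(to));   // prints [A, C, null, null]
        System.out.println(Arrays.toString(from)); // prints [null, null, null, null]
    }
}
```

The survivors are contiguous after the copy, so there is no fragmentation, but the program needs two areas while only ever filling one of them.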

3. Mark-compact algorithm

The marking process of the mark-compact algorithm is the same as in the mark-sweep algorithm, but the next step does not clean up the recyclable objects directly. Instead, all surviving objects are moved to one end of the memory, and then the memory beyond the boundary of the surviving objects is cleared directly.
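
The compaction step can be sketched as sliding survivors toward the start of the array. As before, the slot model and the names CompactDemo and compact are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.Set;

class CompactDemo {
    // Slides live objects to the front of the heap in place and clears everything
    // past the boundary. Returns the boundary index (the first free slot).
    static int compact(String[] heap, Set<String> live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && live.contains(obj)) {
                heap[next++] = obj; // next never exceeds i, so nothing unread is overwritten
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        int boundary = compact(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(heap)); // prints [A, C, E, null, null, null]
        System.out.println(boundary);              // prints 3
    }
}
```

Unlike the mark-sweep result, the three free slots are now adjacent, at the price of moving objects around.
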

Comparing these three algorithms, we can see that:

  • The advantage of the mark-copy algorithm is high collection efficiency, at the cost of some wasted space.
  • The mark-compact algorithm is less efficient because it has to move objects to one end, but it manages memory space very well.
  • Therefore, the mark-copy algorithm suits objects with short life cycles that are collected frequently.
  • The mark-compact algorithm suits objects with long life cycles that are collected infrequently, where each collection is expected to free a large amount of contiguous memory.

Therefore, the designers of the JVM divide the heap into two large areas, the young area and the old area. The young area stores objects with short life cycles, which are no longer needed after being used once or twice. After a single collection, roughly eight out of ten objects in this area are reclaimed. Therefore, the young area uses the mark-copy algorithm. The old area stores objects with long life cycles: objects that are still alive after many collections are moved into the old area. Normally the reachability of these objects is not checked until the old area runs short of space. Then a single unified collection is performed to free enough contiguous memory.
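
The movement of long-lived objects from the young area to the old area can be sketched as an age counter per object. The threshold PROMOTION_AGE = 3 is a made-up value for this example, and the class is a toy model rather than the JVM's actual promotion logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GenerationalDemo {
    static final int PROMOTION_AGE = 3; // hypothetical threshold for this sketch

    final Map<String, Integer> young = new HashMap<>(); // object -> collections survived
    final List<String> old = new ArrayList<>();

    void allocate(String obj) {
        young.put(obj, 0);
    }

    // One young-area collection: dead objects vanish, survivors age,
    // and survivors that are old enough move to the old area.
    void youngCollection(Set<String> live) {
        Map<String, Integer> survivors = new HashMap<>();
        for (Map.Entry<String, Integer> entry : young.entrySet()) {
            if (!live.contains(entry.getKey())) {
                continue; // unreachable, so it is simply not copied
            }
            int age = entry.getValue() + 1;
            if (age >= PROMOTION_AGE) {
                old.add(entry.getKey());
            } else {
                survivors.put(entry.getKey(), age);
            }
        }
        young.clear();
        young.putAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("cache");
        heap.allocate("tmp1");
        heap.youngCollection(Set.of("cache")); // tmp1 dies, cache reaches age 1
        heap.allocate("tmp2");
        heap.youngCollection(Set.of("cache")); // tmp2 dies, cache reaches age 2
        heap.youngCollection(Set.of("cache")); // cache reaches age 3 and is promoted
        System.out.println(heap.old);   // prints [cache]
        System.out.println(heap.young); // prints {}
    }
}
```

Short-lived objects never leave the young area, while the one long-lived object ends up in the old area after surviving several collections.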

9. Various garbage collectors

Because the young and old areas need different garbage collection algorithms, each era in the evolution of the JVM's garbage collectors has had separate collectors for the young area and the old area. From JDK 1.3 to now, the evolution of JVM garbage collectors can be divided into four eras: the serial era, the parallel era, the concurrent era and the G1 era.
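
To check which collectors a running JVM is actually using, the standard java.lang.management API can list them. This is a small sketch; the class name CollectorNames is invented here, and the names returned depend on the JVM version and the collector selected at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorNames {
    // Returns the names of the garbage collectors registered in the running JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() is the number of collections run so far by this collector.
            System.out.println(gc.getName() + " : " + gc.getCollectionCount() + " collections");
        }
    }
}
```

With most collector configurations the list contains more than one entry, typically one for young-area collections and one for old-area or full collections.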

1. Serial era: Serial (young area) + Serial Old (old area)

JDK 1.3 dates from around 2000. At that time, most computers had a single CPU with a single core, so the original garbage collection design was single-core and single-threaded. Moreover, garbage collection runs in STW (stop the world) fashion relative to the normal business threads: while the single collection thread is doing its work, all other threads have to stop.

The serial collector collects in a single-threaded, stop-the-world way. When memory is insufficient, the serial GC sets a pause flag. Once all threads have reached a safe point, the application threads pause and the serial GC starts to work, reclaiming space and compacting memory with a single thread. A single thread means less complexity and lower memory consumption, but it also means that the advantages of multiple cores cannot be exploited. Therefore, the serial collector is especially suitable for small heaps on single-core or at most dual-core CPUs.

2. Parallel era: Parallel Scavenge (young area) + Parallel Old (old area)

The parallel collector is a garbage collector that focuses on throughput, and it is also the default collector configuration in server mode. The focus on throughput is mainly reflected in the young generation's Parallel Scavenge collector.

The parallel collector works much like the serial collector: both are stop-the-world, but the parallel collector performs garbage collection with multiple threads in parallel during the pause. The young generation uses the copying algorithm, while the old generation uses the mark-compact algorithm, which compacts memory while collecting. The two target parameters -XX:MaxGCPauseMillis and -XX:GCTimeRatio let the Parallel Scavenge collector adjust the size of the young generation to reduce how often GC is triggered. The parallel collector is suitable for scenarios where throughput matters much more than latency: it provides the best throughput as long as the worst-case latency remains acceptable.
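
The meaning of -XX:GCTimeRatio is easier to see with numbers. A value of N asks the collector to keep the share of total run time spent in GC at or below 1 / (1 + N). The helper below only does that arithmetic; the class and method names are made up for the example.

```java
class GcTimeRatio {
    // Target upper bound on the fraction of total time spent in GC for -XX:GCTimeRatio=n.
    static double maxGcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        System.out.println(maxGcFraction(99)); // prints 0.01, i.e. at most 1% of time in GC
        System.out.println(maxGcFraction(19)); // prints 0.05, i.e. at most 5% of time in GC
    }
}
```

A larger ratio therefore demands higher throughput, while -XX:MaxGCPauseMillis pulls in the other direction by asking for shorter individual pauses.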

3. Concurrent era: CMS (old area)

Concurrent Mark Sweep (CMS) is an excellent garbage collector that focuses on latency. CMS is a garbage collection implementation for the old area.

In the old generation, CMS goes through four phases: initial mark, concurrent mark, remark and concurrent sweep. The initial mark phase marks all the root objects in STW fashion. The concurrent mark phase traces the objects reachable from the roots concurrently with the application threads. Before reclaiming garbage, the remark phase uses another STW pause to mark reachable objects that may have been missed because of modifications by mutator threads (the threads that change the data, i.e. the application threads). Objects that are finally unreachable are reclaimed in the concurrent sweep phase. It is worth noting that both the initial mark and the remark are optimized to run with multiple threads. CMS is very suitable for server-side applications with large heaps and many CPU cores, and it was the preferred collector for large-scale applications before the advent of G1.

However, CMS has two defects:

  • (1) Since it is mark-sweep rather than mark-compact, memory fragmentation occurs, and over time the old area is eventually exhausted or unable to allocate large objects. In the end, a fallback mechanism (serial collection sits behind CMS) has to perform a full GC and compact the memory.
  • (2) Since marking and sweeping both run concurrently with the application threads, the application keeps occupying heap memory while the collector is working. If memory runs out at some moment, the fallback mechanism is triggered and a STW garbage collection is carried out with the serial collector.

4. G1 era: Garbage First

With the G1 collector, the memory layout of the Java heap is very different from that of the other collectors. G1 divides the whole Java heap into multiple independent regions of equal size. Although the concepts of young generation and old generation are retained, the two are no longer physically separated. Each is a collection of regions, which do not need to be contiguous.

As shown in the figure above, all regions have the same size, which ranges from 1 MB to 32 MB and must be a power of 2. The region size is set with the -XX:G1HeapRegionSize parameter, whose value is given in megabytes with an M suffix.
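
The constraint on the region size can be written as a one-line check. This helper is purely illustrative: it encodes the 1 MB to 32 MB power-of-2 rule stated above and is not part of any JVM API.

```java
class RegionSizeCheck {
    // True if the size in megabytes lies in [1, 32] and is a power of 2.
    static boolean isValidRegionSizeMb(int mb) {
        return mb >= 1 && mb <= 32 && Integer.bitCount(mb) == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidRegionSizeMb(16)); // prints true
        System.out.println(isValidRegionSizeMb(12)); // prints false, not a power of 2
        System.out.println(isValidRegionSizeMb(64)); // prints false, above the 32 MB limit
    }
}
```

Under this rule the only accepted sizes are 1, 2, 4, 8, 16 and 32 MB.
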

The main characteristics of the G1 collector are as follows:

(1) Logically, memory still retains the concept of generations. At any given time, each region is marked as young generation, old generation or free;

(2) As a whole, the mark-compact algorithm is adopted to avoid memory fragmentation;

(3) To make pauses predictable, G1 adopts a “screening and recycling” strategy: before collecting, it ranks the candidate regions by their collection value and cost, and then, according to the expected pause time configured for G1, selects the top-ranked regions for collection.
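
The screening step in (3) can be sketched as a greedy selection under a time budget. Everything here is an assumption made for illustration: the Region class, its garbageMb and costMs estimates, and the greedy rule are a simplification and not G1's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class RegionSelectionDemo {
    // Hypothetical per-region estimate: how much garbage it holds and how long collecting it takes.
    static class Region {
        final String name;
        final double garbageMb;
        final double costMs;

        Region(String name, double garbageMb, double costMs) {
            this.name = name;
            this.garbageMb = garbageMb;
            this.costMs = costMs;
        }

        double efficiency() {
            return garbageMb / costMs; // garbage reclaimed per millisecond of pause
        }
    }

    // Ranks regions by efficiency and picks them in order while they still fit the pause budget.
    static List<String> select(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<String> chosen = new ArrayList<>();
        double used = 0;
        for (Region region : sorted) {
            if (used + region.costMs <= pauseBudgetMs) {
                chosen.add(region.name);
                used += region.costMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("r1", 8, 4),  // 2.0 MB per ms
                new Region("r2", 6, 2),  // 3.0 MB per ms
                new Region("r3", 1, 5),  // 0.2 MB per ms
                new Region("r4", 9, 6)); // 1.5 MB per ms
        System.out.println(select(regions, 7)); // prints [r2, r1]
    }
}
```

With a 7 ms budget, r2 and r1 use 6 ms together, and neither r4 nor r3 fits in the remaining time, so the regions with the best payoff are collected first and the rest wait for a later cycle.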

In fact, G1 (Garbage First) gets its name because it gives priority to collecting the regions that contain the most garbage.
The overall G1 garbage collection steps are: initial mark, concurrent mark, final mark, and screening and recycling.

5. ZGC: Zero GC

This article only briefly mentions this newly launched garbage collector, which is called “Zero GC” because it pursues even lower GC pause times. Its goal is to support TB-level heap memory (up to 4 TB) with a maximum GC pause of 10 ms. ZGC was introduced in JDK 11. It has no concept of young and old generations, either physically or logically. Instead, the heap is divided into pages, and the pages are compacted during GC, so there is no fragmentation problem. Because it was only introduced in JDK 11 and can only be used on 64-bit Linux, it is still rarely used at present.

Epilogue

The above two articles, 7000 words in total, are my systematic understanding of the role of the JVM, its design framework and its overall memory management. Thank you.

Ten questions to find out about JVM & GC (1)

Author: Tan Wentao, Yixin Institute of Technology