A detailed explanation of the Java garbage collection mechanism and the classic garbage collectors

Time:2021-9-21

Methods for determining whether an object is alive

Reference counting: add a reference counter to each object. Whenever something references the object, the counter is incremented by one; when a reference becomes invalid, the counter is decremented by one. An object whose counter is zero can be reclaimed.

Reference counting has a problem with circular references:


objA.instance = objB
objB.instance = objA

objA holds a reference to objB, and objB holds a reference to objA. Because they refer to each other, neither counter ever drops to zero, so reference counting alone can never reclaim them.
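
The cycle can be reproduced with a toy counter. This is only a minimal sketch: the RefCounted class, its refCount field and the manual increments that stand in for local variables are all hypothetical, and the JVM itself does not manage memory this way.

```java
// Hypothetical toy model of reference counting, for illustration only.
final class RefCounted {
    final String name;
    int refCount = 0;        // number of references currently pointing at this object
    RefCounted instance;     // the single outgoing reference, as in the example above

    RefCounted(String name) { this.name = name; }

    // Point this.instance at target and adjust both counters.
    void setInstance(RefCounted target) {
        if (instance != null) instance.refCount--;
        instance = target;
        if (target != null) target.refCount++;
    }
}

final class RefCountDemo {
    // Returns the two counters after the external references have been dropped.
    static int[] cycleCounts() {
        RefCounted objA = new RefCounted("objA");
        RefCounted objB = new RefCounted("objB");
        objA.refCount++;             // models the local variable objA
        objB.refCount++;             // models the local variable objB

        objA.setInstance(objB);      // objA.instance = objB
        objB.setInstance(objA);      // objB.instance = objA

        objA.refCount--;             // models objA = null
        objB.refCount--;             // models objB = null
        return new int[] { objA.refCount, objB.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("objA=" + counts[0] + ", objB=" + counts[1]);   // objA=1, objB=1
    }
}
```

Both counters stay at 1 even though the program can no longer reach either object.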

Reachability analysis:

Starting from the GC Roots, search downward along the reference relationships. An object that can be reached is alive; an object that cannot be reached can be reclaimed.

GC Roots include:

1) Objects referenced from the local variable table of a virtual machine stack frame

2) Objects referenced by static fields of classes in the method area

3) Objects referenced by constants in the method area, such as references in the string constant pool

4) Objects referenced by JNI (native methods) in the native method stack

5) Objects referenced inside the virtual machine, such as the Class objects for the primitive types, some resident exception objects, and the system class loader

6) Objects held by a synchronization lock (the synchronized keyword)

etc.
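
Reachability analysis is essentially a graph search. The sketch below assumes a hypothetical HeapNode type standing in for a heap object and a plain list standing in for the GC Roots. It shows that the objA/objB cycle from the reference counting example is not a problem here, because nothing reachable from a root points at it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node type standing in for a heap object.
final class HeapNode {
    final String name;
    final List<HeapNode> refs = new ArrayList<>();
    HeapNode(String name) { this.name = name; }
}

final class Reachability {
    // Breadth-first search from the roots; every node visited is alive.
    static Set<HeapNode> reachable(List<HeapNode> gcRoots) {
        Set<HeapNode> alive = new HashSet<>();
        Deque<HeapNode> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            HeapNode n = pending.pollFirst();
            if (alive.add(n)) {            // add() returns false if n was already visited
                pending.addAll(n.refs);
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        HeapNode root = new HeapNode("root");
        HeapNode a = new HeapNode("a");
        HeapNode objA = new HeapNode("objA");
        HeapNode objB = new HeapNode("objB");
        root.refs.add(a);
        objA.refs.add(objB);   // objA and objB reference each other...
        objB.refs.add(objA);   // ...but no root leads to them

        Set<HeapNode> alive = reachable(Arrays.asList(root));
        System.out.println(alive.contains(a));      // true
        System.out.println(alive.contains(objA));   // false
    }
}
```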

Concurrent reachability analysis: collector threads running alongside user threads

During concurrent reachability analysis, two kinds of error can occur because the user threads keep modifying object references while the collector is marking:

1) An object that has died is incorrectly marked as alive. This is acceptable: it only produces floating garbage, which can be collected next time.

2) An object that is still alive is incorrectly marked as dead. This is unacceptable, because reclaiming a live object breaks the program.

Tri-color marking:

Black: the object has been visited and all of its references have been scanned

Gray: the object has been visited, but at least one of its references has not been scanned yet

White: the object has not been visited. If it is still white when the analysis finishes, it is unreachable and will be reclaimed.

An object that should be black is wrongly left white only when both of the following conditions hold at the same time:

1) The mutator (user thread) inserts one or more new references from a black object to the white object

2) The mutator deletes all direct or indirect references from gray objects to that white object

Solutions:

1) Incremental update breaks condition 1: record each black object that gains a new reference to a white object, and rescan from these black objects after the concurrent scan is complete.

2) Snapshot at the beginning (SATB) breaks condition 2: when a reference from a gray object to a white object is about to be deleted, record it. After the concurrent scan is complete, rescan starting from the recorded gray objects.
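
The lost-object problem and the two fixes can be simulated on a four-object graph. This is a rough sketch under stated assumptions: the TriColorDemo class and all of its methods are hypothetical, the "concurrent" mutator is simply interleaved by hand, and the SATB variant shades the recorded referent directly, which has the same effect as rescanning from the recorded reference in this small example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy tri-color marker. All names are hypothetical, for illustration only.
final class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    static final class Obj {
        final String name;
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    private final Barrier barrier;
    private final Deque<Obj> grayQueue = new ArrayDeque<>();
    private final List<Obj> recorded = new ArrayList<>();   // filled by the write barriers

    TriColorDemo(Barrier barrier) { this.barrier = barrier; }

    // white -> gray: the object is known to be alive but not yet scanned
    void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            grayQueue.addLast(o);
        }
    }

    // gray -> black: scan all references of one gray object
    void scanOne() {
        Obj o = grayQueue.pollFirst();
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
    }

    void drain() {
        while (!grayQueue.isEmpty()) scanOne();
    }

    // Mutator adds a reference; incremental update records the black source object.
    void addRef(Obj from, Obj to) {
        from.refs.add(to);
        if (barrier == Barrier.INCREMENTAL_UPDATE
                && from.color == Color.BLACK && to.color == Color.WHITE) {
            recorded.add(from);
        }
    }

    // Mutator deletes a reference; SATB records the referent that is being cut off.
    void removeRef(Obj from, Obj to) {
        from.refs.remove(to);
        if (barrier == Barrier.SATB) recorded.add(to);
    }

    // Final pause: process what the barriers recorded, then finish marking.
    void remark() {
        for (Obj o : recorded) {
            if (barrier == Barrier.INCREMENTAL_UPDATE) {
                for (Obj child : o.refs) shade(child);   // rescan the black object
            } else {
                shade(o);                                // keep the snapshot referent alive
            }
        }
        drain();
    }

    // Runs the scenario and returns the final color of c.
    static Color run(Barrier barrier) {
        TriColorDemo gc = new TriColorDemo(barrier);
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        root.refs.add(a);
        root.refs.add(b);
        b.refs.add(c);

        gc.shade(root);
        gc.scanOne();          // root is black; a and b are gray
        gc.scanOne();          // a is black; b is still gray; c is white
        gc.addRef(a, c);       // condition 1: black -> white reference inserted
        gc.removeRef(b, c);    // condition 2: gray -> white reference deleted
        gc.drain();            // concurrent marking finishes
        gc.remark();
        return c.color;
    }

    public static void main(String[] args) {
        for (Barrier barrier : Barrier.values()) {
            System.out.println(barrier + " -> c is " + run(barrier));
        }
    }
}
```

Without a barrier, c ends up WHITE although a still references it. With either barrier, c ends up BLACK.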

Generational collection

Heap:

Young generation (1/3), old generation (2/3)

The young generation is divided into Eden / From / To

Young generation: holds objects that are relatively small and short-lived

Old generation: holds large objects and long-lived objects

Minor GC (young generation collection)

Full GC -> STW (stop the world); a Full GC is particularly resource-intensive

Eden -> From <-> To -> Old

After surviving 15 (the default) rounds of copying between From and To, an object is promoted to the old generation.
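
The aging path above can be sketched as a small state machine. This is a simplified model, not HotSpot's actual policy: the TenuringDemo class is hypothetical, the two survivor spaces are labelled SURVIVOR_0 and SURVIVOR_1 because the From and To roles swap after every minor GC, and effects such as survivor overflow are ignored. In HotSpot the threshold corresponds to the -XX:MaxTenuringThreshold option.

```java
// Hypothetical model of how one object ages through the young generation.
final class TenuringDemo {
    enum Space { EDEN, SURVIVOR_0, SURVIVOR_1, OLD }

    static final int TENURING_THRESHOLD = 15;   // the default mentioned above

    Space space = Space.EDEN;
    int age = 0;

    // One minor GC that this object survives.
    void surviveMinorGc() {
        if (space == Space.OLD) return;          // a minor GC does not touch the old generation
        if (age >= TENURING_THRESHOLD) {         // old enough: promote instead of copying
            space = Space.OLD;
            return;
        }
        age++;
        // Survivors are copied into whichever survivor space is currently empty.
        space = (space == Space.SURVIVOR_0) ? Space.SURVIVOR_1 : Space.SURVIVOR_0;
    }

    // Where the object is after surviving the given number of minor GCs.
    static Space spaceAfter(int minorGcs) {
        TenuringDemo obj = new TenuringDemo();
        for (int i = 0; i < minorGcs; i++) obj.surviveMinorGc();
        return obj.space;
    }

    public static void main(String[] args) {
        System.out.println(spaceAfter(1));    // SURVIVOR_0
        System.out.println(spaceAfter(2));    // SURVIVOR_1
        System.out.println(spaceAfter(15));   // SURVIVOR_0
        System.out.println(spaceAfter(16));   // OLD
    }
}
```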

Garbage collection algorithms

Mark-sweep algorithm:

The algorithm has two phases, mark and sweep. First the objects to be reclaimed are marked; after marking is complete, all marked objects are reclaimed together.

Advantages: the most basic algorithm, simple to implement

Disadvantages: 1) Execution efficiency is unstable: the more objects there are to mark and sweep, the lower the efficiency

2) Memory becomes fragmented, so there may not be enough contiguous space when a large object needs to be allocated.

Representative garbage collector: CMS
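
The fragmentation problem of mark-sweep shows up even on a tiny heap. The sketch below assumes a hypothetical heap of eight equal slots in which every second object is dead, and it marks the live objects rather than the dead ones, which is equivalent for this purpose. Marking itself is taken as given.

```java
// Toy heap of fixed-size slots; all names are hypothetical.
final class MarkSweepDemo {
    // Sweep phase: a slot stays occupied only if its object was marked alive.
    static boolean[] sweep(boolean[] occupied, boolean[] markedAlive) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && markedAlive[i];
        }
        return after;
    }

    static int freeSlots(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) {
            if (!slot) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied    = { true, true,  true, true,  true, true,  true, true  };
        boolean[] markedAlive = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, markedAlive);
        System.out.println("free slots: " + freeSlots(after));                   // 4
        System.out.println("largest contiguous run: " + largestFreeRun(after));  // 1
    }
}
```

Half of the heap is free after the sweep, yet an object that needs two adjacent slots still cannot be allocated.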

Mark-copy algorithm (copying algorithm):

Memory is divided into two blocks of equal size, and only one of them is used at a time. When that block runs out, the living objects are copied to the other block and the used block is cleared in one go.

Advantages: the free space is contiguous (no fragmentation)

Disadvantages: memory consumption is high, because only half of the space is usable at a time; when the object survival rate is high, the amount of copying grows and efficiency drops (not suitable for the old generation).

Representative garbage collectors: many young-generation collectors use this algorithm.
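
A minimal sketch of the copying step, assuming a hypothetical heap where each block is an array of object names and the set of live objects is already known from marking:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy semispace copy; all names are hypothetical.
final class CopyingDemo {
    // Copies the live objects into a fresh block of the same size, packed from index 0.
    static String[] copyLive(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int top = 0;   // next free index in the target block
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[top++] = obj;
            }
        }
        return toSpace;   // the old block can now be reused as a whole
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        Set<String> live = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(Arrays.toString(copyLive(from, live)));   // [a, c, null, null]
    }
}
```

The cost is proportional to the number of live objects, which is why the approach fits the young generation, where few objects survive.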

Mark-compact algorithm

First, mark the objects. After marking, move all living objects toward one end of the memory, then clear everything outside the boundary.

Advantages: contiguous memory space; overall system throughput (the combined efficiency of user threads and the collector) is higher.

Disadvantages: moving objects to compact memory takes a lot of time and requires a “stop the world” pause;

Representative garbage collectors: Serial Old, and Parallel Old (the old-generation counterpart of Parallel Scavenge)

CMS mainly uses the mark-sweep algorithm, but when memory fragmentation starts to affect object allocation, it runs the mark-compact algorithm once to clean up the fragmentation.
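
The compaction step of mark-compact can be sketched as an in-place slide. As before, the heap is a hypothetical array of object names and the live set is assumed to come from a completed mark phase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy sliding compaction; all names are hypothetical.
final class CompactDemo {
    // Slides live objects toward index 0 and returns the boundary (first free index).
    static int compact(String[] heap, Set<String> live) {
        int top = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && live.contains(obj)) {
                heap[top++] = obj;   // top <= i, so no unread slot is overwritten
            }
        }
        for (int i = top; i < heap.length; i++) {
            heap[i] = null;          // clear everything outside the boundary
        }
        return top;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e" };
        Set<String> live = new HashSet<>(Arrays.asList("a", "c", "e"));
        int boundary = compact(heap, live);
        System.out.println(boundary);                 // 3
        System.out.println(Arrays.toString(heap));    // [a, c, e, null, null]
    }
}
```

Unlike the copying algorithm, no second block is needed, but every surviving object may have to move, and in a real collector every reference to a moved object must be updated as well.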

Garbage collectors

Serial collector

The most basic and oldest garbage collector is the Serial collector. While it performs garbage collection, it must stop all user threads (stop the world) until the collection finishes.

Young generation: Serial, based on the mark-copy algorithm

Old generation: Serial Old, based on the mark-compact algorithm.

Advantages: it has the smallest memory footprint of all the garbage collectors and is very efficient on single-core processors. It is suited to running in client mode.

Disadvantages: stop the world

ParNew collector

The ParNew collector is the multi-threaded, parallel version of the Serial collector. Apart from collecting with multiple threads in parallel, it brings little innovation.

CMS, an old-generation collector, cannot work together with Parallel Scavenge. Only ParNew or Serial can be chosen as its young-generation collector.

Parallel Scavenge collector

A young-generation collector based on the mark-copy algorithm. It is also a multi-threaded collector that collects in parallel. Its focus is throughput (user thread time / total time).
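
As a worked example of the throughput formula: 99 seconds of user code and 1 second of garbage collection give a throughput of 99 / (99 + 1) = 99%. The sketch below computes this and also estimates the figure for the running JVM through the standard java.lang.management beans. The ThroughputDemo class is hypothetical, and the live estimate is only approximate, since it treats uptime minus the reported collection time as user time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; only the MXBean calls are real JDK APIs.
final class ThroughputDemo {
    // throughput = user time / (user time + GC time)
    static double throughput(long userMillis, long gcMillis) {
        return (double) userMillis / (userMillis + gcMillis);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99_000, 1_000));   // 0.99

        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // The bean names also reveal which collectors this JVM is running.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
            gcMillis += Math.max(0, gc.getCollectionTime());   // -1 means "undefined"
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        long userMillis = Math.max(0, uptimeMillis - gcMillis);
        System.out.println("approximate throughput so far: " + throughput(userMillis, gcMillis));
    }
}
```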

CMS garbage collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible pause time. It is based on the mark-sweep algorithm.

Steps:

Initial mark -> concurrent mark -> remark -> concurrent sweep

1) Initial mark: stop the world. Marks only the objects that the GC Roots reference directly, which is very fast

2) Concurrent mark: traverses the whole object graph starting from the objects directly referenced by the GC Roots. This takes a long time, but it runs concurrently with the user threads

3) Remark: stop the world. Corrects the marks of objects whose state changed during concurrent marking (using incremental update, which records new references from black objects to white objects). This pause is also relatively short.

4) Concurrent sweep: reclaims the dead objects while running concurrently with the user threads.

Advantages: concurrent collection, low pause.

Disadvantages:

1) It is sensitive to processor resources. When there are fewer than four processor cores, CMS has a large impact on the user program.

2) It cannot handle floating garbage. New garbage is generated while the collector runs concurrently with the user threads; CMS cannot clean it up in the current cycle and leaves it for the next one. If memory runs short as a result, a stop-the-world Full GC is triggered.

3) Because it is based on the mark-sweep algorithm, a large amount of space fragmentation builds up, which can also trigger a Full GC.

G1 garbage collector

G1 aims for the best balance between throughput and latency.

Viewed as a whole, G1 is based on the mark-compact algorithm; viewed locally (between two regions), it is based on the mark-copy algorithm.

G1 divides the Java heap into multiple independent regions of equal size. Each region can play the role of Eden space, Survivor space, or old-generation space as needed.

In G1 the generations exist only conceptually; a generation does not need to be one contiguous range of memory.

There is also a special kind of region, the Humongous region, dedicated to storing large objects. The region size is configurable (1 MB to 32 MB), and an object larger than a whole region is allocated across consecutive Humongous regions.
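
The arithmetic behind Humongous allocation is a ceiling division. In the sketch below the HumongousDemo class is hypothetical, and the "at least half a region" threshold is an assumption taken from HotSpot's G1 documentation rather than from this article.

```java
// Hypothetical helper for reasoning about Humongous regions.
final class HumongousDemo {
    static final long MB = 1024L * 1024L;

    // Assumption: G1 in HotSpot treats an object of at least half a region as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Number of consecutive regions the object occupies (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long regionSize = 2 * MB;
        System.out.println(isHumongous(512 * 1024, regionSize));   // false
        System.out.println(isHumongous(1 * MB, regionSize));       // true
        System.out.println(regionsNeeded(5 * MB, regionSize));     // 3
    }
}
```

With 2 MB regions, a 5 MB array occupies three consecutive regions, and the unused tail of the last one is wasted.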

Each collection works in units of whole regions. In the background, the G1 collector tracks the value of the garbage accumulated in each region and maintains a priority list, so that it collects the regions with the greatest return first. (The value is an empirical estimate based on the amount of memory a collection would reclaim and the time it would take.)

G1 needs extra memory equal to at least 10% to 20% of the Java heap capacity to keep the collector working.

TAMS (Top at Mark Start) pointers: objects newly allocated during concurrent marking are placed above these pointers and are treated as alive.

SATB: records each deletion of a reference from a gray object to a white object.

G1 steps:

1) Initial mark: stop the world. Marks only the objects that the GC Roots reference directly, and updates the TAMS pointers

2) Concurrent mark: starting from the objects directly referenced by the GC Roots, scans the object graph, then processes the objects that SATB recorded because their references changed during the concurrent phase. This takes a long time, but it runs concurrently with the user threads

3) Final mark: stop the world for a short time to process the few SATB records left over

4) Screening and evacuation: stop the world. Updates the statistics of each region and sorts the regions by collection value and cost, then makes a collection plan according to the pause time the user expects (configurable through JVM parameters). Any number of regions can be chosen to form the collection set; the surviving objects in those regions are copied into empty regions, and then the regions of the collection set are cleared.
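
The planning part of the last step can be sketched as a greedy selection. This is not G1's real prediction model: the Region class, its garbageBytes and costMillis fields, and the choose method are hypothetical, and the pause budget simply stands in for the user's expected pause time (in HotSpot this target is set with -XX:MaxGCPauseMillis).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy collection-set chooser; all names are hypothetical.
final class CollectionSetDemo {
    static final class Region {
        final String name;
        final long garbageBytes;    // memory that collecting this region would free
        final double costMillis;    // estimated time needed to evacuate it

        Region(String name, long garbageBytes, double costMillis) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }

        // Return per unit of pause time: the "value" used for the priority order.
        double efficiency() { return garbageBytes / costMillis; }
    }

    // Take regions in descending order of efficiency while the pause budget lasts.
    static List<Region> choose(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency()));

        List<Region> collectionSet = new ArrayList<>();
        double usedMillis = 0;
        for (Region r : sorted) {
            if (usedMillis + r.costMillis <= pauseBudgetMillis) {
                collectionSet.add(r);
                usedMillis += r.costMillis;
            }
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Region> regions = Arrays.asList(
                new Region("A", 8 * mb, 4.0),    // 2 MB per ms
                new Region("B", 6 * mb, 2.0),    // 3 MB per ms
                new Region("C", 1 * mb, 5.0));   // 0.2 MB per ms
        for (Region r : choose(regions, 7.0)) {
            System.out.println(r.name);          // prints B, then A
        }
    }
}
```

Region C is skipped because evacuating it would push the pause past the 7 ms budget while freeing very little memory.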

Except for concurrent marking, every phase stops the user threads. The goal is not simply the lowest possible latency, but the highest throughput achievable while keeping latency under control (throughput = user thread time / total time, where total time = user thread time + garbage collection time).

The above is my personal experience. I hope it can serve as a reference for you, and I hope you will support developpaer.