☕[JVM technical guide] (2) Common garbage collection algorithms of the garbage collection subsystem

Time: 2021-8-31


Common garbage collection algorithms

GC Roots

In Java, the GC Roots include the following kinds of elements:

  1. Objects referenced from the virtual machine stack (the local variable tables of stack frames), such as the parameters and local variables of the methods each thread is currently executing.

  2. Objects referenced by JNI (that is, native methods) in the native method stack.

  3. Objects referenced by class static fields in the method area, such as reference-type static variables of Java classes.

  4. Objects referenced by constants in the method area, such as references in the string constant pool (String Table).

  5. All objects held by synchronization locks (the synchronized keyword).

  6. References internal to the Java virtual machine, such as the Class objects corresponding to primitive types, some resident exception objects (such as NullPointerException and OutOfMemoryError), and the system class loader.

  7. JMXBeans that reflect the internal state of the Java virtual machine, callbacks registered in JVMTI, the local code cache, and so on.
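The roots listed above are the starting points of reachability analysis: anything the collector can reach by following references from them is live, and everything else is garbage. The following is a minimal sketch of that traversal over a toy object graph. The node names (localVar, staticField, A to E) are hypothetical stand-ins for root slots and heap objects; a real JVM walks actual memory rather than a Map.

```java
import java.util.*;

// Toy reachability analysis: objects are named nodes and references are edges.
// The graph and its names are hypothetical; a JVM traverses real heap memory.
class Reachability {
    // Returns every node reachable from the given roots (breadth-first traversal).
    // The roots themselves are included in the result.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String node = work.poll();
            if (!seen.add(node)) continue;              // already visited
            for (String target : refs.getOrDefault(node, List.of())) {
                work.add(target);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("localVar", List.of("A"));     // root: a local variable in a stack frame
        refs.put("staticField", List.of("B")); // root: a static field in the method area
        refs.put("A", List.of("C"));
        refs.put("D", List.of("E"));            // D and E reference each other,
        refs.put("E", List.of("D"));            // but no root reaches them
        Set<String> live = reachable(refs, List.of("localVar", "staticField"));
        System.out.println(new TreeSet<>(live)); // [A, B, C, localVar, staticField]
    }
}
```

The D/E pair references each other but is not reachable from any root, so it is treated as garbage; this is why reachability analysis, unlike simple reference counting, copes with reference cycles.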

Generational collection theory

Most garbage collectors in contemporary commercial virtual machines are designed according to the theory of “generational collection”, which rests on two generational hypotheses (the third point below is a corollary derived from the first two):

  • Weak generational hypothesis: the vast majority of objects are short-lived and die young.

  • Strong generational hypothesis: the more garbage collections an object has survived, the harder it is for that object to die.

  • Intergenerational reference hypothesis (derived): cross-generational references are only a small minority compared with references within the same generation.

These two generational hypotheses jointly establish a consistent design principle for common garbage collectors: the collector should divide the Java heap into different regions and place objects in those regions according to their age (the number of garbage collections an object has survived).

  • If most objects in a region die young and are unlikely to survive a collection, grouping them together means each collection only has to focus on keeping the few survivors alive, rather than marking the large number of objects to be reclaimed, so a large amount of space can be reclaimed at low cost;

  • If the remaining objects are hard to kill, grouping them together lets the collector reclaim that region less frequently. This balances the time overhead of garbage collection against the effective use of memory space.
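The age-based placement described above can be sketched as a toy simulation in which each survivor of a minor GC gets one year older and is promoted to the old generation once it reaches a threshold. The class and field names and the threshold value are illustrative assumptions; HotSpot exposes a comparable knob as -XX:MaxTenuringThreshold, but its real policy also adjusts promotion dynamically based on Survivor occupancy.

```java
import java.util.*;

// Toy model of age-based promotion from the young to the old generation.
// The names and the threshold value are illustrative assumptions.
class AgingSketch {
    static final class Obj {
        final String name;
        int age = 0;                        // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    AgingSketch(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    void allocate(String name) { young.add(new Obj(name)); }  // new objects start young

    // Simulated minor GC: dead young objects are dropped, survivors age by one,
    // and survivors that reach the threshold are promoted to the old generation.
    void minorGc(Set<String> live) {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!live.contains(o.name)) { it.remove(); continue; }
            o.age++;
            if (o.age >= tenuringThreshold) { it.remove(); old.add(o); }
        }
    }

    public static void main(String[] args) {
        AgingSketch heap = new AgingSketch(2);
        heap.allocate("session");
        heap.allocate("tmp1");
        heap.allocate("tmp2");
        heap.minorGc(Set.of("session"));   // tmp1, tmp2 die young; session is age 1
        heap.minorGc(Set.of("session"));   // session reaches age 2 and is promoted
        System.out.println(heap.young.size() + " young, " + heap.old.size() + " old"); // 0 young, 1 old
    }
}
```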

Once the Java heap is divided into different regions, the garbage collector can collect just one or some of those regions at a time, which gives rise to collection types such as “Minor GC”, “Major GC”, and “Full GC”. It also lets the collection algorithm match the survival characteristics of the objects stored in each region, which led to targeted algorithms such as “mark-copy”, “mark-sweep”, and “mark-compact”.

Everything starts from generational collection theory.

Generational collection is not as simple as dividing memory into regions. It faces at least one obvious difficulty: objects are not isolated, and references between them can cross generations. The intergenerational reference hypothesis addresses this: two objects that reference each other tend to live or die together.

Based on this hypothesis, we no longer need to scan the entire old generation just to find a small number of cross-generational references, nor waste space recording whether every single object has cross-generational references. We only need to build one global data structure on the young generation (called the “remembered set”).

  • The remembered set divides the old generation into small blocks and identifies which blocks contain cross-generational references. When a Minor GC occurs, only the objects in the small blocks that contain cross-generational references are added to the GC Roots for scanning.
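A common way to implement the remembered set is a card table. The sketch below is a simplified model, assuming old-generation addresses are plain byte offsets: the old generation is split into fixed-size cards, a simulated write barrier dirties the card holding any field that is made to point into the young generation, and a minor GC then scans only the dirty cards. The 512-byte card size matches HotSpot's default card size; the class and method names are hypothetical, and real write barriers are often less selective than the young-target check used here.

```java
import java.util.*;

// Toy card table: the old generation is divided into fixed-size cards and a
// simulated write barrier marks a card dirty when a field in it starts pointing
// into the young generation. Names are hypothetical; addresses are byte offsets.
class CardTableSketch {
    static final int CARD_SIZE = 512;   // bytes per card (HotSpot's default card size)
    final boolean[] dirty;              // one flag per old-generation card

    CardTableSketch(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Simulated write barrier, run after storing a reference into the
    // old-generation field at byte offset fieldOffset.
    void writeBarrier(int fieldOffset, boolean targetIsYoung) {
        if (targetIsYoung) dirty[fieldOffset / CARD_SIZE] = true;
    }

    // At minor GC time only the dirty cards are scanned for extra roots;
    // the rest of the old generation is skipped entirely.
    List<Integer> cardsToScan() {
        List<Integer> cards = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) if (dirty[i]) cards.add(i);
        return cards;
    }

    public static void main(String[] args) {
        CardTableSketch ct = new CardTableSketch(4096);  // 8 cards
        ct.writeBarrier(100, true);    // card 0
        ct.writeBarrier(1030, true);   // card 2
        ct.writeBarrier(2000, false);  // old-to-old store: no card dirtied
        System.out.println(ct.cardsToScan()); // [0, 2]
    }
}
```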

Names of the collection types under generational collection:

  • Partial GC: a collection that does not target the whole Java heap, only part of it. It is further divided into:
    • Minor GC / Young GC: a collection that targets only the young generation.
    • Major GC / Old GC: a collection that targets only the old generation. Currently, only the CMS collector collects the old generation on its own (the “cmsscavenge” option can be set to enable the Minor GC mechanism as well).
    • Mixed GC: a collection that targets the entire young generation and part of the old generation. Currently, only the G1 collector behaves this way.
  • Full GC: a collection of the entire Java heap and the method area.

Garbage collection algorithms

The three common garbage collection algorithms in the JVM are the mark-sweep algorithm, the copying algorithm, and the mark-compact algorithm.

Mark-sweep algorithm

Algorithm background

The mark-sweep algorithm is one of the most basic and common garbage collection algorithms. It was proposed by J. McCarthy and others in 1960 and applied to the LISP language. Most later collection algorithms are built on mark-sweep and improve on its shortcomings.

Basic ideas

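The basic idea of the algorithm: first mark all the objects that need to be reclaimed, then, once marking is complete, reclaim all the marked objects in one pass. It can also work the other way around: mark the surviving objects and reclaim everything left unmarked, which is the variant described below.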

Algorithm structure

The algorithm has two phases: marking and sweeping.

  • Marking phase: starting from the root nodes, the collector traverses the objects; every object found to be referenced is recorded as reachable in its object header (in HotSpot's Mark Word, the lock flag bits “11” denote the GC-marked state);

  • Sweeping phase: the collector walks the heap linearly from beginning to end, and any object whose header is not marked as reachable is reclaimed. Note that both phases require pausing the entire program (known as stop the world, STW) while they run.

General overview

When the available memory in the heap is exhausted, the entire program is stopped (stop the world) and two tasks are carried out: first marking (marking the non-garbage objects, also called reachable objects), then sweeping. After live and dead objects in memory have been successfully distinguished, the next task of the GC is to reclaim garbage and free the memory occupied by useless objects, so that there is enough memory available to allocate new objects.
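The two phases can be sketched on a toy heap in which each object has a start address, a size, outgoing references, and a boolean standing in for the mark bit in its header. This is a minimal illustration under those assumptions (all names and addresses are hypothetical), not HotSpot's implementation: marking is an explicit-stack traversal from the roots, and sweeping walks objects in address order, returning unmarked ones to a free list and clearing the marks of survivors for the next cycle.

```java
import java.util.*;

// Toy mark-sweep heap. Each object has a start address, a size, outgoing
// references (by address), and a flag standing in for the header mark bit.
// All names and addresses are hypothetical.
class MarkSweepSketch {
    static final class Obj {
        final int addr, size;
        final List<Integer> refs = new ArrayList<>();
        boolean marked;
        Obj(int addr, int size) { this.addr = addr; this.size = size; }
    }

    final TreeMap<Integer, Obj> heap = new TreeMap<>();  // objects in address order
    final List<int[]> freeList = new ArrayList<>();       // freed blocks as {start, size}

    Obj alloc(int addr, int size) {
        Obj o = new Obj(addr, size);
        heap.put(addr, o);
        return o;
    }

    // Marking phase: flag every object reachable from the roots.
    void mark(Collection<Integer> roots) {
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = heap.get(stack.pop());
            if (o == null || o.marked) continue;   // dangling or already marked
            o.marked = true;
            for (int target : o.refs) stack.push(target);
        }
    }

    // Sweeping phase: walk the heap linearly; unmarked objects go to the free
    // list, and survivors have their mark cleared for the next collection.
    void sweep() {
        Iterator<Obj> it = heap.values().iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                freeList.add(new int[] {o.addr, o.size});
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        MarkSweepSketch h = new MarkSweepSketch();
        h.alloc(0, 16).refs.add(48);   // object at 0 references object at 48
        h.alloc(16, 32);               // unreachable
        h.alloc(48, 16);
        h.alloc(64, 8);                // unreachable
        h.mark(List.of(0));            // the single root points at address 0
        h.sweep();
        System.out.println(h.heap.keySet());                      // [0, 48]
        System.out.println(h.freeList.size() + " blocks freed");  // 2 blocks freed
    }
}
```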


Main disadvantages

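  1. Unstable execution efficiency: if the heap contains a large number of objects and most of them need to be reclaimed, a large number of marking and sweeping actions are required, so the efficiency of both phases drops as the number of objects grows.

  2. Fragmentation of memory space: marking and sweeping leaves behind a large number of discontiguous memory fragments. With too many fragments, a later allocation of a large object may find no sufficiently large contiguous block and have to trigger another garbage collection early.

  3. Because the freed memory is discontiguous, the JVM has to maintain a free list to track where new objects can be placed.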
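The fragmentation problem is easy to see with a first-fit allocator over a free list. In this hypothetical example the free blocks total 64 bytes, yet a 40-byte request fails because no single block is large enough; the class name, block addresses, and sizes are made up for illustration.

```java
import java.util.*;

// First-fit allocation from a free list, showing how fragmentation can make an
// allocation fail even when total free space is sufficient. Sizes are made up.
class FreeListSketch {
    // Each entry is {start, size}, kept in address order.
    final List<int[]> free = new ArrayList<>();

    // Returns the start address of the allocated block, or -1 if no single block fits.
    int allocate(int size) {
        for (int i = 0; i < free.size(); i++) {
            int[] blk = free.get(i);
            if (blk[1] >= size) {
                int start = blk[0];
                if (blk[1] == size) free.remove(i);       // exact fit: consume the block
                else { blk[0] += size; blk[1] -= size; }  // split: keep the remainder
                return start;
            }
        }
        return -1;
    }

    int totalFree() {
        int total = 0;
        for (int[] blk : free) total += blk[1];
        return total;
    }

    public static void main(String[] args) {
        FreeListSketch fl = new FreeListSketch();
        fl.free.add(new int[] {16, 32});
        fl.free.add(new int[] {64, 8});
        fl.free.add(new int[] {96, 24});
        System.out.println(fl.totalFree());   // 64 bytes free in total...
        System.out.println(fl.allocate(40));  // ...but no block holds 40: -1
        System.out.println(fl.allocate(20));  // first fit uses the block at 16: 16
    }
}
```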

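Note: what exactly does “clearing” mean here? It is explained in the note after the copying algorithm's principle summary below.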

Mark-copy (copying) algorithm [young generation]

Algorithm background

To solve the efficiency problem of the mark-sweep algorithm, M. L. Minsky published a well-known paper in 1963, “A LISP Garbage Collector Algorithm Using Serial Secondary Storage”, describing a LISP garbage collector that uses two storage areas. The algorithm Minsky described in this paper is called the copying algorithm, and Minsky himself successfully introduced it into an implementation of LISP.

Core idea

Split the available memory into two equal halves and use only one of them at a time. During garbage collection, copy the live objects from the half in use into the unused half, then clear every object in the half that was in use and swap the roles of the two halves; this completes the collection.
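The semispace scheme above is commonly implemented with Cheney's breadth-first copying technique, used here as one concrete choice since the text does not specify a traversal order. In this sketch a space is just a Java list and an object's "address" is its index; each copied object leaves a forwarding address behind so shared or cyclic references are copied only once, and a scan index walks the to-space to fix up fields. All names are hypothetical.

```java
import java.util.*;

// Semispace copying sketch in the style of Cheney's breadth-first copy.
// A "space" is a list and an object's address is its index in that list.
class SemispaceSketch {
    static final class Obj {
        final String name;
        final int[] fields;       // addresses of referenced objects, -1 = null
        int forward = -1;         // to-space address once this object has been copied
        Obj(String name, int... fields) { this.name = name; this.fields = fields; }
    }

    // Copies everything reachable from `roots` out of `from` into a new to-space,
    // rewriting the roots in place. Afterwards `from` can be discarded wholesale.
    static List<Obj> collect(List<Obj> from, int[] roots) {
        List<Obj> to = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) roots[i] = copy(from, to, roots[i]);
        // Objects at index >= scan still hold from-space addresses in their fields.
        for (int scan = 0; scan < to.size(); scan++) {
            Obj o = to.get(scan);
            for (int f = 0; f < o.fields.length; f++) o.fields[f] = copy(from, to, o.fields[f]);
        }
        return to;
    }

    // Copies one object unless it was already copied; returns its to-space address.
    private static int copy(List<Obj> from, List<Obj> to, int addr) {
        if (addr < 0) return addr;                  // null reference
        Obj old = from.get(addr);
        if (old.forward >= 0) return old.forward;   // follow the forwarding address
        to.add(new Obj(old.name, old.fields.clone()));
        old.forward = to.size() - 1;
        return old.forward;
    }

    public static void main(String[] args) {
        // from-space: A(0) -> C(2), C(2) -> A(0); B(1) is unreachable
        List<Obj> from = List.of(new Obj("A", 2), new Obj("B"), new Obj("C", 0));
        int[] roots = {0};
        List<Obj> to = collect(from, roots);
        for (Obj o : to) System.out.println(o.name + " -> " + Arrays.toString(o.fields));
        // A -> [1]
        // C -> [0]
    }
}
```

Anything not reachable from the roots is simply never copied, so the work is proportional to the number of live objects rather than to the size of from-space, which is why copying suits the mostly-dead young generation.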


The young generation is divided into one larger Eden space and two smaller Survivor spaces. Each allocation uses only Eden and one of the Survivors. When garbage collection occurs, the live objects in Eden and that Survivor are copied in one pass into the other Survivor, and then Eden and the Survivor that was just in use are cleared directly.

The HotSpot virtual machine's default size ratio of Eden to each Survivor is 8:1, so 90% of the young generation's capacity (Eden plus one Survivor) is usable and only 10% is “wasted”. When the Survivor space is not large enough to hold the objects that survive a Minor GC, other memory areas (in practice, mostly the old generation) must be relied on for allocation guarantee.
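The 8:1:1 split can be checked with a little arithmetic. This sketch assumes a hypothetical 100 MB young generation and an Eden:Survivor ratio of 8 (the value HotSpot's -XX:SurvivorRatio controls); it ignores the alignment rounding a real JVM applies, and the class name is made up.

```java
// Worked arithmetic for an Eden:Survivor:Survivor split of 8:1:1.
// The young generation size is an arbitrary example value; alignment is ignored.
class YoungGenSizing {
    // Returns {eden, eachSurvivor, usable} in the same unit as youngSize.
    static long[] split(long youngSize, int survivorRatio) {
        long survivor = youngSize / (survivorRatio + 2);  // ratio parts Eden + 2 parts Survivor
        long eden = youngSize - 2 * survivor;
        long usable = eden + survivor;                     // Eden + the one active Survivor
        return new long[] {eden, survivor, usable};
    }

    public static void main(String[] args) {
        long young = 100;                    // hypothetical young generation size in MB
        long[] s = split(young, 8);
        System.out.printf("eden=%dMB survivor=%dMB usable=%dMB (%d%%)%n",
                s[0], s[1], s[2], s[2] * 100 / young);
        // eden=80MB survivor=10MB usable=90MB (90%)
    }
}
```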

Principle summary

Because the mark-sweep algorithm is relatively inefficient, the copying algorithm was introduced to address this. Twice the needed memory is reserved and split into two halves, one of which is kept empty with no objects stored in it. At each garbage collection, the collector scans the objects in the non-empty half; every object found to be referenced is copied into the empty half, and finally the original non-empty half is reclaimed as a whole. This makes collection fast but uses space poorly, because twice the memory is required: a typical trade of space for time.

Note on “clearing” in the mark-sweep algorithm: the so-called clearing does not actually empty the memory. Instead, the addresses of the objects to be cleared are saved in a free address list. The next time a new object needs to be allocated, the JVM checks whether a free location is large enough and, if so, stores the object there.

Advantages:

  1. Simple to implement and efficient to run.

  2. After copying, the free space is guaranteed to be contiguous, which avoids the “fragmentation” problem.

Disadvantages:

  1. The drawback is also obvious: the algorithm needs twice the memory space, so part of the memory is wasted.

  2. For a GC such as G1, which splits the heap into a large number of regions, copying (rather than moving objects in place) means the GC must maintain the object reference relationships between regions, which is costly in both memory footprint and time.

Mark-compact algorithm [old generation]


Background

  • The efficiency of the mark-copy algorithm depends on there being few live objects and lots of garbage. That is usually the case in the young generation, but in the old generation it is more common for most objects to be live. If the copying algorithm were still used, copying that many live objects would be expensive. Therefore, given the characteristics of old-generation collection, a different algorithm is needed.

  • The mark-sweep algorithm can indeed be applied to the old generation, but it is not only inefficient, it also leaves memory fragments after collection. JVM designers therefore needed to improve on it, and the mark-compact algorithm was born.

Like the mark-sweep algorithm, the mark-compact algorithm first recursively marks all reachable objects starting from the root nodes; it then moves all live objects toward one end of memory and directly clears the memory beyond that end's boundary.

Execution process:

  1. The first phase is the same as in the mark-sweep algorithm: mark all referenced objects starting from the root nodes.

  2. In the second phase, all live objects are compacted toward one end of memory and laid out in order (a pointer marks the boundary between live objects and free space).

  3. Finally, clear all the space beyond that boundary.
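The three steps above can be sketched in the style of the classic LISP2 sliding compactor, used here as one concrete way to compact since the text does not name a specific variant. The sketch assumes marking is already done and represents references as integer addresses: pass 1 computes each live object's new address, pass 2 rewrites every reference (including the roots) to those new addresses, and pass 3 slides the objects down. The returned boundary is the first free address; all names and addresses are hypothetical.

```java
import java.util.*;

// Sliding mark-compact sketch in the LISP2 style. Marking is assumed to be
// finished; references are integer addresses. Names are hypothetical.
class MarkCompactSketch {
    static final class Obj {
        int addr;                 // current start address
        final int size;
        final boolean live;       // outcome of the marking phase
        final int[] refs;         // addresses of referenced objects
        int forward = -1;         // new address computed in pass 1
        Obj(int addr, int size, boolean live, int... refs) {
            this.addr = addr; this.size = size; this.live = live; this.refs = refs;
        }
    }

    // Compacts `heap` (a mutable list sorted by address) toward address 0,
    // rewrites `roots` in place, and returns the boundary (first free address).
    // Assumes live objects reference only live objects, as marking guarantees.
    static int compact(List<Obj> heap, int[] roots) {
        Map<Integer, Obj> byAddr = new HashMap<>();
        int next = 0;
        // Pass 1: assign forwarding addresses to live objects in address order.
        for (Obj o : heap) {
            byAddr.put(o.addr, o);
            if (o.live) { o.forward = next; next += o.size; }
        }
        // Pass 2: update every reference, both roots and fields of live objects.
        for (int i = 0; i < roots.length; i++) roots[i] = byAddr.get(roots[i]).forward;
        for (Obj o : heap) {
            if (!o.live) continue;
            for (int i = 0; i < o.refs.length; i++) o.refs[i] = byAddr.get(o.refs[i]).forward;
        }
        // Pass 3: slide live objects to their new addresses and drop the dead ones.
        heap.removeIf(o -> !o.live);
        for (Obj o : heap) o.addr = o.forward;
        return next;
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>(List.of(
                new Obj(0, 16, false),
                new Obj(16, 8, true, 40),   // live, points at the object at 40
                new Obj(24, 16, false),
                new Obj(40, 8, true, 16))); // live, points back at 16
        int[] roots = {16};
        int boundary = compact(heap, roots);
        System.out.println("boundary=" + boundary + ", root=" + roots[0]); // boundary=16, root=0
    }
}
```

Pass 2 is exactly the reference-adjusting cost listed among the disadvantages below: every pointer to a moved object has to be found and rewritten.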

Advantages:

  1. It removes the mark-sweep algorithm's drawback of scattered free memory: when memory needs to be allocated for a new object, the JVM only needs to hold the starting address of the free memory.

    • As you can see, the marked live objects are compacted in one pass and arranged by memory address, while the unmarked memory is cleared. As a result, allocating memory for a new object only requires the JVM to hold the starting address of the free memory, which is clearly much cheaper than maintaining a free list.
  2. It eliminates the high cost of halving the usable memory that the copying algorithm pays.

  3. No memory fragmentation.
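The "only needs to hold a starting address" advantage is usually called bump-the-pointer allocation. A minimal sketch, assuming a space whose free range starts at the compaction boundary: allocation just advances a top pointer, instead of searching a free list as a non-moving collector must. The class name and addresses are hypothetical.

```java
// Bump-the-pointer allocation: after compaction the free memory is one
// contiguous range, so allocation only advances a "top" pointer.
class BumpAllocator {
    private int top;          // first free address (the compaction boundary)
    private final int end;    // end of the space (exclusive)

    BumpAllocator(int top, int end) { this.top = top; this.end = end; }

    // Returns the start address of the new object, or -1 if the space is full
    // (a real JVM would trigger a garbage collection at this point).
    int allocate(int size) {
        if (end - top < size) return -1;
        int start = top;
        top += size;
        return start;
    }

    public static void main(String[] args) {
        BumpAllocator space = new BumpAllocator(16, 64); // free range [16, 64)
        System.out.println(space.allocate(24)); // 16
        System.out.println(space.allocate(24)); // 40
        System.out.println(space.allocate(1));  // -1: the space is full
    }
}
```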

Disadvantages:

  1. In terms of efficiency, the mark-compact algorithm is slower than the copying algorithm.

  2. When an object is moved, if other objects reference it, the addresses stored in those references must also be updated.

  3. While objects are being moved, the user application must be paused throughout; that is, the algorithm requires stop the world (STW) for the whole program while it runs.

Core principles

After live and dead objects in memory have been successfully distinguished, the next task of the GC is to reclaim garbage and free the memory occupied by useless objects, so that there is enough memory available to allocate new objects.


The mark-compact algorithm runs in three phases. The first, marking, is the same as in the mark-sweep algorithm: mark all referenced objects starting from the root nodes. The second phase moves all referenced objects toward one end of the memory space and arranges them in order. The third phase clears all the unreferenced garbage objects at the other end of the memory space.

Difference from mark-sweep

The essential difference between the two is that mark-sweep is a non-moving collection algorithm, whereas mark-compact is a moving one. Whether to move live objects after collection is a risky decision with both advantages and disadvantages.

Generational collection algorithm

Memory is divided into several blocks according to the different life cycles of objects. Generally, the Java heap is divided into a young generation and an old generation, and the most suitable collection algorithm is chosen according to the characteristics of each generation.

  • In the young generation, a large number of objects die at every collection and only a few survive, so the copying algorithm is chosen: the collection can be completed at the cost of copying only the few survivors.

  • In the old generation, objects have a high survival rate and there is no extra space to provide an allocation guarantee for them, so the “mark-sweep” or “mark-compact” algorithm must be used.

Partition algorithm

The generational algorithm divides objects into two parts according to the length of their life cycles, whereas the partition algorithm divides the whole heap into many small regions. Each region is used independently and collected independently. The advantage of this algorithm is that it can control how many regions are collected at once.

As computing power grows and memory becomes ever cheaper, applications in production have larger and larger heaps available. Generally speaking, the larger the heap, the longer each GC takes. To better control GC time, a large area is divided into many small independent regions that are used and collected independently, and a reasonable number of regions is collected each time, which reduces the pause caused by each GC.
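The idea of collecting "a reasonable number of regions each time" can be illustrated with a greedy selection under a pause budget, loosely in the spirit of G1's garbage-first ordering. This is a simplified sketch: the Region fields, the per-region cost estimates, and the greedy rule are all assumptions for illustration, not G1's actual prediction model.

```java
import java.util.*;

// Chooses regions to collect under a pause-time budget, preferring regions
// with the most garbage. Fields, costs, and the greedy rule are hypothetical.
class RegionPicker {
    static final class Region {
        final int id, garbageKb, estCostMs;   // hypothetical per-region statistics
        Region(int id, int garbageKb, int estCostMs) {
            this.id = id; this.garbageKb = garbageKb; this.estCostMs = estCostMs;
        }
    }

    // Visit regions from most to least garbage and take each one whose
    // estimated cost still fits in the remaining pause budget.
    static List<Integer> pick(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Integer.compare(b.garbageKb, a.garbageKb));
        List<Integer> chosen = new ArrayList<>();
        int spent = 0;
        for (Region r : sorted) {
            if (spent + r.estCostMs > pauseBudgetMs) continue;  // would overrun the budget
            spent += r.estCostMs;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region(1, 100, 5), new Region(2, 900, 8),
                new Region(3, 500, 6), new Region(4, 50, 1));
        System.out.println(pick(regions, 15)); // [2, 3, 4]
    }
}
```

Region 1 is skipped because its estimated cost would push the pause past the budget, even though it holds more garbage than region 4; trading completeness for a bounded pause is the point of collecting a subset of regions.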

This work is licensed under a CC license; reprints must credit the author and link to this article.
