JVM tri-color marking and read/write barriers

Time:2022-5-5

Tri-color marking

The main job of a garbage collector (GC) is to reclaim memory. The process has two main steps: marking memory and reclaiming it.

Introduction to tri-color marking

Tri-color marking is mainly a way to identify reclaimable memory blocks efficiently.

Tri-color marking is used as a tool to aid reasoning. Objects encountered while traversing the object graph are assigned one of the following three colors according to whether they have been visited:

  • White: the object has not yet been visited by the garbage collector. At the start of reachability analysis all objects are white. An object that is still white when analysis ends is unreachable.
  • Black: the object has been visited by the garbage collector and all of its references have been scanned. A black object has been fully scanned and is safe to keep alive. If another object's reference points to a black object, it does not need to be scanned again. A black object cannot point directly to a white object (without passing through a gray object).
  • Gray: the object has been visited by the garbage collector, but at least one of its references has not been scanned yet.

Tri-color marking process

Marking process:

  1. When concurrent GC marking begins, all objects are white.
  2. All objects directly referenced by GC roots are marked gray and placed in the gray set.
  3. Take an object from the gray set. If it references no other objects, move it to the black set. If it does, put all the objects it references into the gray set, then move the object itself to the black set.
  4. Repeat step 3 until the gray set is empty, that is, until every gray object has turned black. The current round of marking is then complete.
  5. After marking, the objects that are still white are unreachable from GC roots; they are garbage and can be collected.
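
The steps above can be sketched as a worklist algorithm. This is a minimal illustration, not HotSpot code: `Obj`, `mark`, and `demo_mark` are hypothetical names, the heap is modeled as a vector of objects that refer to each other by index, and marking runs without any concurrent application threads.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };

// Hypothetical heap object: a color plus outgoing references (heap indices).
struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> refs;
};

// Marks everything reachable from `roots`; objects left white are garbage.
void mark(std::vector<Obj>& heap, const std::vector<std::size_t>& roots) {
    std::deque<std::size_t> gray;          // the gray set, used as a worklist
    for (std::size_t r : roots) {          // step 2: root targets turn gray
        if (heap[r].color == Color::White) {
            heap[r].color = Color::Gray;
            gray.push_back(r);
        }
    }
    while (!gray.empty()) {                // steps 3 and 4: drain the gray set
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t child : heap[cur].refs) {
            if (heap[child].color == Color::White) {
                heap[child].color = Color::Gray;
                gray.push_back(child);
            }
        }
        heap[cur].color = Color::Black;    // all of its references are scanned
    }
}

// Example graph: 0 -> 1 -> 2 is reachable from the root, 3 -> 4 is not.
std::vector<Color> demo_mark() {
    std::vector<Obj> heap(5);
    heap[0].refs = {1};
    heap[1].refs = {2};
    heap[3].refs = {4};
    std::vector<std::size_t> roots{0};
    mark(heap, roots);
    std::vector<Color> colors;
    for (const Obj& o : heap) colors.push_back(o.color);
    return colors;
}
```

In this example objects 0, 1, and 2 end up black, while 3 and 4 stay white and would be collected (step 5).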

Mismarking

What is mismarking? It occurs when the following two conditions hold at the same time:

  1. The mutator inserts one or more references from black objects to a white object
  2. The mutator removes all direct or indirect references from gray objects to that white object
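
To see why both conditions together lose a live object, the following sketch steps a marker by hand and lets the application thread (the mutator) change the graph in between. It is an illustration under assumptions: `Marker`, `shade`, `step`, and the objects D, E, G (indices 0, 1, 2) are hypothetical, and the concurrency is simulated by interleaving calls on one thread.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };

struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> refs;   // outgoing references as heap indices
};

// Hypothetical marker that advances one gray object at a time, so the
// "mutator" can modify the graph between steps.
struct Marker {
    std::vector<Obj>& heap;
    std::deque<std::size_t> gray;

    void shade(std::size_t i) {      // white -> gray
        if (heap[i].color == Color::White) {
            heap[i].color = Color::Gray;
            gray.push_back(i);
        }
    }
    bool step() {                    // scan one gray object, turn it black
        if (gray.empty()) return false;
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t child : heap[cur].refs) shade(child);
        heap[cur].color = Color::Black;
        return true;
    }
};

// Returns the final color of G. Indices: D = 0, E = 1, G = 2.
Color demo_missed_mark() {
    std::vector<Obj> heap(3);
    heap[0].refs = {1};              // D -> E
    heap[1].refs = {2};              // E -> G
    Marker m{heap, {}};
    m.shade(0);                      // D is referenced by a GC root
    m.step();                        // D turns black, E turns gray

    // The mutator runs now, with no barrier in place:
    heap[1].refs.clear();            // gray E drops its reference to G
    heap[0].refs.push_back(2);       // black D gains a reference to G

    while (m.step()) {}              // marking finishes
    return heap[2].color;            // G is reachable through D but never visited
}
```

G is still white at the end even though D references it, so a collector that trusted this result would free a live object.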

Solution to mismarking

To solve mismarking, it is enough to break either of these two conditions. This gives two solutions: incremental update and snapshot at the beginning (SATB).

Incremental update

Incremental update breaks the first condition. When a black object gains a new reference to a white object, the newly inserted reference is recorded. After concurrent scanning finishes, the black objects in the recorded references are used as roots and scanned again. Put simply, once a reference to a white object is inserted into a black object, the black object turns back into a gray object.

Snapshot at the beginning (SATB)

SATB breaks the second condition. When a gray object is about to delete a reference to a white object, the reference being deleted is recorded. After concurrent scanning finishes, the gray objects in the recorded references are used as roots and scanned again. Put simply, whether or not a reference is deleted, the search follows the snapshot of the object graph taken at the moment scanning started.

Missed marks and excess marks

Mismarking actually comes in two kinds: missed marks and excess marks.

Excess marks: floating garbage

Suppose marking has reached E, and at this moment the application executes object.E = null.

At this point E, F, and G could in theory be reclaimed. But because E has already turned gray, it is still treated as a live object and traversal continues from it. The end result is that these objects are not identified as garbage and survive this round of marking.

Garbage that should have been reclaimed in this round is not reclaimed. This is called “floating garbage”. Floating garbage does not affect program correctness; it simply has to wait until the next garbage collection to be cleaned up.

In addition, objects allocated during marking are marked black by default, yet they may become garbage before marking ends. They too are floating garbage.
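
The floating-garbage case can be reproduced with a small hand-stepped marker. This is a sketch under assumptions: `Marker` and `demo_floating_garbage` are hypothetical names, the graph D -> E -> {F, G} is made up for the illustration, and the mutator's write is simulated between marking steps.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };

struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> refs;   // outgoing references as heap indices
};

// Hypothetical marker that scans one gray object per step.
struct Marker {
    std::vector<Obj>& heap;
    std::deque<std::size_t> gray;

    void shade(std::size_t i) {
        if (heap[i].color == Color::White) {
            heap[i].color = Color::Gray;
            gray.push_back(i);
        }
    }
    bool step() {
        if (gray.empty()) return false;
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t child : heap[cur].refs) shade(child);
        heap[cur].color = Color::Black;
        return true;
    }
};

// Returns how many of E, F, G end up black although nothing references them.
int demo_floating_garbage() {
    std::vector<Obj> heap(4);        // D = 0, E = 1, F = 2, G = 3
    heap[0].refs = {1};              // D -> E
    heap[1].refs = {2, 3};           // E -> F, E -> G
    Marker m{heap, {}};
    m.shade(0);                      // D is referenced by a GC root
    m.step();                        // D turns black, E turns gray

    heap[0].refs.clear();            // mutator cuts D -> E: E, F, G are garbage now

    while (m.step()) {}              // marking still continues from gray E
    int survivors = 0;
    for (std::size_t i = 1; i < heap.size(); ++i) {
        if (heap[i].color == Color::Black) ++survivors;
    }
    return survivors;
}
```

All three unreachable objects finish the round black, so they are only reclaimed by a later collection.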

Missed marks: read and write barriers

Write barrier

When assigning a value to a member variable of an object, the underlying code is roughly as follows:

/**
 * @param field     a member field of some object
 * @param new_value the new value, e.g. null
 */
void oop_field_store(oop* field, oop new_value) {
    *field = new_value; // assignment
}

A write barrier simply adds some processing logic before and after the assignment (similar to AOP):

void oop_field_store(oop* field, oop new_value) {
    pre_write_barrier(field);              // write barrier: runs before the write
    *field = new_value;                    // assignment
    post_write_barrier(field, new_value);  // write barrier: runs after the write
}

Write barrier + SATB

When a member variable reference of object E changes (objE.fieldG = null;), we can use the write barrier to record G, the object the member variable originally referenced:

void pre_write_barrier(oop* field) {
    oop old_value = *field;      // get the old value
    remark_set.add(old_value);   // record the originally referenced object
}

The old reference is recorded before the member variable's reference changes.
The idea is to try to preserve the object graph as it was at the beginning, i.e. the snapshot at the beginning (SATB): once the GC roots are determined at some moment, the object graph of that moment is fixed.
For example, if D references G at that moment, subsequent marking should follow the object graph of that moment (D references G). If the reference changes in the meantime, the change is recorded so that marking still proceeds according to the original view.

It is worth noting that scanning all GC roots (i.e. initial marking) usually requires a stop-the-world (STW) pause; otherwise it might never finish, because new GC roots can be added while the application runs concurrently.

SATB breaks condition 2 (a gray object drops its reference to a white object), which guarantees that no mark is missed.

A small optimization: if the collector is not in the concurrent marking phase, or the object has already been marked, there is nothing to record, so a simple check can be added:

void pre_write_barrier(oop* field) {
  // In the GC concurrent marking phase, and the object has not been marked (visited) yet
  if ($gc_phase == GC_CONCURRENT_MARK && !isMarkd(field)) {
      oop old_value = *field;      // get the old value
      remark_set.add(old_value);   // record the originally referenced object
  }
}
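
As a runnable illustration of the SATB idea, the sketch below replays the D/E/G scenario with a pre-write barrier in place. It is a simplified model, not HotSpot code: `Heap`, `store`, `remark`, and `kNull` are hypothetical names, objects refer to each other by index, and the concurrent mutator is simulated by interleaving calls on one thread.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };
constexpr std::size_t kNull = static_cast<std::size_t>(-1);  // stands in for null

struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> fields;      // reference fields, kNull when empty
};

struct Heap {
    std::vector<Obj> objs;
    std::deque<std::size_t> gray;         // gray worklist
    std::vector<std::size_t> remark_set;  // old values recorded by the barrier
    bool marking = false;                 // true during concurrent marking

    void shade(std::size_t i) {           // white -> gray
        if (i != kNull && objs[i].color == Color::White) {
            objs[i].color = Color::Gray;
            gray.push_back(i);
        }
    }
    bool step() {                         // scan one gray object
        if (gray.empty()) return false;
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t f : objs[cur].fields) shade(f);
        objs[cur].color = Color::Black;
        return true;
    }
    // Field store with a SATB pre-write barrier: remember the overwritten
    // reference if marking is active and its target is still unmarked.
    void store(std::size_t obj, std::size_t slot, std::size_t new_value) {
        std::size_t old_value = objs[obj].fields[slot];
        if (marking && old_value != kNull &&
            objs[old_value].color == Color::White) {
            remark_set.push_back(old_value);
        }
        objs[obj].fields[slot] = new_value;
    }
    // Final remark: treat every recorded reference as live, finish marking.
    void remark() {
        for (std::size_t i : remark_set) shade(i);
        remark_set.clear();
        while (step()) {}
        marking = false;
    }
};

// Returns the final color of G. Indices: D = 0, E = 1, G = 2.
Color demo_satb() {
    Heap h;
    h.objs.resize(3);
    h.objs[0].fields = {1, kNull};        // D.fieldE = E, D.fieldG = null
    h.objs[1].fields = {2};               // E.fieldG = G
    h.marking = true;
    h.shade(0);                           // D is referenced by a GC root
    h.step();                             // D turns black, E turns gray
    std::size_t g = h.objs[1].fields[0];  // mutator reads G
    h.store(1, 0, kNull);                 // E.fieldG = null, barrier records G
    h.store(0, 1, g);                     // D.fieldG = G
    while (h.step()) {}                   // concurrent marking ends, G still white
    h.remark();                           // recorded G is marked now
    return h.objs[2].color;
}
```

Because the barrier saved G before E's field was overwritten, the remark step marks it and G ends up black.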

Incremental update + write barrier

When a member variable reference of object D changes (objD.fieldG = G;), we can use the write barrier to record G, the object that D's member variable newly references:

void post_write_barrier(oop* field, oop new_value) {
  if ($gc_phase == GC_CONCURRENT_MARK && !isMarkd(field)) {
      remark_set.add(new_value);   // record the newly referenced object
  }
}

When a new reference is inserted, the newly referenced object is recorded.
The idea is that the original snapshot need not be preserved; instead, new references are recorded and traversed later, i.e. incremental update.

Incremental update breaks condition 1 (a black object gains a reference to a white object), which guarantees that no mark is missed.
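
The same D/E/G scenario can be replayed with a post-write barrier to illustrate incremental update. As before this is a simplified model with hypothetical names (`Heap`, `store`, `remark`, `kNull`); it mirrors the pseudocode above by recording the newly referenced object, and the mutator is simulated on a single thread.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };
constexpr std::size_t kNull = static_cast<std::size_t>(-1);  // stands in for null

struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> fields;      // reference fields, kNull when empty
};

struct Heap {
    std::vector<Obj> objs;
    std::deque<std::size_t> gray;         // gray worklist
    std::vector<std::size_t> remark_set;  // new values recorded by the barrier
    bool marking = false;                 // true during concurrent marking

    void shade(std::size_t i) {           // white -> gray
        if (i != kNull && objs[i].color == Color::White) {
            objs[i].color = Color::Gray;
            gray.push_back(i);
        }
    }
    bool step() {                         // scan one gray object
        if (gray.empty()) return false;
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t f : objs[cur].fields) shade(f);
        objs[cur].color = Color::Black;
        return true;
    }
    // Field store with an incremental-update post-write barrier: after the
    // write, remember the new target if it is still unmarked.
    void store(std::size_t obj, std::size_t slot, std::size_t new_value) {
        objs[obj].fields[slot] = new_value;
        if (marking && new_value != kNull &&
            objs[new_value].color == Color::White) {
            remark_set.push_back(new_value);
        }
    }
    // Final remark: traverse from every recorded reference, finish marking.
    void remark() {
        for (std::size_t i : remark_set) shade(i);
        remark_set.clear();
        while (step()) {}
        marking = false;
    }
};

// Returns the final color of G. Indices: D = 0, E = 1, G = 2.
Color demo_incremental_update() {
    Heap h;
    h.objs.resize(3);
    h.objs[0].fields = {1, kNull};        // D.fieldE = E, D.fieldG = null
    h.objs[1].fields = {2};               // E.fieldG = G
    h.marking = true;
    h.shade(0);                           // D is referenced by a GC root
    h.step();                             // D turns black, E turns gray
    std::size_t g = h.objs[1].fields[0];  // mutator reads G
    h.store(1, 0, kNull);                 // E.fieldG = null, nothing to record
    h.store(0, 1, g);                     // D.fieldG = G, barrier records G
    while (h.step()) {}                   // concurrent marking ends, G still white
    h.remark();                           // recorded G is traversed now
    return h.objs[2].color;
}
```

Here the deletion goes unrecorded, but the insertion into black D is caught, so G is black after the remark step.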

Read barrier (load barrier)

oop oop_field_load(oop* field) {
    pre_load_barrier(field);   // read barrier: runs before the read
    return *field;
}

The read barrier targets the first step of a missed mark, the read of the reference (var objF = object.fieldG;):

void pre_load_barrier(oop* field) {
  if ($gc_phase == GC_CONCURRENT_MARK && !isMarkd(field)) {
      oop old_value = *field;
      remark_set.add(old_value);   // record the object being read
  }
}

This approach is conservative, but it is also safe. In condition 1 (a black object gains a reference to a white object), the black object can only gain the reference after the white object has been read, and that read is exactly where the read barrier takes effect.
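
A read barrier can be sketched the same way: the field stores are left plain and the load does the recording. This is a simplified, single-threaded model with hypothetical names (`Heap`, `load`, `remark`, `kNull`); it is not taken from any real collector.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class Color { White, Gray, Black };
constexpr std::size_t kNull = static_cast<std::size_t>(-1);  // stands in for null

struct Obj {
    Color color = Color::White;
    std::vector<std::size_t> fields;      // reference fields, kNull when empty
};

struct Heap {
    std::vector<Obj> objs;
    std::deque<std::size_t> gray;         // gray worklist
    std::vector<std::size_t> remark_set;  // objects recorded by the read barrier
    bool marking = false;                 // true during concurrent marking

    void shade(std::size_t i) {           // white -> gray
        if (i != kNull && objs[i].color == Color::White) {
            objs[i].color = Color::Gray;
            gray.push_back(i);
        }
    }
    bool step() {                         // scan one gray object
        if (gray.empty()) return false;
        std::size_t cur = gray.front();
        gray.pop_front();
        for (std::size_t f : objs[cur].fields) shade(f);
        objs[cur].color = Color::Black;
        return true;
    }
    // Field load with a read barrier: any unmarked object the mutator gets
    // hold of during marking is recorded, whether or not it is stored later.
    std::size_t load(std::size_t obj, std::size_t slot) {
        std::size_t value = objs[obj].fields[slot];
        if (marking && value != kNull && objs[value].color == Color::White) {
            remark_set.push_back(value);
        }
        return value;
    }
    void remark() {                       // traverse recorded objects, finish
        for (std::size_t i : remark_set) shade(i);
        remark_set.clear();
        while (step()) {}
        marking = false;
    }
};

// Returns the final color of G. Indices: D = 0, E = 1, G = 2.
Color demo_read_barrier() {
    Heap h;
    h.objs.resize(3);
    h.objs[0].fields = {1, kNull};        // D.fieldE = E, D.fieldG = null
    h.objs[1].fields = {2};               // E.fieldG = G
    h.marking = true;
    h.shade(0);                           // D is referenced by a GC root
    h.step();                             // D turns black, E turns gray
    std::size_t g = h.load(1, 0);         // mutator reads G, barrier records it
    h.objs[1].fields[0] = kNull;          // E.fieldG = null (plain store)
    h.objs[0].fields[1] = g;              // D.fieldG = G (plain store)
    while (h.step()) {}                   // concurrent marking ends, G still white
    h.remark();
    return h.objs[2].color;
}
```

The barrier records G even if the mutator never stores it anywhere, which is why the approach is conservative: it may keep some objects alive for one extra cycle.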

Tri-color marking and garbage collectors

Incremental update: CMS

Snapshot at the beginning (SATB): G1, Shenandoah