The creation of a large number of class loaders leads to weird fullgc

Time:2021-9-16

phenomenon

Recently, I took over a colleague’s project. One day, the response time of the interface suddenly increased a lot, from tens of MS to tens of seconds.

First, check the log on the machine. There is a timeout when calling the third-party interface and querying the database. Check the third-party interface monitoring and database monitoring immediately. Everything is normal. The statistics timeout may be caused by GC pause. At this time, we can check the GC through jstat – gcutil PID. The data are as follows:

S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00   3.88  12.86  76.39  45.62    211    8.574   892  626.192  634.767
  0.00   0.00   4.10  12.86  76.39  45.62    211    8.574   893  626.192  634.767
  0.00   0.00   0.00  12.88  76.39  45.62    211    8.574   894  626.915  635.489
  0.00   0.00   0.11  12.88  76.39  45.62    211    8.574   896  627.678  636.253
  0.00   0.00   0.00  12.87  76.39  45.62    211    8.574   897  628.926  637.500
  0.00   0.00   0.00  12.87  76.39  45.62    211    8.574   899  630.381  638.956
  0.00   0.00   1.92  12.87  76.39  45.62    211    8.574   901  631.155  639.729
  0.00   0.00   0.00  12.87  76.39  45.62    211    8.574   902  632.379  640.954
  0.00   0.00   2.14  12.87  76.39  45.62    211    8.574   903  633.094  641.668
  0.00   0.00   0.00  12.88  76.39  45.62    211    8.574   904  633.859  642.433
Copy code

Here we can see that the utilization rate of the younger generation (E) is very small, the utilization rate of the older generation (o) is not much, the utilization rate of M (Metaspace) is 76.39%, and the Yong GC has not changed. The full GC has been going on all the time, taking more than 800 ms each time. In combination with the previous e, O and M utilization rates, it shows that the memory can not be recycled all the time.

The JVM memory size is configured as follows:

-Xms3g -Xmx3g -Xmn1g -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m
Copy code

Next, let’s look at the GC log:

2020-08-13T23:11:00.352+0800: 214929.371: [GC (CMS Initial Mark)  276144K(3040896K), 0.0405942 secs]
2020-08-13T23:11:00.886+0800: 214929.905: [Full GC (Metadata GC Threshold)  290482K->275966K(3040896K), 0.7939954 secs]
2020-08-13T23:11:01.693+0800: 214930.712: [Full GC (Last ditch collection)  275966K->275964K(3040896K), 0.8086755 secs]
2020-08-13T23:11:02.520+0800: 214931.539: [Full GC (Metadata GC Threshold)  295199K->273816K(3040896K), 0.8332017 secs]
2020-08-13T23:11:03.366+0800: 214932.385: [Full GC (Last ditch collection)  273816K->273799K(3040896K), 0.7748226 secs]
Copy code

There is metadata GC threshold in the GC log. Combined with the highest utilization of the previous Metaspace, we guess that the Metaspace may overflow, and then we use the grep OUTOFMEMORY keyword in the log. The following errors are reported:

java.lang.OutOfMemoryError: Metaspace
Copy code

So far, it can be confirmed that there is a problem with Metaspace, but why is the utilization rate of jstat output only 76.39%? If you often use jstat to look at normal programs, you will find that in many normal cases, Metaspace occupies more than 90%.

Metaspace

Metaspace metadata space is specially used to store metadata of classes. It is a special data structure used to replace perm in jdk8.

The Metaspace space is allocated in local memory (not on the heap). By default, memory usage is not limited. You can use maxmetaspace size to specify the maximum value. Metaspacesize specifies the minimum value, which is 21 m by default. Apply for memory from the operating system through MMAP. The applied memory will be divided into one metachunk. The memory will be allocated to the class loader in the unit of metachunk. Each metachunk corresponds to a unique class loader. A class loader can have multiple metachunks.

You can use Java – XX: + printflagsfinal – version to view the default parameter values of the JVM

In the Java virtual machine, each class loader has a data structure of classloaderdata. There is a Metaspace for memory management inside classloaderdata. The Metaspace will call get when initializing_ initialization_ The chunk allocates the first metachunk. The class loader uses the metachunk in the unit of metablock.

//classLoaderData.hpp
class ClassLoaderData : public CHeapObj {
...

  Metaspace * _metaspace;  // Meta-space where meta-data defined by the
                           // classes in the class loader are allocated.
  Mutex* _metaspace_lock;  // Locks the metaspace for allocations and setup.

...
}

// metaspace.hpp
class Metaspace : public CHeapObj {
...
 private:
  void initialize(Mutex* lock, MetaspaceType type);

  Metachunk* get_initialization_chunk(MetadataType mdtype,
                                      size_t chunk_word_size,
                                      size_t chunk_bunch);
...
}

// metachunk.hpp
class Metachunk : public Metabase
class Metablock : public Metabase

//Parent classes of metablock and metachunk
template 
class Metabase VALUE_OBJ_CLASS_SPEC {
  size_t _word_size;
  T*     _next;
  T*     _prev;
...
}
Copy code

The following figure shows the memory structure allocated by each class loader.

image.png

Next, let’s talk about when fullgc will be triggered. There is a parameter minmetaspacefreeratio (default 40). GC will be performed when the following conditions are met. If the memory to be applied is larger than the remaining space that can be committed, and if maxmetaspacesize is not reached, capacity expansion will be triggered.

The remaining space that can be committed < (committed size * minmetaspacefreeratio)

When it comes to committed memory, there are several concepts: used, capacity and reserved, as shown in the following figure

  • used: block memory used in chunk, in which class data is loaded.

  • capacity: chunk memory in use.

  • commited: all allocated chunk memory, including free memory that can be used again.

  • reserved: is the amount of memory that can be used.

image.png

As shown below, it is the printed memory information. The last line is the information of kclass specially stored in Metaspace after the compression pointer is turned on (64 bit compression is 32 bit).

Heap
 par new generation   total 30720K, used 1519K [0x00000007f8600000, 0x00000007fa750000, 0x00000007fa750000)
  eden space 27328K,   5% used [0x00000007f8600000, 0x00000007f877bcc8, 0x00000007fa0b0000)
  from space 3392K,   0% used [0x00000007fa400000, 0x00000007fa400000, 0x00000007fa750000)
  to   space 3392K,   0% used [0x00000007fa0b0000, 0x00000007fa0b0000, 0x00000007fa400000)
 concurrent mark-sweep generation total 68288K, used 21614K [0x00000007fa750000, 0x00000007fea00000, 0x00000007fea00000)
 Metaspace       used 23505K, capacity 30704K, committed 30720K, reserved 1073152K
  class space    used 3341K, capacity 7550K, committed 7552K, reserved 1048576K
Copy code

Now that the basics are finished, let’s go back to the beginning. How do we calculate the M printed by jstat? The utilization rate here is not the utilization rate of the entire Metaspace memory as we understand it.

M = used / commited
Copy code

Therefore, the Metaspace memory overflows and the utilization rate is only 76%. There are two possibilities:

  1. The allocated memory has reached more than 61m (256M * 24%)?

  2. Low chunk usage assigned to class loaders?

The first is obviously unlikely. A class can’t need so much memory. In the second case, when many class loaders are created, each class loader loads few classes.

As mentioned above, when the remaining free memory is less than the threshold of metaspacegc, fullgc will be executed. However, we said at the beginning that in some normal scenarios, the utilization rate of jstat printing has reached more than 90%, and fullgc has not been triggered. Why? Welcome to leave a message and share your answers

Troubleshooting procedure

First, let’s take a look at what classes are loaded by Metaspace

jcmd pid GC.class_stats |awk '{print $13}'| sort | uniq -c |sort -r| head
Copy code

Check the loaded classes through jcmd, and then count the number. We can see that script1 has been loaded more than 20000 times. According to the parental delegation method of JVM class loading, a class can be loaded at most once. There are many times here, which may be loaded by different class loaders.

27348 Script1
   3
   2 ClassName
   1 sun.util.spi.CalendarProvider
   1 sun.util.resources.en.TimeZoneNames_en
   1 sun.util.resources.en.CurrencyNames_en_US
   1 sun.util.resources.en.CalendarData_en
   1 sun.util.resources.TimeZoneNamesBundle
   1 sun.util.resources.TimeZoneNames
   1 sun.util.resources.ParallelListResourceBundle$KeySet
Copy code

Through jcmd, you need to add the parameter: – XX: + unlockdiagnosticvmoptions before startup

Then let’s look at the data of the JVM class loader

jmap -clstats pid

Here, classes is the number of classes loaded. From the output, you can see that there are a large number of groovyclassloader class loaders.

class_loader    classes bytes   parent_loader   alive?  type
     2850    4913169   null          live    
0x000000077bc27bc0      1       1394    0x000000077bc64418      dead    groovy/lang/[email protected]
0x000000077d9e7d98      0       0       0x0000000770800000      dead    groovy/lang/[email protected]
0x00000007805e8050      0       0       0x0000000770800000      dead    groovy/lang/[email protected]
0x000000077df07de0      0       0       0x0000000770800000      dead    groovy/lang/[email protected]
0x0000000780028010      1       1394    0x000000078005a6c8      dead    groovy/lang/[email protected]
0x0000000776467650      1       1394    0x000000077646b190      dead    groovy/lang/[email protected]
0x000000077a167a00      1       1394    0x000000077a16b380      dead    groovy/lang/[email protected]
Copy code

According to statistics, each groovyclassloader $innerloader loads only one class, and the total number of classes is 27348, which is just right with the number of script1 classes above, indicating that this class loader loads it.

Next, how to locate the classes loaded by the class loader?

First look at where groovy was introduced, and then debug locally. Add the JVM parameter: – XX: + unlockdiagnosticvmoptions. When loading the class, the console will print, and you can locate where to load it step by step.

In our project, sharding is used as a sub table. The groovy version introduced by sharding is as follows

io.shardingsphere
    sharding-jdbc-spring-boot-starter
    3.0.0.M1


  org.codehaus.groovy
  groovy
  indy
  2.4.5

Copy code

Finally, the code to locate the problem is as follows. After you configure the expression of the sub table, each time you execute the query operation, you will create a groovyshell to execute the configured expression. In groovyshell, a class loader is generated every time to load class script1. After loading, it cannot be dropped by GC, resulting in memory leakage.

public InlineShardingStrategy(final InlineShardingStrategyConfiguration inlineShardingStrategyConfig) {
    Preconditions.checkNotNull(inlineShardingStrategyConfig.getShardingColumn(), "Sharding column cannot be null.");
    Preconditions.checkNotNull(inlineShardingStrategyConfig.getAlgorithmExpression(), "Sharding algorithm expression cannot be null.");
    shardingColumn = inlineShardingStrategyConfig.getShardingColumn();
    String algorithmExpression = InlineExpressionParser.handlePlaceHolder(inlineShardingStrategyConfig.getAlgorithmExpression().trim());
    closure = (Closure) new GroovyShell().evaluate(Joiner.on("").join("{it -> \"", algorithmExpression, "\"}"));
}
Copy code

Here, you can upgrade the new version of sharding. In the new version, groovyshell is static.

public final class InlineExpressionParser {
...
    private static final GroovyShell SHELL = new GroovyShell();
...
}
Copy code

There is another question here. The class loader is used up and the state is dead. Why not recycle it?

Local replication

The reproduced code is very simple. The above groovy version is introduced and the JVM parameters are added at runtime

// -Xmx100M -Xms100M -verbose:class -XX:+PrintGCDetails -XX:MaxMetaspaceSize=30M -XX:MetaspaceSize=30M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UnlockDiagnosticVMOptions -XX:+HeapDumpOnOutOfMemoryError
public static void main(String[] args) {
    for (int i = 0; i < 4000; i++) {
        new GroovyShell().parse("");
    }
}
Copy code

Next, I’ll mainly talk about how to use mat to check why this class load is not recycled. Load the heap from the sample program dump with mat, select histogram, and then enter groovycloader in the regular. Objects indicates the number of objects created. Here there are 3255, indicating that the Metaspace overflows after the above for loop is executed 3255 times.

image.png

Next, select domino tree, enter script1 regular filter, right-click and select path to GC roots. Here we only care about strong references, so execlude other types of references.

image.png

If the class loader is recycled, the class it loads will also be recycled. If the class is referenced, it must not be recycled. Therefore, let’s start with the object of script1. As shown in the figure below, script1 class is referenced and finally reaches GC root (appclassloader), so full GC cannot be recycled.

image.png

After reading three things ❤ ️

If you think this article is very helpful to you, I’d like to invite you to help me with three small things:

  1. Praise, forwarding, and your “praise and comments” are the driving force for my creation.

  2. Pay attention to the official account.Rotten pig skin“And share original knowledge from time to time.

  3. At the same time, we can look forward to the follow-up articles

  4. Welcome to the authorgitee, hope to learn together

Recommended Today

Detailed steps for installing Perl and Komodo IDE for windows

Perl official website: https://www.perl.org/Perl document: https://perldoc.perl.org/Download address: https://www.perl.org/get.html The installation package of Perl Windows version is divided into activestate Perl and strawberry Perl. For the difference between the two, see: http://www.zzvips.com/article/202134.htm Note: the download speed of activestate Perl is slow. You may need KX to surf the Internet I have uploaded all the versions of […]