A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

Time:2022-11-24

1 Introduction

The origin is an OOM in the online environment of a certain customer. It is said that this kind of weird OOM has happened for the second time. First describe the relevant information of OOM:

  1. Heap memory 13G;
  2. Use the G1 garbage collector;
  3. The automatically generated dump file after OOM is 7G;
  4. The general function of the application is to consume kafka data, after some business processing and then send back to kafka [a single message is said to be about 32M]

Customer doubts:

  1. The heap memory is 13G, and the dump file is 7G. Obviously, the memory is enough, so why is it OOM?

Due to online environment problems, it is not possible to restart and debug at will. Only one memory dump file can be analyzed.

2. Online dump file analysis

2.1 Problem 1 – Kafka’s RecordAccumulator takes up a lot of memory

There are no screenshots here (forgot), brain fill. It was once suspected that OOM was caused by kafka’s local cache. generallyRecordAccumulatorandKafkaProduceris a one-to-one correspondence【KafkaProducerCreated in the construction line number each time it is initializedRecordAccumulatorExamples]. But the customer said that KafkaProducer exists in ThreadLocal and will not be created without limit, that isRecordAccumulatorNor will there be unlimited creation.

Keep the possibility of the problem here first (after allRecordAccumulatorThe number of instances is nearly 140, which is not normal).

After a rough searchDefaultKafkaProducerFactory, to see if there is any mystery in this factory class. Really found something: the general situation isIn the callback method of sending a message, if an exception is thrown ❌ when sending a message, the related KafkaProducer instance will be closed (physically closed), and will be recreated when needed later.

This can explain why KafkaProducer does not match the number of threads. butRecordAccumulatorThere is still no way to explain the problem of too many instances, and it may be necessary to dig into the source code.

aboutDefaultKafkaProducerFactoryYou can refer to this blog:https://blog.csdn.net/Pacoson…

2.2 Problem 1 – OOM information in the thread stack

It is found that the business thread stack of kafka producer has OOM content:
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

2.3 Problem 2 – There is information about kafka message sending failure in the thread stack

  • RecordTooLargeExeception error occurs in kafka producer callback function

A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

  • Kafka producer starts recording error logs

A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

2.4 Problem 3 – RecordTooLargeExeception

A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

After analyzing the memory dump, I gave myself a bold guess. Could it be that the actual heap memory is not enough, but some other condition is triggered, causing the display to throwOutOfMemoryErrorof error.

With this doubt, I still focused on the thread stack, and after a closer look, I suddenly realized something. In the thread stack is the OOM error thrown during the process of printing the error log. Moreover, string splicing is performed during the process of building log messages, and thenStringBuilderThe array copy caused by expansion is OOM. At this point, I have turned my attention toStringBuildersuperior.

3. Take a look at StringBuilder expansion

With all the conjectures above, I read it carefullyStringBuildermiddleappend()The source code and expansion mechanism.

existStringBuilder, there is a character array called value [declared in the parent class AbstractStringBuilder], which is used to store the string content (also in the String class).
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

StringBuilder.append()method will eventually call the parent class’sjava.lang.AbstractStringBuilder#append(java.lang.String)method, the source code is as follows:

public AbstractStringBuilder append(String str) {
        if (str == null)
            return appendNull();
        // Get the length of the string to append
        int len = str.length();
        // Check the capacity of the character array in StringBuilder to ensure that it can be appended successfully
        ensureCapacityInternal(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }

next lookensureCapacityInternal()method:
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

It can be seen that if the length of the new string is greater than the length of the original string, an array copy will be performed. And note the note above: if the value overflows, it will throwOutOfMemoryError. I feel that my guess is about to be confirmed, excited.

look carefullynewCapacity()method,

/**
     * The maximum size of array to allocate (unless necessary).
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Returns a capacity at least as large as the given minimum capacity.
     * Returns the current capacity increased by the same amount + 2 if
     * that suffices.
     * Will not return a capacity greater than {@code MAX_ARRAY_SIZE}
     * unless the given minimum capacity is greater than that.
     *
     * @param  minCapacity the desired minimum capacity
     * @throws OutOfMemoryError if minCapacity is less than zero or
     *         greater than Integer.MAX_VALUE
     */
    private int newCapacity(int minCapacity) {
        // overflow-conscious code
        // The length of the new array is twice the length of the original array➕2
        int newCapacity = (value.length << 1) + 2;
        // If the minimum array length required to append new characters is greater than newCapacity, assign the value of minCapacity to newCapacity
        // This situation only occurs when the length of the appended string is more than twice the length of the original string
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        // If newCapacity is greater than Integer.MAX_VALUE - 8, follow the logic of hugeCapacity
        // Otherwise, return directly to newCapacity
        return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }

MAX_ARRAY_SIZEThe value is Integer.MAX_VALUE – 8. And the comment gives a hint: this is the maximum length of the array (unless necessary), allocating a large array may cause OOM.
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

Look againhugeCapacity()method,

private int hugeCapacity(int minCapacity) {
        // If the minimum array length required to append new characters is greater than Integer.MAX_VALUE, direct OOM
        if (Integer.MAX_VALUE - minCapacity < 0) { // overflow
            throw new OutOfMemoryError();
        }
        /**
        * If the minimum array length required to append a new character is greater than Integer.MAX_VALUE - 8,
        * The length of the new array is minCapacity, otherwise the length of the array is the value of MAX_ARRAY_SIZE
        * The latest expansion when minCapacity is greater than half of Integer.MAX_VALUE,
        * The new array length will directly use MAX_ARRAY_SIZE
        **/
        return (minCapacity > MAX_ARRAY_SIZE)
            ? minCapacity : MAX_ARRAY_SIZE;
    }
  • The key point is summarized in one sentence:String length greater thanInteger.MAX_VALUEIt will directly OOM

Here comes the problem again:

  1. 32M messages, no matter how the capacity is expanded, it will not exceedInteger.MAX_VALUEthe value of it;
  2. Look at the demo below, this Nima andInteger.MAX_VALUEThe difference is too much;
    A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

Don’t panic, let’s continue (hand wheelchair on the road).

Let me talk about the conclusion first, this is actually related toThe current available heap memory is related

4. Array object allocation

  • Put a set of data first:Integer.MAX_VALUEA character array of this size needs to occupy about 4G heap memory (2147483646*2/1024/1024/1024).

First look at the two experiments I did locally, and review the GC log by the way.

  1. Startup parameters (GC is not configured, jdk8 defaults to PS garbage collection): -Xmx8g -Xms8g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log

A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

✨GC logs are as follows:

Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(1613060k free)
/proc/meminfo:
CommandLine flags: -XX:InitialHeapSize=8589934592 -XX:MaxHeapSize=8589934592 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
PSYoungGen      total 2446848K, used 1436457K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 68% used [0x0000000715580000,0x000000076d04a5f8,0x0000000795600000)
from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
ParOldGen       total 5592576K, used 3538944K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 63% used [0x00000005c0000000,0x0000000698000020,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.302: [GC (Allocation Failure) [PSYoungGen: 1436457K->1444K(2446848K)] 4975401K->3540396K(8039424K), 0.0133225 secs] [Times: user=0.16 sys=0.00, real=0.01 secs] 
Heap after GC invocations=1 (full 0):
PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=2 (full 0):
PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.316: [GC (Allocation Failure) [PSYoungGen: 1444K->1491K(2446848K)] 3540396K->3540451K(8039424K), 0.0185590 secs] [Times: user=0.23 sys=0.00, real=0.02 secs] 
Heap after GC invocations=2 (full 0):
PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=3 (full 1):
PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.334: [Full GC (Allocation Failure) [PSYoungGen: 1491K->0K(2446848K)] [ParOldGen: 3538960K->2360557K(5592576K)] 3540451K->2360557K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.2781291 secs] [Times: user=2.89 sys=0.16, real=0.28 secs] 
Heap after GC invocations=3 (full 1):
PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=4 (full 1):
PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.613: [GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] 2360557K->2360557K(8039424K), 0.0177683 secs] [Times: user=0.22 sys=0.00, real=0.01 secs] 
Heap after GC invocations=4 (full 1):
PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=5 (full 2):
PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.630: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] [ParOldGen: 2360557K->2360504K(5592576K)] 2360557K->2360504K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.0095262 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
Heap after GC invocations=5 (full 2):
PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
Heap
PSYoungGen      total 2446848K, used 52441K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 2097664K, 2% used [0x0000000715580000,0x00000007188b67e8,0x0000000795600000)
from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
Metaspace       used 3926K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 430K, capacity 460K, committed 512K, reserved 1048576K

It can be seen that after the final Full GC, the character array was moved to the old generation. At this time, the old generation occupies 2360504K, which is exactly the size of the character array with a length of 1207959551. At this time, if you want to expand the capacity again, the length of the new array isInteger.MAX_VALUE - 8, that is, it is necessary to re-allocate an array of nearly 4G in the heap memory. However, neither the young generation nor the old generation can accommodate the 4G size, so it is OOM.

OOM information:OutOfMemoryError: Java heap space, which is different from the OOM of the 16g heap memory below, which means that there is insufficient memory when the object actually allocates memory.
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

  1. Startup parameters (GC is not configured, jdk8 defaults to PS garbage collection): -Xmx16g -Xms16g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log
    【eden:5588992K ParOldGen:11185152K】

A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

✨GC logs are as follows:

Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(942232k free)
/proc/meminfo:
CommandLine flags: -XX:InitialHeapSize=17179869184 -XX:MaxHeapSize=17179869184 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
PSYoungGen      total 4893184K, used 2788885K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 66% used [0x000000066ab00000,0x0000000714e85418,0x000000076ab00000)
from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
ParOldGen       total 11185152K, used 10747904K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 96% used [0x00000003c0000000,0x0000000650000010,0x000000066ab00000)
Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.887: [GC (Allocation Failure) [PSYoungGen: 2788885K->1488K(4893184K)] 13536789K->10749400K(16078336K), 0.0556800 secs] [Times: user=0.39 sys=0.21, real=0.05 secs] 
Heap after GC invocations=1 (full 0):
PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=2 (full 1):
PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.943: [Full GC (Ergonomics) [PSYoungGen: 1488K->0K(4893184K)] [ParOldGen: 10747912K->4195565K(11185152K)] 10749400K->4195565K(16078336K), [Metaspace: 3914K->3914K(1056768K)], 0.9751963 secs] [Times: user=2.33 sys=4.18, real=0.98 secs] 
Heap after GC invocations=2 (full 1):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 4195565K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 37% used [0x00000003c0000000,0x00000004c013b720,0x000000066ab00000)
Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=3 (full 1):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.945: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0296937 secs] [Times: user=0.38 sys=0.00, real=0.03 secs] 
Heap after GC invocations=3 (full 1):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=4 (full 1):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.975: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0293211 secs] [Times: user=0.37 sys=0.00, real=0.04 secs] 
Heap after GC invocations=4 (full 1):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=5 (full 2):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.019: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389869K->4195332K(11185152K)] 8389869K->4195332K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4611617 secs] [Times: user=5.40 sys=0.04, real=0.46 secs] 
Heap after GC invocations=5 (full 2):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 4195332K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0101090,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=6 (full 2):
PSYoungGen      total 4893184K, used 83886K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 2% used [0x000000066ab00000,0x000000066fceb868,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.963: [GC (Allocation Failure) [PSYoungGen: 83886K->0K(4893184K)] 8473522K->8389636K(16078336K), 0.0356100 secs] [Times: user=0.46 sys=0.00, real=0.04 secs] 
Heap after GC invocations=6 (full 2):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=7 (full 2):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.999: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389636K->8389636K(16078336K), 0.0234444 secs] [Times: user=0.30 sys=0.00, real=0.02 secs] 
Heap after GC invocations=7 (full 2):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=8 (full 3):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.023: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389636K->4195323K(11185152K)] 8389636K->4195323K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4557171 secs] [Times: user=5.56 sys=0.04, real=0.46 secs] 
Heap after GC invocations=8 (full 3):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
ParOldGen       total 11185152K, used 4195323K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 37% used [0x00000003c0000000,0x00000004c00fedf8,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=9 (full 3):
PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.976: [GC (Allocation Failure) [PSYoungGen: 0K->0K(5590528K)] 8389627K->8389627K(16775680K), 0.0401370 secs] [Times: user=0.52 sys=0.00, real=0.04 secs] 
Heap after GC invocations=9 (full 3):
PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=10 (full 3):
PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.776: [GC (Allocation Failure) --[PSYoungGen: 4194304K->4194304K(5590528K)] 12583931K->12583931K(16775680K), 0.0263122 secs] [Times: user=0.32 sys=0.00, real=0.02 secs] 
Heap after GC invocations=10 (full 3):
PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
{Heap before GC invocations=11 (full 4):
PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.802: [Full GC (Ergonomics) [PSYoungGen: 4194304K->0K(5590528K)] [ParOldGen: 8389627K->4195328K(11185152K)] 12583931K->4195328K(16775680K), [Metaspace: 3916K->3916K(1056768K)], 0.8243356 secs] [Times: user=1.74 sys=1.38, real=0.83 secs] 
Heap after GC invocations=11 (full 4):
PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}
Heap
PSYoungGen      total 5590528K, used 4343344K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 5588992K, 77% used [0x000000066ab00000,0x0000000773c8c0c0,0x00000007bfd00000)
from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
Metaspace       used 3924K, capacity 4568K, committed 4864K, reserved 1056768K
class space    used 430K, capacity 460K, committed 512K, reserved 1048576K

It can be seen that the old generation occupies 4195328K, which is the size of the character array before expansion, and the young generation eden area is 4194304K, which is the size of the new array with a length of 2147483646 after expansion. Therefore, after several times of Full GC finishing on the 16G heap, it can finally be realizedStringBuilderThe top grid expansion.

It should be noted that ⚠️: there is only one in the demoStringBuilder, there are bound to be many other objects in the actual environment, so there is basically no
StringBuilderThe case of top grid expansion. more inStringBuilderWhen expanding, it may be accompanied by the allocation of other objects, resulting in
Arrays#copyOfMethod throws OOM.

OOM information:Requested array size exceeds VM limit, this is actually an OOM exception thrown before the memory is actually allocated:
A weird Kafka message OOM problem, take the opportunity to pick up StringBuilder

It means that before the memory is actually allocated, the check finds that the length of the array is out of bounds.
About the array out of bounds: I was lazy and directly debugged the hotspot source code. The maximum length of the array under 64-bit jdk8 is Integer.MAX_VALUE-2, which is 2147483645 [operating systems with different digits, different array element types, and whether to enable pointer compression, etc. Both will lead to changes in the maximum available length of the array (max_array_length)].

in conclusion:

  1. StringBuilderWhen expanding, the character array will be copied, and the length of the array is less thanInteger.MAX_VALUEIn half of the cases, a new array will be created for copying, and the length of the new array is twice the original array➕2 [memory usage at this time: original array size➕2*original array size];
  2. If the array exceedsInteger.MAX_VALUEHalf of the latest expansion, the new array length directly uses MAX_ARRAY_SIZE (Integer.MAX_VALUE – 8);
  3. In the case of the default ratio of 16G heap memory, the character array can be successfully expanded toInteger.MAX_VALUELength (⚠️ Note that the demo here is onlyStringBuilderobject, there must be many other objects in the actual situation, so there are very few StringBuilders in the actual environment that can achieve top-level expansion);
  4. In the process of expansion, if the size of the new array cannot be allocated in the eden area, a Young GC will be performed. If it still cannot be allocated, it will be directly allocated in the old generation; Full GC], after that, if the allocation is still not enough, OOM (the place where the throw isjava.util.Arrays#copyOf(char[], int)ofnew char[newLength]when creating a new array);

5. Final Analysis

Through the above experiments and analysis, the following conclusions can be roughly drawn for the OOM problem on the customer line:

  1. OOM is to print error log buildStringBuilderthrown when
  2. StringBuilderThe expansion did not reach the size of Integer.MAX_VALUE, becausejava.util.Arrays#copyOf(char[], int)When OOM was thrown, the memory size of the old original array was only 73728K (at this time, the size of the new array was 147456K at the time of expansion), which was far below the size of Integer.MAX_VALUE.
  3. If the message is too large, the sending of kafka message will fail, which will cause KafkaProducer to be shut down and then recreated [the logic of DefaultKafkaProducerFactory]. continue to keepRecordAccumulatorIf there are too many instances, it is necessary to continue in-depth analysis through the source code;
  4. The reason for OOM is that along with the memory usage of other objects in the heap memory, inStringBuilderUnable to allocate new array objects when expanding the array copy [Need to combine GC log for in-depth analysis];

Recommended Today

How to use jwt to complete the logout (logout) function

original Magical JSON Web Tokens (JWT) JSON Web Tokens (JWT) are a stateless way of handling user authentication. What’s the meaning?JWT helps to establish authentication mechanism without storing authentication state in any storage, be it session memory or database, therefore, when checking user’s authentication state, there is no need to access session memory or perform […]