Software performance test analysis and tuning practice – Performance Analysis and tuning of Java applications – excerpts from manuscripts


Since its birth, the Java programming language has become very popular and now covers many technical fields, such as the Internet, Android applications, back-end applications and big data. The performance analysis and tuning of Java applications is therefore an important topic. The performance of Java applications directly determines how much traffic many large e-commerce websites can carry and how much data big data systems can process, and performance analysis and tuning can often save a lot of hardware cost.

5.1 JVM Basics

5.1.1 JVM introduction

JVM is the abbreviation of Java virtual machine, a virtual computer that is implemented by simulating the functions of a computer on a real machine. With the Java virtual machine, Java applications can run on different operating system platforms without being recompiled. The Java virtual machine shields the application from the details of the specific operating system platform: a Java program is compiled into object code (bytecode) that runs on the Java virtual machine, so the compiled application can be deployed and run on different operating systems. In essence, the Java virtual machine can be regarded as a program, and a process, running on the operating system. After it starts, the Java virtual machine executes the instructions stored in the bytecode files. Its internal structure is shown in figure 5-1-1.


Figure 5-1-1

In JDK 1.8 (Java 8) and later versions, the internal structure of the JVM has changed slightly, as shown in figure 5-1-2.


Figure 5-1-2

5.1.2 Class loader

The class loader is responsible for loading compiled .class bytecode files into memory so that the JVM can instantiate or otherwise use the loaded classes. Class loaders support dynamic loading at run time. Dynamic loading saves memory and makes it possible to load classes flexibly from local storage or from the network. Classes can also be isolated from each other through separate namespaces, which enhances the security of the whole system. Class loaders are divided into the following types:

Bootstrap class loader: the bootstrap class loader is the lowest-level loader. It is implemented in C/C++ rather than in Java, and is responsible for loading all Java bytecode files in the rt.jar file of the JDK. As shown in figure 5-1-3, the rt.jar file is generally located in the JRE directory of the JDK and stores the core bytecode files of the Java language itself. Java's own core bytecode files are generally loaded by the bootstrap class loader.


Figure 5-1-3

Extension class loader: responsible for loading the jar packages of extension functions into memory. It generally loads the bytecode files in the lib/ext directory of the JRE, or in the location specified by the system property -Djava.ext.dirs.

System class loader: responsible for loading into memory the bytecode class libraries in the directories specified by the system classpath, that is, by the java -classpath option or the -Djava.class.path parameter. Java programs written by programmers themselves are usually loaded by this loader as well.
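The loader hierarchy described above can be inspected from inside a running program. The following is a minimal sketch, not part of the original text: the class name ClassLoaderDemo and the helper describe are hypothetical, and the loader class names that get printed depend on the JDK version in use.

```java
class ClassLoaderDemo {
    // The bootstrap loader is implemented in native code, so the API reports it as null.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap class loader" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes such as java.lang.String are loaded by the bootstrap class loader.
        System.out.println("String      -> " + describe(String.class.getClassLoader()));

        // Application classes are loaded by the system class loader (or a child of it).
        // Walking the parent chain shows the loaders above it.
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println("application -> " + describe(loader));
            loader = loader.getParent();
        }
        System.out.println("top         -> " + describe(null));
    }
}
```

Running it prints one line per loader, from the loader of the application class up to the bootstrap class loader.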

The process of loading classes by the class loader is shown in figure 5-1-4, which also describes the whole life cycle of a class bytecode file.


Figure 5-1-4

Author: Zhang Yongqing, please indicate: From blog Garden

The detailed description of class loader loading process is shown in Table 5-1.

Table 5-1 Detailed description of the class loading process

Loading

Load the specified class bytecode file into the JVM

Linking

Link the class data of the binary byte stream that has already been loaded into the JVM. The linking process includes three steps: verification, preparation and resolution

Verification

Check the correctness of the class bytecode file to ensure that the file conforms to the specification and is suitable for the current JVM version. It generally includes the following four sub-steps:

(1) File format verification: verify whether the format of the bytecode file conforms to the specification, whether the version number is correct and supported by the current JVM, whether the constants in the constant pool have unsupported types, and so on.

(2) Metadata verification: perform semantic analysis on the information described by the bytecode to ensure that it conforms to the specification of the Java language.

(3) Bytecode verification: analyze the data flow and control flow of the bytecode to verify that the semantics of the code are legal and conform to the Java language specification.

(4) Symbolic reference verification: a symbolic reference is a group of symbols that describes a referenced target. This step verifies that each symbolic reference can be converted into a real memory address

Preparation

Allocate memory for the static variables of the class loaded into the JVM and set them to their default initial values

Resolution

Convert symbolic references to direct references, generally by resolving the symbolic references in the constant pool of the class to direct references

Initialization

Assign the static variables of the class their declared values and execute the static code blocks of the class. Constructors run later, when instances are created: if a class declares no constructor, the compiler adds a default parameterless constructor, and if a constructor does not explicitly call a constructor of the parent class, the compiler automatically inserts a call to the parameterless constructor of the parent class

Using

The class is used at run time

Unloading

The class is removed from the JVM
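The initialization stage in Table 5-1 can be observed from ordinary Java code. The sketch below is an illustration only, with hypothetical class names (InitOrderDemo, Parent, Child). It records the order in which static blocks and constructors run when a subclass is used for the first time.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records the order in which initialization code runs.
    static final List<String> EVENTS = new ArrayList<>();

    static class Parent {
        static { EVENTS.add("Parent static block"); }   // runs once, when Parent is initialized
        Parent() { EVENTS.add("Parent constructor"); }
    }

    static class Child extends Parent {
        static { EVENTS.add("Child static block"); }    // runs once, after Parent is initialized
        Child() {
            // The compiler inserts an implicit super() call before this statement.
            EVENTS.add("Child constructor");
        }
    }

    static List<String> run() {
        new Child();   // first use: triggers loading, linking and initialization
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The first four recorded events are the parent static block, the child static block, the parent constructor and the child constructor, in that order: class initialization happens once and before any instance is constructed.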

5.1.3 Java virtual machine stack and native method stack

The Java virtual machine stack is the memory model of Java method execution. It is thread private and directly tied to a thread: each time a new thread is created, the JVM assigns a corresponding Java stack to it. The Java stacks of different threads cannot access each other's memory directly, which keeps threads safe during concurrent operation. Every time a method is called, the Java virtual machine stack creates a stack frame for it. When the method is called, the stack frame is pushed onto the stack, and when the method returns, the stack frame is popped and discarded. A stack frame stores the local variables, the operand stack, dynamic links, intermediate results, the method return value and other information. The process of a method being called and completing corresponds to a stack frame being pushed onto and popped off the virtual machine stack. The life cycle of the virtual machine stack is the same as that of its thread, and the local variables stored in its stack frames are gone at the latest when the thread ends.

The native method stack is similar to the Java virtual machine stack. It mainly stores the state and information of calls to native methods (methods modified with the native keyword), and is the stack area that the JVM uses to call native methods and interfaces.

Common stack related exceptions are as follows:

StackOverflowError: commonly known as stack overflow. This error generally occurs when the stack depth exceeds the stack size that the JVM has allocated to the thread. It is typical of a method that keeps calling itself, for example runaway recursion, and never returns.

OutOfMemoryError: the detailed error message is generally "Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread". The memory of the Java virtual machine stack can be expanded dynamically. When a thread requests stack memory but memory has run out and the stack cannot be expanded, an OutOfMemoryError is thrown.
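A stack overflow is easy to reproduce with unbounded recursion. The sketch below is an illustration, with hypothetical names StackDepthDemo and measureDepth. It counts how many nested calls fit on the current thread's stack before StackOverflowError is thrown. The number depends on the JVM, the platform and the thread stack size (on HotSpot this is set with the -Xss option).

```java
class StackDepthDemo {
    private static int depth;

    // Unbounded recursion: every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls were made before the stack overflowed.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack of this thread is exhausted; the frames are popped
            // while the error propagates back up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + measureDepth() + " nested calls");
    }
}
```

Catching StackOverflowError is done here only to measure the depth. In application code the usual fix is to remove the runaway recursion.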

5.1.4 Method area and metadata area

The method area is what we often call the permanent generation. It stores Java class information, the constant pool, static variables and other data, and the memory it occupies is shared by all threads in the JVM. In JDK 1.8 and later versions, the permanent generation has been removed and replaced by the metadata area (Metaspace) in native memory: the metadata of classes is stored directly in native memory managed by the JVM. Note that this native memory is not part of the virtual machine's run-time data areas, nor is it a memory area defined in the Java virtual machine specification. The string constant pool, static variables and similar data are stored in the Java heap. The main purpose of this change is to reduce the full GCs caused by loading too many classes.
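The split between heap and non-heap memory can be checked at run time through the standard java.lang.management API. The following is a minimal sketch with hypothetical names (MemoryPoolsDemo, poolNames). The pool names it prints are implementation specific, and on a HotSpot JDK 1.8 or later the non-heap list would typically be expected to contain a pool for the metadata area.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```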

5.1.5 Heap area

Java is an object-oriented language, and the JVM heap is the memory area that actually stores Java object instances. The heap is shared by all threads, so Java programs have to deal with synchronization and thread safety when they instantiate and operate on objects. The Java heap can be subdivided into the new generation (young generation) area and the old generation area. The new generation is further subdivided into the Eden space area, the from survivor space area and the to survivor space area, as shown in figure 5-1-5. The heap is the memory area where garbage collection (GC) occurs most frequently, so it is also a key area for JVM performance tuning.


Figure 5-1-5

The internal structure of Java heap is shown in table 5-2.

Table 5-2 Description of the internal structure of the Java heap

New generation area

Also known as the young generation area, it is composed of the Eden space area and the survivor space area. In the new generation, the default memory allocation ratio of the JVM is Eden : from survivor : to survivor = 8:1:1

Eden space area

The memory area where the new object is stored holds the object instance created for the first time

Survivor space area

It is composed of from survivor space area and to survivor space area, and one of the two areas is always empty

From survivor space area

Store the object instances that survived GC garbage collection in Eden space area. The functions of the from survivor space area and the to survivor space area are equivalent, and the size of the two areas is the same by default

To survivor space area

Store the object instances that survived GC garbage collection in Eden space area. When a survivor space is saturated, the surviving object will be moved to another survivor space, and then the saturated survivor space will be emptied

Old generation area

The JVM's garbage collector collects garbage by generation. New generation object instances that are still alive after a certain number of collections (the number can be set through a JVM parameter) enter the old generation area

The arrows in figure 5-1-5 above show how data moves through the JVM heap during generational garbage collection. Objects are stored in the Eden space area right after they are created, and long-lived objects are transferred to the old generation through the survivor spaces. Some large objects (those that need a large contiguous block of memory) enter the old generation area directly. This usually happens when the memory in the survivor area is insufficient.
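The default 8:1:1 split of the new generation can be worked out with simple arithmetic. The sketch below is an illustration with hypothetical names (YoungGenSplit, split). It assumes that the ratio is expressed the way the HotSpot -XX:SurvivorRatio option expresses it, as the size of Eden relative to one survivor space, and it ignores the alignment and rounding that a real JVM applies.

```java
class YoungGenSplit {
    // Splits a young generation of the given size into {eden, from survivor, to survivor}.
    // survivorRatio is the size of Eden relative to ONE survivor space (8 means 8:1:1).
    static long[] split(long youngSize, int survivorRatio) {
        long survivor = youngSize / (survivorRatio + 2);   // two survivor spaces plus Eden
        long eden = youngSize - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] parts = split(300, 8);   // sizes in MB
        System.out.println("Eden = " + parts[0] + " MB, from = " + parts[1] + " MB, to = " + parts[2] + " MB");
    }
}
```

For a 300 MB new generation this gives 240 MB for Eden and 30 MB for each survivor space. Since one survivor space is always empty, only 270 MB of the 300 MB is available for objects at any moment.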

In JDK 1.7 and earlier versions, the composition of the shared memory area of the JVM is shown in figure 5-1-6.


Figure 5-1-6

In JDK 1.8 and later versions, the composition of the shared memory area of the JVM is shown in figure 5-1-7.


Figure 5-1-7

5.1.6 Program counter

The program counter is an indicator that records the position of the bytecode instruction being executed by a thread. The class bytecode files loaded into JVM memory are interpreted and executed by the bytecode interpreter, which reads the bytecode instructions in order. After each instruction is read, it is converted into the corresponding operation, and flow control such as branches, loops and conditional judgments is carried out according to these operations.

A program is generally executed by multiple threads, and the JVM schedules its threads by rotating CPU time slices among them (that is, the threads take turns and compete fairly for CPU execution time). A thread may therefore be suspended during execution because its time slice is used up, while another thread obtains a time slice and starts executing. When the suspended thread gets a CPU time slice again and wants to continue from where it was suspended, it must know where it last stopped, that is, the specific position in the code. In the JVM, the program counter records the execution position of a thread's bytecode instructions. The program counter is therefore thread private and thread isolated: each thread has its own program counter at run time.

In addition, while a native method is being executed, the program counter has no value (it is undefined). A native method is executed by Java calling a native C/C++ library directly through JNI (Java Native Interface). A method implemented in C/C++ naturally has no corresponding class bytecode (C/C++ code is executed in the C/C++ way), so the Java program counter has no value at that time.
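The role of a per-thread program counter can be illustrated with a toy model. The sketch below is not how the JVM is implemented. The names ProgramCounterDemo, ToyThread and run are hypothetical, and the "threads" are plain objects that a simple round-robin loop runs for a fixed time slice each. Each toy thread keeps its own pc field, which is what lets it resume exactly where it was suspended.

```java
import java.util.ArrayList;
import java.util.List;

class ProgramCounterDemo {
    // A toy "thread": a list of instructions plus its own private program counter.
    static class ToyThread {
        final String name;
        final String[] code;
        int pc = 0;   // index of the next instruction to execute

        ToyThread(String name, String... code) {
            this.name = name;
            this.code = code;
        }

        boolean finished() {
            return pc >= code.length;
        }
    }

    // Round-robin scheduling: each thread runs at most `slice` instructions per turn.
    // `slice` must be positive, otherwise no thread would ever make progress.
    static List<String> run(int slice, ToyThread... threads) {
        List<String> trace = new ArrayList<>();
        boolean anyLeft = true;
        while (anyLeft) {
            anyLeft = false;
            for (ToyThread t : threads) {
                for (int i = 0; i < slice && !t.finished(); i++) {
                    trace.add(t.name + ":" + t.code[t.pc]);
                    t.pc++;   // the saved position survives the switch to another thread
                }
                if (!t.finished()) {
                    anyLeft = true;
                }
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        ToyThread a = new ToyThread("A", "a0", "a1", "a2");
        ToyThread b = new ToyThread("B", "b0", "b1");
        System.out.println(run(2, a, b));
    }
}
```

With a slice of two instructions, thread A is suspended after a1, thread B runs b0 and b1, and A then resumes at a2 because its own counter still points there.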

5.1.7 Garbage collection

Unlike many other programming languages, Java does not require the developer to reclaim and release memory manually in the code while the program is running. Instead, the JVM reclaims memory automatically. During memory reclamation, object instances that are no longer used are removed from memory to free up space. This process is usually referred to as the JVM garbage collection mechanism.

Garbage collection is generally abbreviated as GC. Garbage collection of the new generation is generally called minor GC, and garbage collection of the old generation is generally called major GC or full GC. Garbage collection matters so much because it is usually accompanied by a pause of the application: generally, when garbage collection occurs, all threads except those needed by the GC enter a waiting state until the GC completes. The main goal of GC tuning is to reduce the pause time of the application.
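How often collections have happened, and how long they took in total, can be read from the standard java.lang.management API. The following is a minimal sketch with hypothetical names (GcStatsDemo, collectorSummaries). The collector names it reports depend on which garbage collector the JVM is running with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStatsDemo {
    // One summary line per garbage collector known to this JVM.
    static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means the value is undefined.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time (ms)=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorSummaries().forEach(System.out::println);
    }
}
```

Sampling these counters before and after a load test gives a first rough picture of how much time the application spent in garbage collection.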

Common JVM garbage collection algorithms include the root search algorithm, the mark-sweep (mark clear) algorithm, the copying (replication) algorithm, the mark-compact (mark and sort) algorithm and the incremental collection algorithm.

1. Root search algorithm

The root search algorithm treats all the reference relationships of the application as a graph. The garbage collection thread starts from a node called the GC root (a garbage collection root is an object that is accessible from outside the heap) and then keeps following the references held by every node it reaches. When all referenced nodes have been found, the remaining nodes are considered unreferenced, that is, useless, and garbage collection is performed on them.

As shown in figure 5-1-8, the nodes with darker colors (instance object 6, instance object 7 and instance object 8) can be garbage collected because they can no longer be reached by reference from the GC root.


Figure 5-1-8
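The root search can be sketched as a graph traversal. The code below is an illustration only. The names ReachabilityDemo, reachable, garbage and sampleGraph are hypothetical, and the sample graph is made up for this example rather than copied from the figure. Objects are represented by strings and references by an adjacency map.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Follows references from the roots and returns every object that can be reached.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String node = pending.poll();
            if (seen.add(node)) {   // visit each object only once
                pending.addAll(refs.getOrDefault(node, Collections.<String>emptyList()));
            }
        }
        return seen;
    }

    // Everything that is not reachable from the roots is garbage.
    static Set<String> garbage(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> unreachable = new TreeSet<>(refs.keySet());
        unreachable.removeAll(reachable(refs, roots));
        return unreachable;
    }

    // Hypothetical object graph: obj6, obj7 and obj8 reference each other in a cycle,
    // but nothing reachable from the root references them.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("obj1", "obj2"));
        refs.put("obj1", Arrays.asList("obj3"));
        refs.put("obj2", Arrays.asList("obj3"));
        refs.put("obj3", Collections.<String>emptyList());
        refs.put("obj6", Arrays.asList("obj7"));
        refs.put("obj7", Arrays.asList("obj8"));
        refs.put("obj8", Arrays.asList("obj6"));
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(garbage(sampleGraph(), Arrays.asList("root")));
    }
}
```

For this graph the garbage set is obj6, obj7 and obj8. The three objects still reference one another, which shows that a reference cycle does not keep objects alive when none of them is reachable from a root.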

According to the introduction on the IBM website page diagnostics.memory.analyzer.doc/gcroots.html, the objects that can be used as GC root nodes in the JVM include:

System class

A class that was loaded by the bootstrap loader, or the system class loader. For example, this category includes all classes in the rt.jar file (part of the Java™ runtime environment), such as those in the java.util.* package.

JNI local

A local variable in native code, for example user-defined JNI code or JVM internal code.

JNI global

A global variable in native code, for example user-defined JNI code or JVM internal code.

Thread block

An object that was referenced from an active thread block.

Thread

A running thread.

Busy monitor

Everything that called the wait() or notify() methods, or that is synchronized, for example by calling the synchronized(Object) method or by entering a synchronized method. If the method was static, the root is a class, otherwise it is an object.

Java local

A local variable. For example, input parameters, or locally created objects of methods that are still in the stack of a thread.

Native stack

Input or output parameters in native code, for example user-defined JNI code or JVM internal code. Many methods have native parts, and the objects that are handled as method parameters become garbage collection roots. For example, parameters used for file, network, I/O, or reflection operations.

Finalizable

An object that is in a queue, waiting for a finalizer to run.

Unfinalized

An object that has a finalize method, but was not finalized, and is not yet on the finalizer queue.

Unreachable

An object that is unreachable from any other root, but was marked as a root by Memory Analyzer so that the object can be included in an analysis.

Unreachable objects are often the result of optimizations in the garbage collection algorithm. For example, an object might be a candidate for garbage collection, but be so small that the garbage collection process would be too expensive. In this case, the object might not be garbage collected, and might remain as an unreachable object.

By default, unreachable objects are excluded when Memory Analyzer parses the heap dump. These objects are therefore not shown in the histogram, dominator tree, or query results. You can change this behavior by clicking File > Preferences… > IBM Diagnostic Tools for Java – Memory Analyzer, then selecting the Keep unreachable objects check box.

The explanation given on the website is shown in figure 5-1-9.


Figure 5-1-9

Finally, we summarize as follows:

(1) The instance object referenced in the JVM virtual machine stack.

(2) The object referenced by a static attribute in the method area (only for JVMs before JDK 1.8; from JDK 1.8 on the permanent generation no longer exists and static attributes are stored directly in the heap).

(3) The object referenced by a static constant in the method area (only for JVMs before JDK 1.8; from JDK 1.8 on the permanent generation no longer exists and static constants are stored directly in the heap).

(4) The object referenced in the native method stack (mostly used in JNI interface calls).

(5) Objects held by the JVM itself, such as the bootstrap class loader, the system class loader, etc.

Other GC algorithms mentioned below will basically refer to the concept of root search algorithm.

2. Mark-sweep algorithm

As shown in figure 5-1-10, the mark-sweep (mark clear) algorithm scans from the GC root and marks the surviving object nodes. After marking, it scans the whole memory area and directly reclaims the unmarked objects. Because the mark-sweep algorithm neither moves nor defragments the surviving objects after marking, it easily causes memory fragmentation: the free contiguous memory blocks are smaller than the space being requested, so a large number of small free memory blocks cannot be used. However, because only non-surviving objects are processed, the algorithm performs very well when there are many surviving objects and few non-surviving ones.


Figure 5-1-10
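The fragmentation left behind by the sweep phase can be shown on a toy heap. The sketch below is an illustration with hypothetical names (MarkSweepDemo, sweep, largestFreeRun). The heap is modelled as an array of slots, each holding an object id or 0 when free, and the set of marked ids stands in for the result of the mark phase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MarkSweepDemo {
    // Sweep phase: every occupied slot whose object was not marked is freed in place.
    // Surviving objects are NOT moved.
    static int[] sweep(int[] heap, Set<Integer> marked) {
        int[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != 0 && !marked.contains(after[i])) {
                after[i] = 0;
            }
        }
        return after;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6};
        Set<Integer> marked = new HashSet<>(Arrays.asList(1, 3, 5));   // survivors of the mark phase
        int[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after));
        System.out.println("largest free run: " + largestFreeRun(after));
    }
}
```

After the sweep the heap is [1, 0, 3, 0, 5, 0]: three slots are free in total, but the largest contiguous free run is a single slot, so an object that needs two adjacent slots cannot be allocated.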

3. Copying algorithm

The copying (replication) algorithm also scans from the GC root set. It copies the surviving objects to the idle area; after the active area has been scanned, all memory in the active area is reclaimed at once, and the original active area becomes the idle area, as shown in figure 5-1-11. The algorithm divides memory into two areas. All dynamically allocated object instances can only be allocated in one of them (which is then the active area), while the other one is idle. This operation is repeated at every GC, and one area is always idle.


Figure 5-1-11
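The exchange between the active area and the idle area can be sketched on two toy arrays. The code below is an illustration with hypothetical names (CopyingDemo, collect). Slots hold an object id or 0 when free, and the set of live ids stands in for the result of scanning from the GC root.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CopyingDemo {
    // Copies the live objects from the active area into a fresh idle area,
    // then reclaims the whole active area at once. Returns the new active area.
    static int[] collect(int[] active, Set<Integer> live) {
        int[] idle = new int[active.length];
        int next = 0;
        for (int id : active) {
            if (id != 0 && live.contains(id)) {
                idle[next++] = id;   // survivors end up packed together
            }
        }
        Arrays.fill(active, 0);      // the old active area becomes the idle area
        return idle;
    }

    public static void main(String[] args) {
        int[] active = {1, 2, 3, 4, 5, 6};
        Set<Integer> live = new HashSet<>(Arrays.asList(1, 3, 5));
        int[] newActive = collect(active, live);
        System.out.println("new active area: " + Arrays.toString(newActive));
        System.out.println("old active area: " + Arrays.toString(active));
    }
}
```

The survivors land next to each other in the new active area, so there is no fragmentation. The price is that half of the memory is always idle, and every surviving object has to be copied.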

4. Mark-compact algorithm

The mark-compact (mark and sort) algorithm marks and clears objects in the same way as the mark-sweep algorithm, but after reclaiming the memory occupied by non-surviving objects, it moves all surviving objects to the free space at the left end and updates the corresponding memory pointers, as shown in figure 5-1-12. The mark-compact algorithm builds on the mark-sweep algorithm and additionally moves and tidies up the objects. Although its performance cost is higher, it solves the problem of memory fragmentation. If fragmentation is not solved, then once a large object instance has to be created, the JVM may be unable to allocate a large enough contiguous block of memory for it, which results in a full GC. Full GC should be avoided as much as possible, because once it occurs the application generally pauses for a long time while waiting for it to complete.


Figure 5-1-12
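Compaction can be sketched on the same kind of toy heap. The code below is an illustration with hypothetical names (MarkCompactDemo, compact). Slots hold an object id or 0 when free, the set of marked ids stands in for the result of the mark phase, and the pointer updates that a real collector performs are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MarkCompactDemo {
    // Slides every marked object toward the left end of the heap and frees the rest.
    static int[] compact(int[] heap, Set<Integer> marked) {
        int[] after = heap.clone();
        int next = 0;   // next slot at the left end that a survivor can move into
        for (int i = 0; i < after.length; i++) {
            int id = after[i];
            if (id != 0 && marked.contains(id)) {
                after[next++] = id;   // next <= i, so no unread slot is overwritten
            }
        }
        Arrays.fill(after, next, after.length, 0);   // everything right of the survivors is free
        return after;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6};
        Set<Integer> marked = new HashSet<>(Arrays.asList(1, 3, 5));
        System.out.println(Arrays.toString(compact(heap, marked)));
    }
}
```

The result is [1, 3, 5, 0, 0, 0]: the three free slots now form one contiguous block, at the cost of moving the surviving objects within the same memory area.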

To optimize the performance of garbage collection, the JVM uses generational collection. Collection of the new generation (minor GC) mainly uses the copying algorithm, while collection of the old generation (major GC / full GC) mostly uses the mark-compact algorithm. When tuning garbage collection, the most important thing is to reduce the number of old generation collections, because they take a long time, are very expensive, and have a great impact on the running application.

5. Incremental collection algorithm

The incremental collection algorithm divides the JVM memory space into multiple regions and collects garbage in only one region at a time. The advantage is that it reduces the interruption time of the application, so that users are generally unaware that the garbage collector is working.

5.1.8 parallelism and concurrency

Parallelism and concurrency are often mentioned in concurrent program development. The difference between parallelism and concurrency in garbage collection is as follows:

Parallelism: the JVM starts multiple garbage collection threads that work in parallel, but during this time the user threads (the working threads of the application) have to stay in the waiting state.

Concurrency: the user threads (the working threads of the application) and the garbage collection threads execute during the same period (not necessarily in parallel; they may execute alternately). The user threads can continue to run while the garbage collection thread runs on another CPU core, and the two do not interfere with each other.

To be continued. Author: Zhang Yongqing, please indicate: From blog Garden. This article is excerpted from Software performance test analysis and tuning practice.