A Practical Guide to Service Exception Handling

Date: 2020-08-02
1. Service exception handling process


2. Load

2.1 Check the machine's CPU load

top -b -n 1 |grep java|awk '{print "VIRT:"$5,"RES:"$6,"cpu:"$9"%","mem:"$10"%"}'

2.2 finding threads with high CPU utilization

top -p 25603 -H             # list the threads of process 25603 with per-thread CPU usage
printf '0x%x\n' 25842       # convert the busy thread's decimal ID to hex (-> 0x64f2)
jstack 25603 | grep 0x64f2  # find that thread's stack in the thread dump
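The three commands above can be rolled into a small helper. This is only a sketch: the function names are mine, and it assumes `top` and `jstack` are available on the box.

```shell
# Convert a decimal thread ID (as shown by `top -H`) to the hex "nid" used in
# jstack output.
tid_to_nid() {
  printf '0x%x' "$1"
}

# Sketch: dump the stack of the busiest thread in a Java process.
# Pass the Java PID as $1.
busiest_thread_stack() {
  pid=$1
  # -H lists threads; keep data rows, sort by %CPU (field 9), take the top one.
  tid=$(top -b -n 1 -H -p "$pid" | awk '$1 ~ /^[0-9]+$/' | sort -k9 -nr | head -1 | awk '{print $1}')
  jstack "$pid" | grep -A 20 "nid=$(tid_to_nid "$tid")"
}

tid_to_nid 25842   # prints 0x64f2, matching the example above
```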

cat /proc/interrupts        # view how hardware interrupts are distributed across CPUs

When troubleshooting, look at four kinds of resources:
(1) CPU
(2) Memory
(3) IO
(4) Network

CPU can be monitored from the following aspects:
(1) Interrupts;
(2) Context switches;
(3) Run queue length;
(4) CPU utilization.

3. Memory

3.1 system memory

The free command

[root@host ~]# free
             total       used       free     shared    buffers     cached
Mem:       3266180    3250000      10000          0     201000    3002000
-/+ buffers/cache:      47000    3213000
Swap:      2048276      80160    1968116

The default display unit here is KB.

Explanation of indicators

  • Total: total physical memory.
  • Used: memory already in use.
  • Free: memory not yet in use.
  • Shared: total memory shared by multiple processes.
  • Buffers: size of the buffer cache.
  • Cached: size of the page cache.
  • -/+ buffers/cache: used = memory actually used by applications; free = memory actually available to applications.
  • Memory actually used = used – buffers – cached
    (47000 = 3250000 – 201000 – 3002000)
  • Memory actually available = free + buffers + cached
    (3213000 = 10000 + 201000 + 3002000)
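The two formulas can be verified in the shell with the numbers from the sample output above:

```shell
# Derive "really used" and "really available" memory from the Mem: line of `free`.
# All values (in KB) are taken from the sample output above.
total=3266180 used=3250000 free=10000 buffers=201000 cached=3002000

real_used=$((used - buffers - cached))   # memory applications actually occupy
real_free=$((free + buffers + cached))   # memory reclaimable for applications

echo "really used: ${real_used} KB"      # 47000 KB
echo "really free: ${real_free} KB"      # 3213000 KB
```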

What is buffer / cache?

  • Buffers refers to the buffer cache in Linux memory
  • Cached refers to the page cache in Linux memory

page cache

Page cache is mainly used as a cache for file data on the file system, especially when processes read from or write to files.

If you think about it, mmap, the system call that maps files into memory, naturally uses page cache as well. In the current implementation, page cache also acts as a cache for other file types, so in practice it handles most block device file caching too.

buffer cache

Buffer cache is mainly used when the system reads from and writes to block devices, caching the block data. This means some block-level operations, such as formatting a file system, go through the buffer cache.

The two caches are generally used together. For example, when we write a file, the page cache content changes, while the buffer cache is used to divide the page into buffers and record which buffers have been modified. That way the kernel does not have to write back the whole page, only the modified parts.

In the current kernel, page cache is the cache for memory pages: put bluntly, any memory allocated and managed in units of pages can use page cache as its cache.

Of course, not all memory is managed in pages; much is managed in blocks, and when that memory needs caching it is gathered in the buffer cache. From this point of view, buffer cache might better be called "block cache". However, blocks do not have a fixed length: block size is determined mainly by the block device used, whereas a page on x86 is 4 KB whether the system is 32-bit or 64-bit.

3.2 process memory

3.2.1 process memory statistics

/proc/[pid]/status
Through /proc/[pid]/status you can view a process's memory usage, including virtual memory size (VmSize), resident physical memory (VmRSS), data segment size (VmData), stack size (VmStk), code segment size (VmExe), shared library size (VmLib), and so on.

Name:      gedit        /* program name */
State:     S (sleeping) /* process state */
Tgid:      9744         /* thread group ID */
Pid:       9744         /* process PID */
PPid:      7672         /* PID of the parent process */
TracerPid: 0            /* PID of the process tracing this one */
VmPeak:    60184 kB     /* peak size of the process address space */
VmSize:    60180 kB     /* size of the process virtual address space */
VmLck:     0 kB         /* locked physical memory; it cannot be swapped out */
VmHWM:     18020 kB     /* peak resident set size */
VmRSS:     18020 kB     /* physical memory in use; the RSS column of ps */
VmData:    12240 kB     /* data segment size, holding initialized data */
VmStk:     84 kB        /* user-mode stack size */
VmExe:     576 kB       /* code segment size, excluding shared libraries */
VmLib:     21072 kB     /* size of shared libraries mapped into the task */
VmPTE:     56 kB        /* size of this process's page tables */
Threads:   1            /* number of tasks sharing this signal descriptor */
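As a sketch, a field such as the resident set size can be pulled out of a status dump with awk. The helper name is mine; the here-doc just replays the fields shown above.

```shell
# Pull the resident set size (VmRSS, in kB) out of a /proc/<pid>/status dump.
vmrss_kb() {
  awk '/^VmRSS:/ { print $2 }'
}

# Replay a few of the fields shown above (prints 18020):
vmrss_kb <<'EOF'
VmPeak:    60184 kB
VmSize:    60180 kB
VmRSS:     18020 kB
VmData:    12240 kB
EOF
```

On a live Linux system, feed it the real file instead: `vmrss_kb < /proc/$$/status`.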

3.2.2 JVM memory allocation

Heap and non-heap memory

According to the official description: "The Java virtual machine has a heap, which is the runtime data area from which memory for all class instances and arrays is allocated. The heap is created on virtual machine start-up." Memory outside the heap in the JVM is called non-heap memory.

As you can see, the JVM manages two kinds of memory: heap and non-heap.

Simply put, the heap is the memory available to Java code, reserved for developers; non-heap memory is what the JVM reserves for its own use.

So the method area, the memory needed for the JVM's internal processing or optimization (such as the JIT code cache), the per-class structures (such as the runtime constant pool, field and method data), and the code for methods and constructors all live in non-heap memory.

  1. Memory required by the JVM itself, including the third-party libraries it loads and the memory those libraries allocate
  2. NIO DirectBuffers, which are allocated from native memory
  3. Memory-mapped files, including some JARs and third-party libraries loaded by the JVM, as well as files mapped by the program itself. In the pmap output below there are static files whose size is not counted in the Java heap; so if the process is a web server, move static files off it and onto nginx or a CDN.
  4. JIT: the JVM compiles classes to native code, and this memory is not small. If Spring AOP is used, CGLIB generates additional classes and the JIT memory overhead grows. Moreover, the classes themselves are placed by the JVM's GC into the permanent generation, which is hard to reclaim. In that case, let the JVM use the ConcurrentMarkSweep GC and enable its class-unloading parameters to remove unused classes from the permanent generation: -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled. If unloading is not needed and the permanent generation is simply too small, enlarge it: -XX:PermSize=256m -XX:MaxPermSize=512m
  5. JNI: native libraries called through the JNI interface allocate their own memory. If a JNI library leaks, memory-leak tools such as Valgrind can help detect it
  6. Thread stacks: each thread has its own stack space, and with many threads the overhead becomes noticeable
  7. jmap/jstack sampling: frequent sampling also increases memory consumption. If you run server health monitoring, keep the sampling frequency low, or the health monitoring will itself become pathogenic

1. Method area

Also known as the "permanent generation" or "non-heap", it stores the class information, constants and static variables loaded by the virtual machine, and is shared by all threads. The default minimum size is 16 MB and the maximum 64 MB; the method area's size can be limited with the -XX:PermSize and -XX:MaxPermSize parameters.

Runtime constant pool: part of the method area. Besides the description of the class version, fields, methods and interfaces, a class file also contains a constant pool holding the literals and symbolic references generated at compile time; this content is placed into the method area's runtime constant pool when the class is loaded.

2. Virtual machine stack
It describes the memory model of Java method execution: each time a method is invoked, a "stack frame" is created to hold the local variable table (including parameters), the operand stack, the method exit and other information.

Each method's invocation and completion corresponds to a stack frame being pushed onto and popped off the virtual machine stack. The stack's life cycle is the same as its thread's, and it is thread-private.

The local variable table holds the basic data types known to the compiler (boolean, byte, char, short, int, float, long, double) and object references (a reference pointer, not the object itself). The 64-bit types long and double occupy two local variable slots; all other types occupy one.

The memory the local variable table requires is determined at compile time: when a method is entered, the number of local variable slots its stack frame needs is already fixed, and the table's size does not change while the method runs.

3. Native method stack
It is basically similar to the virtual machine stack, except that the virtual machine stack serves the Java methods executed by the virtual machine, while the native method stack serves native methods.

4. Heap
Also called the Java heap or GC heap, it is the largest area of memory managed by the Java virtual machine, is shared by all threads, and is created when the JVM starts.

This area holds object instances and arrays (everything created with new). Its size is set with the -Xms (minimum) and -Xmx (maximum) parameters. -Xms is the minimum memory requested when the JVM starts, by default 1/64 of the operating system's physical memory (but less than 1 GB);

-Xmx is the maximum memory the JVM may request, by default 1/4 of physical memory (but less than 1 GB). By default, when free heap memory drops below 40%, the JVM grows the heap up to the size specified by -Xmx; this ratio can be set with -XX:MinHeapFreeRatio=.

When free heap memory exceeds 70%, the JVM shrinks the heap down to the size specified by -Xms; this ratio can be set with -XX:MaxHeapFreeRatio=. To avoid frequently resizing the heap at runtime, -Xms and -Xmx are usually set to the same value.
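The default sizing rule just described can be sketched as a quick calculation. This assumes the 1/64 and 1/4 defaults stated above; exact defaults vary across JVM versions and ergonomics settings, and the helper name is mine.

```shell
# Sketch of the default heap sizing rule described above:
# -Xms defaults to physical RAM / 64, -Xmx to physical RAM / 4.
# Input is physical memory in MB.
default_heap() {
  phys_mb=$1
  echo "-Xms$((phys_mb / 64))m -Xmx$((phys_mb / 4))m"
}

default_heap 8192   # -> -Xms128m -Xmx2048m on an 8 GB box
```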

Since current collectors use generational collection algorithms, the heap is divided into a young generation and an old generation. The young generation holds newly created objects and objects not yet promoted; the old generation holds objects that have survived several minor GCs.

5. Program counter
In the virtual machine model, the bytecode interpreter works by changing this counter's value to select the next bytecode instruction to execute; branching, looping, exception handling, thread recovery and other basic functions all depend on it.

3.2.3 direct memory

Direct memory is not part of the virtual machine's managed memory, nor is it a region defined in the Java Virtual Machine Specification. NIO, added in JDK 1.4, introduced an I/O model based on channels and buffers that can call native methods to allocate off-heap memory directly. This off-heap memory is native memory and does not count against the heap size.

3.2.4 JVM memory analysis

View JVM heap memory
jmap -heap [pid]

[user@host ~]$ jmap -heap 837

Attaching to process ID 837, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.71-b01
using thread-local object allocation.
Parallel GC with 4 thread(s)           // GC mode

Heap Configuration:                    // heap initialization configuration
MinHeapFreeRatio = 0                   // -XX:MinHeapFreeRatio: minimum free ratio of the heap (default 40)
MaxHeapFreeRatio = 100                 // -XX:MaxHeapFreeRatio: maximum free ratio of the heap (default 70)
MaxHeapSize = 2082471936 (1986.0MB)    // -XX:MaxHeapSize: maximum size of the heap
NewSize = 1310720 (1.25MB)             // -XX:NewSize: default size of the young generation
MaxNewSize = 17592186044415 MB         // -XX:MaxNewSize: maximum size of the young generation
OldSize = 5439488 (5.1875MB)           // -XX:OldSize: size of the old generation
NewRatio = 2                           // -XX:NewRatio: size ratio of old generation to young generation
SurvivorRatio = 8                      // -XX:SurvivorRatio: size ratio of Eden to a survivor space
PermSize = 21757952 (20.75MB)          // -XX:PermSize: initial size of the permanent generation
MaxPermSize = 85983232 (82.0MB)        // -XX:MaxPermSize: maximum size of the permanent generation
G1HeapRegionSize = 0 (0.0MB)

Heap Usage:                            // heap memory usage
PS Young Generation
Eden Space:                            // Eden space
capacity = 33030144 (31.5MB)           // total capacity of Eden
used     = 1524040 (1.4534378051757812MB)
free     = 31506104 (30.04656219482422MB)
4.614088270399305% used                // Eden utilization
From Space:                            // one of the survivor spaces
capacity = 5242880 (5.0MB)
used     = 0 (0.0MB)
free     = 5242880 (5.0MB)
0.0% used
To Space:                              // the other survivor space
capacity = 5242880 (5.0MB)
used     = 0 (0.0MB)
free     = 5242880 (5.0MB)
0.0% used
PS Old Generation                      // old generation
capacity = 86507520 (82.5MB)
used     = 0 (0.0MB)
free     = 86507520 (82.5MB)
0.0% used
PS Perm Generation                     // permanent generation
capacity = 22020096 (21.0MB)
used     = 2496528 (2.3808746337890625MB)
free     = 19523568 (18.619125366210938MB)
11.337498256138392% used

670 interned Strings occupying 43720 bytes.
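The "% used" figures in this output are simply used / capacity × 100; the Eden line, for example, can be reproduced with:

```shell
# Reproduce the Eden "% used" figure: used / capacity * 100.
# The byte counts are taken from the jmap -heap output above.
awk 'BEGIN {
  capacity = 33030144
  used     = 1524040
  printf "%.6f%% used\n", used / capacity * 100
}'
# prints 4.614088% used
```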

There is plenty of material online about the generations, so I won't repeat it. If we sum the figures here, the JVM has allocated about 644 MB to the Java side, yet the VSZ and RSS reported by ps are 7.4 GB and 2.9 GB respectively. What is going on?

In the jmap output above, MaxHeapSize was configured on the command line as -Xmx4096m, the maximum heap memory this Java program may use.

VSZ is the size of the allocated virtual (linear) address space, which is usually not equal to the memory the program actually uses. There are many reasons for this: memory mapping, shared dynamic libraries, or requesting more heap from the system all enlarge the linear address space. To see which memory mappings a process has, use the pmap command:

pmap -x [pid]

[user@host ~]$ pmap -x 837
837: java

Address Kbytes RSS Dirty Mode Mapping
0000000040000000 36 4 0 r-x-- java
0000000040108000 8 8 8 rwx-- java
00000000418c9000 13676 13676 13676 rwx-- [ anon ]
00000006fae00000 83968 83968 83968 rwx-- [ anon ]
0000000700000000 527168 451636 451636 rwx-- [ anon ]
00000007202d0000 127040 0 0 ----- [ anon ]
......
00007f55ee124000 4 4 0 r-xs- az.png
00007fff017ff000 4 4 0 r-x-- [ anon ]
ffffffffff600000 4 0 0 r-x-- [ anon ]
---------------- ------ ------ ------
total kB 7796020 3037264 3023928

You can see many "anon" entries here, which means that memory was allocated with mmap.
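To see how much resident memory those anonymous mappings account for, their RSS column can be summed. This is a sketch: the helper name is mine, and the here-doc just replays the sample lines above.

```shell
# Sum the RSS column (KB) of the [ anon ] mappings in `pmap -x` output.
anon_rss_kb() {
  awk '$(NF-1) == "anon" { sum += $3 } END { print sum + 0 }'
}

# Replay a few of the sample lines above (prints 97644):
anon_rss_kb <<'EOF'
00000000418c9000   13676   13676   13676 rwx--    [ anon ]
00000006fae00000   83968   83968   83968 rwx--    [ anon ]
00000007202d0000  127040       0       0 -----    [ anon ]
00007f55ee124000       4       4       0 r-xs-  az.png
EOF
```

On a live system: `pmap -x <pid> | anon_rss_kb`.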

RSS is the resident set size, the physical memory the process actually occupies. In this example, the difference between the RSS and the actual heap memory is about 2.3 GB, which is made up of the following:

View the memory of each partition of the JVM heap

jstat -gcutil [pid]

[user@host ~]$ jstat -gcutil 837 1000 20
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 80.43 24.62 87.44 98.29 7101 119.652 40 19.719 139.371
0.00 80.43 33.14 87.44 98.29 7101 119.652 40 19.719 139.371

Analyzing objects in JVM heap memory
View live object statistics
jmap -histo:live [pid]

Dump memory
jmap -dump:format=b,file=heapDump [pid]

Then analyze it with the jhat command:
jhat -port 5000 heapDump
Visit http://localhost:5000 in a browser to view the details.

4. Service indicators

4.1 response time (RT)

Response time is the time the system takes to respond to a request. It matches people's subjective perception of software performance quite well.

Because a system usually provides many functions whose processing logic differs greatly, the response time of different functions differs, and even the same function's response time varies with the input data.

Therefore, when discussing a system's response time, people usually mean the average response time of all its functions, or the maximum response time across all functions.

Of course, it is often necessary to discuss the average response time and maximum response time for each or each group of functions.

For a single-user application with no concurrency, response time is generally considered a reasonable and accurate performance metric. Note, though, that its absolute value does not directly reflect software performance: what matters is whether users find the response time acceptable.

For a game, a response time under 100 ms is good, around 1 second is barely acceptable, and 3 seconds is completely unacceptable.

For a compilation system, compiling the source of a large piece of software may take tens of minutes or longer, yet users consider such response times acceptable.

4.2 throughput

Throughput is the number of requests the system processes per unit time. For a non-concurrent application, throughput and response time are inversely proportional; in fact, throughput is the reciprocal of the response time.
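As a quick sanity check of that reciprocal relationship (the helper name and the figures are mine):

```shell
# For a non-concurrent system: throughput (requests/s) = 1 / response time (s).
throughput() {
  awk -v rt="$1" 'BEGIN { printf "%.1f\n", 1 / rt }'
}

throughput 0.05   # 50 ms per request  -> 20.0 requests per second
throughput 0.5    # 500 ms per request -> 2.0 requests per second
```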

As mentioned above, for a single user system, response time (or system response time and application delay time) can be a good measure of system performance, but for concurrent systems, throughput is usually used as a performance indicator.

For a multi-user system: if only one user is using it, the average response time is t; when n users use it, the response time each user sees is usually not n × t, and is often much smaller (though in special cases it can be larger, even much larger, than n × t).

This is because handling each request consumes many resources, and because parts of request handling are hard to execute concurrently, at any given moment only some resources are busy. In other words, while a single request is being handled, many resources sit idle at each point in time; when multiple requests are handled and resources are allocated reasonably, the average response time each user sees does not grow linearly with the number of users.

In fact, the average response times of different systems grow at different rates as the number of users increases, which is the main reason throughput is used to measure the performance of concurrent systems.

Generally speaking, throughput is a fairly universal metric: if two systems with different user counts and usage patterns have basically the same maximum throughput, their processing capacity can be judged to be basically the same.

4.3 number of concurrent users

The number of concurrent users is the number of users who can be using the system's functions normally at the same time. Compared with throughput, it is more intuitive, but it is also a more general performance metric.

In fact, the number of concurrent users is a very inaccurate indicator, because different usage patterns of users will cause different users to make different numbers of requests per unit time.

Take a website as an example. Suppose users can only use it after registering, but registered users are not using the site all the time: at any given moment only some registered users are online, and online users spend much of their time reading pages, so only some online users are sending requests to the system at the same time.

Thus, for the website, we have three user statistics: the number of registered users, the number of online users, and the number of users sending requests at the same time. Since registered users may not log in for long periods, using the number of registered users as the performance metric introduces large errors; the number of online users and the number of users sending requests at the same time can both serve as performance metrics.

By comparison, the number of online users is the more intuitive metric, while the number of users sending requests at the same time is the more accurate one.

4.4 QPS query per second

Queries per second (QPS) measures how much traffic a given query server handles in a specified period of time. On the Internet, the performance of a machine acting as a DNS server is often measured in queries per second. It corresponds to fetches/sec, the number of requests responded to per second, i.e. the maximum throughput capacity.

By these definitions, throughput and response time are both important metrics of system performance. QPS is not identical to throughput, but the two should be roughly proportional, and either reflects the server's parallel processing capacity. Throughput pays more attention to data volume, while QPS pays more attention to the number of transactions processed.

4.5 CPU utilization

Rule of thumb: the CPU load average should stay below 0.7 × the number of CPU cores.
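That rule of thumb is easy to script (a sketch; the 0.7 threshold is the one stated above, and the helper name is mine):

```shell
# Compare a load average against the 0.7-per-core rule of thumb above.
load_ok() {
  awk -v load="$1" -v cores="$2" \
    'BEGIN { print (load < 0.7 * cores ? "OK" : "overloaded") }'
}

load_ok 2.1 4   # OK (threshold is 2.8 on a 4-core box)
load_ok 3.5 4   # overloaded
```

On a live Linux system: `load_ok "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`.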

Context Switch Rate
This is the rate of process (thread) switches. If there are too many, the CPU spends its time switching rather than working, which also hurts throughput.

Section 2 of “high performance server architecture” is about this issue.

How much is appropriate? After a lot of searching, I found no definitive answer.

Context switches generally come from two sources: interrupts, and process (including thread) switches. An interrupt causes a switch, and creating or activating a process (thread) also causes a switch. The CS value is also related to TPS (transactions per second): assuming each call causes N switches, we get

Context Switch Rate = Interrupt Rate + TPS × N

CSR minus IR gives the process/thread switch rate. If the main process receives a request and hands it to a worker thread, and the thread hands the result back to the main process, that is two switches.

Substituting measured CSR, IR and TPS into the formula gives the number of switches each transaction causes. So to reduce CSR, you must work on the switches caused by each transaction: only by lowering N can CSR drop. Ideally N = 0, and if N >= 4, look carefully. As for the "CSR < 5000" figure seen online, I don't think the standard should be that absolute.
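Rearranging the formula gives the per-transaction switch count N = (CSR − IR) / TPS. The figures below are purely illustrative, not measurements:

```shell
# From the formula above: N = (Context Switch Rate - Interrupt Rate) / TPS.
switches_per_tx() {
  awk -v csr="$1" -v ir="$2" -v tps="$3" 'BEGIN { print (csr - ir) / tps }'
}

switches_per_tx 4500 500 1000   # -> 4, which by the rule above deserves a careful look
```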

All three metrics can be monitored in LoadRunner; on Linux, vmstat also shows r (run queue), in (interrupts) and cs (context switches).

5. Tools

uptime
View load averages and how long the system has been running

dmesg
View kernel ring buffer messages (boot and hardware diagnostics)

top
View process activity and some overall system state

vmstat
View system status, hardware and system information, etc

iostat
Check CPU load and hard disk status

sar
Integrated tools to view system status

mpstat
View multiprocessor status

netstat
View network status

iptraf
Real time network condition monitoring

tcpdump
Capture the network data packet, detailed analysis

tcptrace
Packet analysis tool

netperf
Network bandwidth tools

dstat
The tool integrates vmstat, iostat, ifstat, netstat and other information
