Analysis of a CPU burst in his system of a hospital based on. Net

Time:2021-6-22

1: Background

1. Tell a story

A few days ago, a friend, Jia Wx, complained that his program always had an inexplicable CPU burst at the peak. How to analyze it?



After communicating with this friend, it is said that this problem has bothered them for several years, and Microsoft engineers have been invited to solve it. In vain, I still haven’t found the right Microsoft boss…

As for the problem of program CPU explosion, old readers should know that I have written several articles, which basically fall into two categories

  • GC trigger

  • Lots of lock locks

A few of them are all kinds of inexplicable problems, which can’t be listed one by one. Now that my friend has found me, I have to find a way to solve them. If I don’t talk much, I’ll go to WinDbg.

2: WinDbg analysis

1. View synchronization block table

In case of this kind of problem, first checkSynchronous block tableIt’s my habitual thinking. The order is very simple!syncblk

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
-----------------------------
Total           20779
CCW             16
RCW             21
ComClassFactory 0
Free            16490

I went, and there was nothing in the synchronization block table… Since it has nothing to do with locks, let’s look at threads. After all, threads are supported by CPU.

2. View threads

To view themanaged thread , which can be used!tCommand, more threads, a little simplified.

0:000> !t
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
  34    1 25d4 000001ea28702130    28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  74    2 3ed0 000001ea286fa940    2b220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Finalizer) 
  76    3 4a70 000001f4447d7810  102a220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Threadpool Worker) 
  77    4 326c 000001f4447dbe60    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  78    6 2dac 000001f4447d9750  1020220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn (Threadpool Worker) 
  79    7 1468 000001f444a2d6f0    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC) 
  80    8   f0 000001f444a2cf20    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  81    9 3118 000001f444a2f630    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn

Sell a pass first, can a friend see these threads have what strange??? Yes, threads79There is one in the last column ofUkn (GC)Mark, I think you must be curious. What does that mean? There are some changes in the underlying GC mode, but anyway, it tells you to a certain extent that your program triggered GC. For further verification, you can use the!t -specialTake a look at the category of thread 79 and more detailed information.

0:000> !t -special
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
  34    1 25d4 000001ea28702130    28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  79    7 1468 000001f444a2d6f0    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC) 

OSID Special thread type
41 38d8 DbgHelper 
42 1c88 GC 
74 3ed0 Finalizer 
75 402c ProfilingAPIAttach 
76 4a70 Timer
79 1468 GC SuspendEE

As you can see from the last line of output, 79 is the GC thread, followed by a strange oneSuspendEEMark, you’re curious again. What does that mean?

SuspendEE = Suspend CLR Execution Engine (EE)

In other words, thread 79CLR execution engineIt’s frozen. The purpose is very simple, just for the convenience of other people31 GC threadsTake care of the current situationManaged heapBut this old brother machine really, 32core, do not know which hospital is so awesome, add it.!cpuidverification.

0:000> !cpuid
CP  F/M/S  Manufacturer     MHz
 0  6,62,4     2600
 1  6,62,4     2600
 2  6,62,4     2600
 3  6,62,4     2600
 xxx
31  6,62,4     2600

Now that the GC trigger is predicted, the next step is to call out the managed and unmanaged stacks of all threads.

3. View each thread stack

To view the managed and unmanaged stacks of individual threads, use!eestackThen you can search for keywordsWaitUntilGCCompleteTo determine how many threads are waiting for GC processing to complete.

As can be seen from the figure, currently 40 threads are blocked. It’s good. The problem is becoming clearer and clearer. Next, analyze which thread did something that shouldn’t be done, causing GC to trigger. You can also searchtry_allocate_more_spaceTo determine which threads are allocating space.


I’ll go, and you can see clearly the current situation170and187Thread number is allocating a large objectgc_heap::allocate_large_objectWhen GC is triggered, the big object heap itself is a daunting thing, and its recycling and cleaning are very CPU intensive, which I also talked about with friendsThe CPU drops in about a minuteThe situation is quite consistent.

4. Looking for the culprit

Now the focus is on these two threads. I’ve seen that these two thread stacks are the same method, so I’ll choose one187Thread to analyze it, you can use!clrstackTake a look at its hosting stack.

0:187> !clrstack 
OS Thread Id: 0x1ef0 (187)
        Child SP               IP Call Site
00000054ce631e30 00007ffc4021bde2 System.String.FillStringChecked(System.String, Int32, System.String)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
00000054ce631ed0 00007ffc40223528 System.String.Concat(System.String[])
00000054ce631f40 00007ffbe6dbdafb BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String, System.String ByRef)
00000054ce65cf40 00007ffbe6ab3187 BLL.xxx.xxx.ExpZB(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String)

On the stack, it looks like it was usedSystem.String.ConcatString splicing is caused by string splicing. Good guy, I don’t know how many times string splicing has been attacked by people over the years. There are still a lot of people stepping on the pit. For further verification, it can be used!clrstack -p + !da -details xxxTake a look at what this system. String [] is, and simplify it as follows:

0:187> !clrstack -p
OS Thread Id: 0x1ef0 (187)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
    PARAMETERS:
        values () = 0x000001ea69e8d2f8
        totalLength = 
0:187> !da -details 0x000001ea69e8d2f8
Name:        System.String[]
Size:        128(0x80) bytes
Array:       Rank 1, Number of elements 13, Type CLASS
Element Methodtable: 00007ffc403d6948
[0] 000001f2391a83f0
    Name:        System.String
    MethodTable: 00007ffc403d6948
    EEClass:     00007ffc3fcd50e0
    Size:        445950(0x6cdfe) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      xxxxx

As you can see from the output information,String[]There are 13 elements in it, the biggest of which is string445950 bytes = 445k, which is larger than the 85K definition of large objects, so concat here is the crux of the problem, and so is 170 threads. Next, I have to solve a problem: why is there such a large string generated, and what exactly is done in the code??? If you want to find out, you have to export the source code from dump.

5. Check the problem code

To analyze the problem code, you can!ip2md + !savemoduleExport the bll.xxx.xxx.getrowstring method.

0:187> !ip2md 00007ffbe6dbdafb
MethodDesc:   00007ffbe5342118
Method Name:  BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String, System.String ByRef)
Class:        00007ffbe52fe328
MethodTable:  00007ffbe53421d8
mdToken:      0000000006000096
Module:       00007ffbe471a890
0:187> !savemodule  00007ffbe471a890 E:\dumps\RowString.dll
3 sections in file
section 0 - VA=2000, VASize=f7fcc, FileAddr=200, FileSize=f8000
section 1 - VA=fa000, VASize=3bc, FileAddr=f8200, FileSize=400
section 2 - VA=fc000, VASize=c, FileAddr=f8600, FileSize=200

And then sacrificeILSpyDecompile the code.

Good guy, this writing method is true, countless string splicing, I feel itgenandLOHIt’s too late to allocate memory segments. It’s really killing GC…

3: Summary

In fact, this is a textbook like problem, and there are also textbook like solutions. Moreover, I saw that this method has more than 600 lines of code, basically doing string splicing. Finally, let’s talk about the solutions.

  • Reconstruct the method, try to use StringBuilder instead of string to minimize the number of GC triggers.

It seems that the result of this analysis coincides with my friend’s deep suspicion…

——————— Egg update ———————

More high quality dry goods: see my GitHub:dotnetfly

图片名称