An analysis of a CPU spike in a .NET web site of an e-commerce trading platform

Time:2021-6-8

1: Background

1. Tell a story

I've written several real cases about memory bloat in a row and I'm getting a little numb, so for a change of taste I'll share a CPU spike case. Some time ago a friend contacted me on WeChat and said that one of his old projects kept receiving CPU > 90% alert messages, which was rather embarrassing.

Since he came to me, the answer is WinDbg analysis. What else is there?

2: WinDbg analysis

1. Surveying the scene

Since the claim is CPU > 90%, let me first verify that it is true.

0:359> !tp
CPU utilization: 100%
Worker Thread: Total: 514 Running: 514 Idle: 0 MaxLimit: 2400 MinLimit: 32
Work Request in Queue: 1
    Unknown Function: 00007ff874d623fc  Context: 0000003261e06e40
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 2 Free: 2 MaxFree: 48 CurrentLimit: 2 MaxLimit: 2400 MinLimit: 32

This reading is quite spectacular: the CPU is pegged at 100%, and all 514 threads in the thread pool are running. What are they all busy with? My first suspicion is that these threads are blocked on some lock.
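The same kind of numbers can also be read from inside a live process, which is handy for logging before a dump is ever taken. A minimal sketch, assuming the classic ThreadPool API; ThreadPoolSnapshot and GetWorkerUsage are hypothetical names, and busy = max - available only roughly corresponds to the Running column of !tp.

```csharp
using System.Threading;

// Hypothetical helper for illustration; not part of the analyzed project.
public static class ThreadPoolSnapshot
{
    // Reports how many worker threads are in use and the configured maximum.
    public static void GetWorkerUsage(out int busy, out int max)
    {
        int maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);
        max = maxWorker;
        busy = maxWorker - freeWorker; // threads that are not available are in use
    }
}
```

Logging these two values periodically would show a thread pool climbing toward a figure like 514 long before the alert fires.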

2. View the sync block table

After all, everyone likes to use lock for multithreaded synchronization, and locks can be inspected with the !syncblk command.

0:359> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
   53 000000324cafdf68          498         0 0000000000000000     none    0000002e1a2949b0 System.Object
-----------------------------
Total           1025
CCW             3
RCW             4
ComClassFactory 0
Free            620

Whoa, this reading looks very strange: what on earth is MonitorHeld=498? The textbooks say an owner adds 1 and each waiter adds 2, so what you normally see is always an odd number. What does an even number mean? After consulting the almighty Stack Overflow, it can be summarized as follows:

This situation is rarer than winning the lottery, and I firmly believe I don't have that kind of luck…

  • Lock convoy

Some time ago I shared a real case, "An analysis of a CPU spike in a .NET web site of a travel agency", where the CPU spike was caused by lock convoy. Sure enough, it's a small world, and here it is again… To make it easier to understand, I'd better paste that picture here once more.

After looking at this picture you should understand: when threads frequently fight over a lock within their time slices, it is easy to catch the moment when the thread holding the lock has just released it and none of the waiting threads has actually acquired it yet. This dump was captured in exactly that gap. In other words, the current 498 is made up entirely of waiter counts, that is, 498 / 2 = 249 waiter threads. To verify this, dump the stacks of all threads and search them for the keyword Monitor.Enter.
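The arithmetic above can be sketched as a tiny helper. This only illustrates the owner + 1, waiter + 2 rule quoted from the textbooks; the MonitorHeldDecoder class and its Decode method are hypothetical names, not part of SOS or the CLR.

```csharp
// Hypothetical helper for illustration only.
public static class MonitorHeldDecoder
{
    // Splits a MonitorHeld value into "has an owner" and "number of waiters",
    // assuming an owner contributes 1 and every waiter contributes 2.
    public static void Decode(int monitorHeld, out bool hasOwner, out int waiters)
    {
        hasOwner = (monitorHeld & 1) == 1; // the low bit is the owner
        waiters = monitorHeld / 2;         // integer division drops the owner bit
    }
}
```

Decoding 498 under this rule gives no owner and 249 waiters, which matches the reasoning about the dump being caught between a release and the next acquire.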

The figure shows that 220 threads are currently stuck in Monitor.Enter, so 29 threads seem to be unaccounted for. Either way, a large number of threads are blocked. From the stacks, they appear to be stuck setting the context in the xxx.Global.PreProcess method. To satisfy my curiosity, I'll export the offending code.

3. Check the problem code

The same old commands: !ip2md + !savemodule

0:359> !ip2md 00007ff81ae98854
MethodDesc:   00007ff819649fa0
Method Name:  xxx.Global.PreProcess(xxx.JsonRequest, System.Object)
Class:        00007ff81966bdf8
MethodTable:  00007ff81964a078
mdToken:      0000000006000051
Module:       00007ff819649768
IsJitted:     yes
CodeAddr:     00007ff81ae98430
Transparency: Critical
0:359> !savemodule 00007ff819649768 E:\dumps\PreProcess.dll
3 sections in file
section 0 - VA=2000, VASize=b6dc, FileAddr=200, FileSize=b800
section 1 - VA=e000, VASize=3d0, FileAddr=ba00, FileSize=400
section 2 - VA=10000, VASize=c, FileAddr=be00, FileSize=200

Then open the exported module with ILSpy; the screenshot is as follows:

Well, sure enough, the DataContext.SetContextItem() method contains a lock, a perfect hit for lock convoy.
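The real SetContextItem() body is only visible in the screenshot, so the following is a hypothetical stand-in rather than the project's code. It sketches the general shape that tends to produce a convoy: a single static lock object guarding a very short operation that every request thread calls. ContextStore, Gate, Items and Hammer are all invented names.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the DataContext class; the real code is not shown in the text.
public static class ContextStore
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<string, object> Items = new Dictionary<string, object>();

    // Every caller serializes on the same lock, however short the work inside is.
    public static void SetContextItem(string key, object value)
    {
        lock (Gate) { Items[key] = value; }
    }

    public static int Count
    {
        get { lock (Gate) { return Items.Count; } }
    }

    // Starts the given number of threads, each making many short locked calls.
    public static void Hammer(int threads, int callsPerThread)
    {
        var workers = new List<Thread>();
        for (int t = 0; t < threads; t++)
        {
            int id = t; // copy the loop variable for the closure
            var worker = new Thread(() =>
            {
                for (int i = 0; i < callsPerThread; i++)
                    SetContextItem("k" + id, i);
            });
            workers.Add(worker);
            worker.Start();
        }
        foreach (var w in workers) w.Join();
    }
}
```

With hundreds of threads instead of a handful, most of them spend their time slices queuing on Gate, which is the picture the !syncblk output paints.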

4. Is this really the end?

Originally I was ready to report back, but since I had already dumped more than 500 thread stacks and had time on my hands, I figured I might as well skim through them. As a result, I found that 134 threads were stuck in ReaderWriterLockSlim.TryEnterReadLockCore, as shown in the figure below:

As the name suggests, ReaderWriterLockSlim is an optimized version of the reader-writer lock. Why are more than 130 threads stuck here? I'm really curious, so let's export the offending code again.

internal class LocalMemoryCache : ICache
{
    private string CACHE_LOCKER_PREFIX = "xx_xx_";

    private static readonly NamedReaderWriterLocker _namedRwlocker = new NamedReaderWriterLocker();

    public T GetWithCache<T>(string cacheKey, Func<T> getter, int cacheTimeSecond, bool absoluteExpiration = true) where T : class
    {
        T val = null;
        ReaderWriterLockSlim @lock = _namedRwlocker.GetLock(cacheKey);
        try
        {
            @lock.EnterReadLock();
            val = (MemoryCache.Default.Get(cacheKey) as T);
            if (val != null)
            {
                return val;
            }
        }
        finally
        {
            @lock.ExitReadLock();
        }
        try
        {
            @lock.EnterWriteLock();
            val = (MemoryCache.Default.Get(cacheKey) as T);
            if (val != null)
            {
                return val;
            }
            val = getter();
            CacheItemPolicy cacheItemPolicy = new CacheItemPolicy();
            if (absoluteExpiration)
            {
                cacheItemPolicy.AbsoluteExpiration = new DateTimeOffset(DateTime.Now.AddSeconds(cacheTimeSecond));
            }
            else
            {
                cacheItemPolicy.SlidingExpiration = TimeSpan.FromSeconds(cacheTimeSecond);
            }
            if (val != null)
            {
                MemoryCache.Default.Set(cacheKey, val, cacheItemPolicy);
            }
            return val;
        }
        finally
        {
            @lock.ExitWriteLock();
        }
    }
}

The code above is trying to implement a GetOrAdd operation on MemoryCache. Apparently for the sake of safety, every cache key is given its own ReaderWriterLockSlim. This logic is a bit odd; after all, MemoryCache itself has thread-safe methods that can implement it, such as:

public class MemoryCache : ObjectCache, IEnumerable, IDisposable
{
    public override object AddOrGetExisting(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
    {
        if (regionName != null)
        {
            throw new NotSupportedException(R.RegionName_not_supported);
        }
        CacheItemPolicy cacheItemPolicy = new CacheItemPolicy();
        cacheItemPolicy.AbsoluteExpiration = absoluteExpiration;
        return AddOrGetExistingInternal(key, value, cacheItemPolicy);
    }
}
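One way this could look in practice is to wrap the getter in a Lazy<T> and let AddOrGetExisting decide which caller wins. This is a sketch under assumptions, not the author's fix: LazyMemoryCache is a hypothetical name, it needs a reference to System.Runtime.Caching, and it assumes the keys are only ever written through this helper. Unlike the original method, it also caches a getter that returns null.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical replacement sketch; not the author's or the project's code.
internal static class LazyMemoryCache
{
    public static T GetWithCache<T>(string cacheKey, Func<T> getter, int cacheTimeSecond, bool absoluteExpiration = true) where T : class
    {
        var policy = new CacheItemPolicy();
        if (absoluteExpiration)
            policy.AbsoluteExpiration = DateTimeOffset.Now.AddSeconds(cacheTimeSecond);
        else
            policy.SlidingExpiration = TimeSpan.FromSeconds(cacheTimeSecond);

        // Lazy<T> defers getter() until Value is read; its default mode runs the factory once.
        var fresh = new Lazy<T>(getter);
        // Returns the entry already cached under the key, or null if ours was just added.
        var cached = MemoryCache.Default.AddOrGetExisting(cacheKey, fresh, policy) as Lazy<T>;
        try
        {
            return (cached ?? fresh).Value;
        }
        catch
        {
            // Do not keep a failed Lazy in the cache; let the next caller retry.
            MemoryCache.Default.Remove(cacheKey);
            throw;
        }
    }
}
```

The design choice here is that no per-key lock object is ever allocated: concurrent callers for the same key all receive the same Lazy<T>, so the getter runs once and nothing accumulates on the heap besides the cache entries themselves.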

5. What's wrong with using ReaderWriterLockSlim?

Many readers will surely ask: indeed, what is the problem? First, let's see how many ReaderWriterLockSlim instances the _namedRwlocker collection holds. Verifying this is simple: just search the managed heap.

0:359> !dumpheap -type System.Threading.ReaderWriterLockSlim -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff8741631e8    70234      6742464 System.Threading.ReaderWriterLockSlim

We can see that the managed heap currently holds more than 70,000 ReaderWriterLockSlim instances. So what? Don't forget that ReaderWriterLockSlim carries the Slim suffix because it spins in user mode, and spinning eats a bit of CPU. Multiply that by a few hundred and how could the CPU not go up?
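The NamedReaderWriterLocker implementation is not shown, so the following is only a guess at its general shape, written to illustrate how a lock-per-key design can pile up tens of thousands of lock objects. NamedLockerSketch is a hypothetical name, and the use of ConcurrentDictionary is an assumption.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical: the real NamedReaderWriterLocker is not shown in the article.
public sealed class NamedLockerSketch
{
    private readonly ConcurrentDictionary<string, ReaderWriterLockSlim> _locks =
        new ConcurrentDictionary<string, ReaderWriterLockSlim>();

    // Hands out one lock per distinct name and never removes it again.
    public ReaderWriterLockSlim GetLock(string name)
    {
        return _locks.GetOrAdd(name, _ => new ReaderWriterLockSlim());
    }

    // Number of distinct names seen so far.
    public int Count
    {
        get { return _locks.Count; }
    }
}
```

If cache keys embed user or product identifiers, a structure like this grows with the number of distinct keys rather than with the number of live cache entries, which would be consistent with the 70,234 instances found on the heap.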

3: Summary

All in all, there are two reasons for the full CPU reflected in this dump.

  • Frequent contention and context switching caused by lock convoy deal the CPU a critical blow.
  • User-mode spinning of ReaderWriterLockSlim, amplified a few hundred times, deals the CPU another blow.

After knowing the reason, the solution is simple.

  • Batch operations to reduce the number of serialized lock acquisitions; don't take locks at high volume.
  • Remove the ReaderWriterLockSlim and use the thread-safe methods of MemoryCache.
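The first suggestion can be sketched as follows. This is a hypothetical illustration of batching, not the project's code: BatchedContextStore and SetContextItems are invented names, and it assumes the caller can collect its context items before writing them.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of batching writes under a single lock acquisition.
public static class BatchedContextStore
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<string, object> Items = new Dictionary<string, object>();

    // Takes the lock once for the whole batch instead of once per item.
    public static void SetContextItems(IEnumerable<KeyValuePair<string, object>> batch)
    {
        lock (Gate)
        {
            foreach (var pair in batch)
                Items[pair.Key] = pair.Value;
        }
    }

    public static int Count
    {
        get { lock (Gate) { return Items.Count; } }
    }
}
```

A request that sets ten context items then competes for the lock once rather than ten times, which reduces the number of hand-offs between threads.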

For more high-quality content, see my GitHub: dotnetfly
