Analyzing an API hang in a .NET-based cloud procurement platform

Time:2021-10-22

1: Background

1. Tell a story

I haven't written a blog post for about two months. Friends who follow me will know that I have recently been putting my energy into my Knowledge Planet community. Over these two months, friends have kept asking for help with dump analysis, and some were so polite that they sent a big red envelope. Ha ha. I now have more than 10 dumps of different problem types saved up, and I will share my analysis of them one by one.

This dump was given to me by a friend about a month ago. Because so many friends on WeChat were asking for help, I can't find the relevant screenshot right now, so I have to break my usual habit of showing it.

My friend said that the API stops responding and appears to hang. From past experience, there are probably only three possible causes.

  • Massive lock waiting

  • Not enough threads

  • Deadlock

With these hypotheses in mind, let's let WinDbg do the talking.

2: WinDbg analysis

1. Are many threads waiting on locks?

To check for lock waits, the usual first step is to look at the sync block table.

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
-----------------------------
Total           1673
CCW             3
RCW             4
ComClassFactory 0
Free            397

Nothing there: no thread is holding a lock. In that case, just look at all the thread stacks (for example with ~*e !clrstack).

And there it was. I was shocked: 339 threads were stuck in System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object). On second thought, though, do 339 threads stuck here really make the program hang? Not necessarily. I have seen programs with 1000+ threads that did not hang; they just drove the CPU through the roof. So the next step is to check whether the hang comes from a shortage of threads, starting with the thread pool's work queue.

2. Explore thread pool queue

You can view it with the !tp command.

0:000> !tp
CPU utilization: 10%
Worker Thread: Total: 328 Running: 328 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 74
    Unknown Function: 00007ffe91cc17d0  Context: 000001938b5d8d98
    Unknown Function: 00007ffe91cc17d0  Context: 000001938b540238
    Unknown Function: 00007ffe91cc17d0  Context: 000001938b5eec08
    ...
    Unknown Function: 00007ffe91cc17d0  Context: 0000019390552948
    Unknown Function: 00007ffe91cc17d0  Context: 0000019390562398
    Unknown Function: 00007ffe91cc17d0  Context: 0000019390555b30
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread:Total: 5 Free: 4 MaxFree: 8 CurrentLimit: 4 MaxLimit: 1000 MinLimit: 4

The output shows that all 328 worker threads in the thread pool are busy (Running: 328, Idle: 0) and that 74 guests are waiting in the work queue. Taken together, these two facts make it clear that the hang is caused by more guests arriving than the thread pool can receive.
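The same worker-thread picture can also be sampled from inside a running process. The following is only a minimal sketch, assuming you can add a diagnostic call to the application; the ThreadPoolProbe class and its Snapshot method are hypothetical names, not part of the platform's code. It uses only ThreadPool.GetMaxThreads, GetMinThreads, and GetAvailableThreads.

```csharp
using System;
using System.Threading;

// Hypothetical helper: reports roughly what the worker and completion-port
// lines of !tp show, but from inside the live process.
public static class ThreadPoolProbe
{
    public static string Snapshot()
    {
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
        ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);

        // Available = max minus currently active, so active = max - available.
        int busyWorkers = maxWorkers - freeWorkers;
        int busyIo = maxIo - freeIo;

        return $"workers busy={busyWorkers} min={minWorkers} max={maxWorkers}; " +
               $"io busy={busyIo} min={minIo} max={maxIo}";
    }

    public static void Main()
    {
        Console.WriteLine(Snapshot());
    }
}
```

Logging this value periodically would show the busy worker count climbing far past the minimum well before the API stops responding.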

3. Is the reception capacity really insufficient?

The question in this heading is worth asking: is the capacity really insufficient? We can approach it from two angles:

  • Is the code bad?

  • Does QPS really exceed the reception capacity?

To find out, we have to start with the 339 blocked threads and carefully study the call stack of each one. They are mostly stuck in these three places.

<1>. GetModel

// Decompiled code; the generic parameters <T, K> are restored here.
public static T GetModel<T, K>(string url, K content)
{
	T result = default(T);
	HttpClientHandler httpClientHandler = new HttpClientHandler();
	httpClientHandler.AutomaticDecompression = DecompressionMethods.GZip;
	HttpClientHandler handler = httpClientHandler;
	// A new HttpClient is created and disposed on every call.
	using (HttpClient httpClient = new HttpClient(handler))
	{
		string content2 = JsonConvert.SerializeObject((object)content);
		HttpContent httpContent = new StringContent(content2);
		httpContent.Headers.ContentType = new MediaTypeHeaderValue("application/json");
		string mD5ByCrypt = Md5.GetMD5ByCrypt(ConfigurationManager.AppSettings["SsoToken"] + DateTime.Now.ToString("yyyyMMdd"));
		httpClient.DefaultRequestHeaders.Add("token", mD5ByCrypt);
		httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
		// .Result blocks the calling thread until the response arrives.
		HttpResponseMessage result2 = httpClient.PostAsync(url, httpContent).Result;
		if (result2.IsSuccessStatusCode)
		{
			// Blocks again while the response body is read.
			string result3 = result2.Content.ReadAsStringAsync().Result;
			return JsonConvert.DeserializeObject<T>(result3);
		}
		return result;
	}
}

<2>. Get

// Decompiled code; the generic parameter <T> is restored here.
public static T Get<T>(string url, string serviceModuleName)
{
	try
	{
		T val3 = default(T);
		HttpClient httpClient = TryGetClient(serviceModuleName, true);
		// .Result blocks the calling thread until the response arrives.
		using (HttpResponseMessage httpResponseMessage = httpClient.GetAsync(GetRelativeRquestUrl(url, serviceModuleName, true)).Result)
		{
			if (httpResponseMessage.IsSuccessStatusCode)
			{
				// Blocks again while the response body is read.
				string result = httpResponseMessage.Content.ReadAsStringAsync().Result;
				if (!string.IsNullOrEmpty(result))
				{
					val3 = JsonConvert.DeserializeObject<T>(result);
				}
			}
		}
		return val3;
	}
	catch (Exception)
	{
		// Rethrown unchanged, as in the decompiled source.
		throw;
	}
}

<3>. GetStreamByApi

// Decompiled code; the generic parameter <T> is restored here.
public static Stream GetStreamByApi<T>(string url, T content)
{
	Stream result = null;
	HttpClientHandler httpClientHandler = new HttpClientHandler();
	httpClientHandler.AutomaticDecompression = DecompressionMethods.GZip;
	HttpClientHandler handler = httpClientHandler;
	// A new HttpClient is created and disposed on every call.
	using (HttpClient httpClient = new HttpClient(handler))
	{
		httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/octet-stream"));
		string content2 = JsonConvert.SerializeObject((object)content);
		HttpContent httpContent = new StringContent(content2);
		httpContent.Headers.ContentType = new MediaTypeHeaderValue("application/json");
		// .Result blocks the calling thread until the response arrives.
		HttpResponseMessage result2 = httpClient.PostAsync(url, httpContent).Result;
		if (result2.IsSuccessStatusCode)
		{
			// Blocks again while the response stream is obtained.
			result = result2.Content.ReadAsStreamAsync().Result;
		}
		httpContent.Dispose();
		return result;
	}
}

4. Find the truth

Can you see the problem in the three methods listed above? Yes: they call asynchronous methods synchronously by blocking on .Result. Written this way, the code is very inefficient, mainly in two respects.

  • Creating and destroying threads is itself a relatively expensive, inefficient operation.

  • Frequent thread scheduling puts heavy pressure on the CPU.

What's more, this style causes no trouble while the request volume is small. As soon as the volume grows a little, you immediately run into the kind of hang captured in this dump.
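The effect can be reproduced in a small console program. This is only a sketch under stated assumptions: FakeIoAsync is a hypothetical stand-in for the HTTP call (a 200 ms delay that occupies no thread while it waits), and the request count of 200 is arbitrary. The blocking variant ties up one pool thread per request, just as .Result does in the three methods above, while the awaiting variant returns its thread to the pool during the wait. Timings depend on the machine and runtime, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Hypothetical stand-in for an HTTP call: no thread is held during the delay.
    private static async Task<int> FakeIoAsync()
    {
        await Task.Delay(200).ConfigureAwait(false);
        return 1;
    }

    // Each work item blocks a pool thread on .Result until the "I/O" completes.
    public static TimeSpan RunBlocking(int requests)
    {
        var sw = Stopwatch.StartNew();
        Task<int>[] tasks = Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() => FakeIoAsync().Result))
            .ToArray();
        Task.WaitAll(tasks);
        return sw.Elapsed;
    }

    // Each work item starts the "I/O" and immediately frees its pool thread.
    public static TimeSpan RunAwaiting(int requests)
    {
        var sw = Stopwatch.StartNew();
        Task<int>[] tasks = Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() => FakeIoAsync()))
            .ToArray();
        Task.WaitAll(tasks);
        return sw.Elapsed;
    }

    public static void Main()
    {
        Console.WriteLine("blocking: " + RunBlocking(200));
        Console.WriteLine("awaiting: " + RunAwaiting(200));
    }
}
```

With the pool starting at its minimum thread count, the blocking run has to wait for the pool to inject extra threads before all requests can finish, which is the same pressure that pushed the dump to 328 busy workers.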

3: Summary

Overall, this hang was caused by developers calling asynchronous methods synchronously. The fix is simple: convert the call chain to pure async (async, await), which frees the calling thread and makes full use of the capabilities of the device driver.

This dump also reminds me of the book CLR via C# (pp. 646-647), which gives an example of using async and await to convert synchronous requests.
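As an illustration, here is roughly what an async version of GetModel could look like. It is a sketch, not the platform's actual fix: the ApiClient class name, the GetModelAsync name, and the single shared HttpClient are my assumptions, the token header from the original is left out for brevity, and Json.NET (Newtonsoft.Json) is assumed to be referenced as in the original code.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class ApiClient
{
    // Assumption: one shared client for the process instead of one per call.
    private static readonly HttpClient Client = new HttpClient(
        new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip });

    public static async Task<T> GetModelAsync<T, K>(string url, K content)
    {
        string json = JsonConvert.SerializeObject(content);
        using (var body = new StringContent(json, Encoding.UTF8, "application/json"))
        // await releases the calling thread while the request is in flight.
        using (HttpResponseMessage response =
            await Client.PostAsync(url, body).ConfigureAwait(false))
        {
            if (!response.IsSuccessStatusCode)
            {
                return default(T);
            }

            // Reading the body is awaited too, so no thread blocks here either.
            string text = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            return JsonConvert.DeserializeObject<T>(text);
        }
    }
}
```

The change only pays off if every caller awaits as well, all the way up to the API action; a single .Result or .Wait() further up the chain brings the blocked thread back.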

I think this dump is the best evidence of this example!

For more in-depth content, see my GitHub: dotnetfly
