Windows Server-Side Programming – Chapter 2: Device I/O and Inter-thread Communication – 8 – The Architecture of the I/O Completion Port

Time: 2020-02-23

The Architecture of the I/O Completion Port

When the service application initializes, it creates an I/O completion port with a function such as CreateNewCompletionPort. The application also needs to create a pool of threads to handle client requests. This raises the question: how many threads should the pool contain? The question is hard to answer well, so the details are deferred to the section “How Many Threads Are in the Thread Pool”. For now, a standard rule of thumb is to take the number of CPUs on the machine and multiply it by two. On a dual-CPU machine, for example, you would create a pool of four threads.

All threads in the thread pool should execute the same function. Typically, this thread function performs some initialization and then loops until the service process receives a stop instruction. Inside the loop, the thread puts itself to sleep, waiting for device I/O requests to complete on the completion port. It does this by calling GetQueuedCompletionStatus:

BOOL GetQueuedCompletionStatus(
   HANDLE       hCompPort,
   PDWORD       pdwNumBytes,
   PULONG_PTR   pCompKey,
   OVERLAPPED** ppOverlapped,
   DWORD        dwMilliseconds);

The first parameter, hCompPort, indicates the completion port the thread wants to monitor. Many service applications use a single I/O completion port to which all I/O request completions are posted. Essentially, GetQueuedCompletionStatus puts the calling thread to sleep until an entry appears in the specified completion port's I/O completion queue, or until the timeout given by the dwMilliseconds parameter expires.

The third data structure associated with an I/O completion port is the waiting thread queue. When a thread in the pool calls GetQueuedCompletionStatus, the calling thread's ID is placed in this waiting thread queue, which lets the I/O completion port kernel object know which threads are currently waiting to handle completed I/O requests. When an entry appears in the port's I/O completion queue, the port wakes one thread from the waiting thread queue. The awakened thread receives the pieces of information that make up a completed I/O entry: the number of bytes transferred, the completion key value, and the address of the OVERLAPPED structure. This information is returned through the pdwNumBytes, pCompKey, and ppOverlapped parameters.

Determining why GetQueuedCompletionStatus returned is a bit tricky; the following code demonstrates the correct way to do it:

DWORD dwNumBytes;
ULONG_PTR CompKey;
OVERLAPPED* pOverlapped;

// The hIOCP completion port is initialized elsewhere in the program
BOOL fOk = GetQueuedCompletionStatus(hIOCP, &dwNumBytes, &CompKey, &pOverlapped, 1000);
DWORD dwError = GetLastError();

if (fOk) {
   // Process a successfully completed I/O request
} else {
   if (pOverlapped != NULL) {
      // An I/O request completed, but with an error
      // dwError contains the reason for the failure
   } else {
      if (dwError == WAIT_TIMEOUT) {
         // Timed out waiting for a completed I/O entry
      } else {
         // The call to GetQueuedCompletionStatus itself failed
         // dwError indicates the reason for the failure
      }
   }
}

As expected, entries are removed from the I/O completion queue in first-in, first-out (FIFO) order. Perhaps unexpectedly, however, threads that call GetQueuedCompletionStatus are awakened in last-in, first-out (LIFO) order. The reason is performance. Suppose four threads are in the waiting thread queue. When a completed I/O entry appears, the last thread to have called GetQueuedCompletionStatus is woken to process it. When that thread finishes, it calls GetQueuedCompletionStatus again and re-enters the waiting thread queue. If another completion entry now appears, the same thread is woken to handle the new entry.

As long as I/O requests complete slowly enough that a single thread can handle them, the system keeps waking that same thread, and the other three threads remain asleep. Thanks to this LIFO policy, the memory resources (such as stack space) of the threads that are not being scheduled can be swapped out to disk and flushed from the processor's cache. This means that having many threads waiting on a completion port is not a bad thing: if several threads are waiting but only a few I/O requests complete, the excess threads' resources are simply paged out by the system.
