.NET 6 thread pool implementation overview

Time:2022-5-22

Preface

In the .NET 6 runtime, the default thread pool implementation has moved from C++ to C#, which makes it easier to study the design of the thread pool.
https://github.com/dotnet/runtime/tree/release/6.0/src/libraries/System.Threading.ThreadPool

The new thread pool implementation lives in PortableThreadPool. The interfaces exposed by the original ThreadPool call directly into the PortableThreadPool implementation.

Setting the environment variable ThreadPool_UsePortableThreadPool to 0 switches back to the old thread pool implementation.
https://github.com/dotnet/runtime/pull/43841/commits/b0d47b84a6845a70f011d1b0d3ce5adde9a4d7b7

This article uses the .NET 6 runtime source code as study material to introduce the design of the thread pool. As far as I can tell, the overall design does not differ much from the original C++ implementation.

Notes:

  • This article does not go into the detailed code implementation; it mainly introduces the overall design. The code shown is not the original source code but a simplified version that is easier to follow.
  • The IOCP thread pool that the completionPortThreads parameter of ThreadPool.SetMaxThreads(int workerThreads, int completionPortThreads) refers to comes from .NET Framework, where it manages the callback thread pool for IOCP, a mechanism specific to the Windows platform. So far I have not found any place that uses it, so the completionPortThreads parameter appears to have no effect; the underlying IO library maintains its own pool of IO waiting threads. This article covers only the worker thread pool.
  • My understanding is incomplete and may not be entirely correct. If you disagree with anything, please leave a comment for discussion.
  • To illustrate certain points, part of the code is run on .NET 6.

Task scheduling

The pending tasks of the thread pool are stored in a queue system. This system consists of a global queue and a local queue bound to each worker thread. Each thread in the thread pool runs a while(true) loop that retrieves tasks from this queue system and executes them.

Among the overloads of ThreadPool.QueueUserWorkItem, ThreadPool.QueueUserWorkItem<TState>(Action<TState> callBack, TState state, bool preferLocal) has a preferLocal parameter.

When an overload of ThreadPool.QueueUserWorkItem without the preferLocal parameter is called, the task is put into the global queue.

When preferLocal is true and the thread calling ThreadPool.QueueUserWorkItem happens to be a thread pool thread, the task enters that thread's local queue. Otherwise, it is put into the global queue and waits to be picked up by a worker thread later.

When the call is made from a thread outside the thread pool, the task is put into the global queue regardless of the value passed for preferLocal.
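The two enqueue paths can be sketched as follows. This is a minimal illustration rather than code from the runtime: the class name PreferLocalDemo and its Run method are made up for this example, and only ThreadPool.QueueUserWorkItem<TState> and its preferLocal parameter come from the text. The public API does not expose which queue a task lands in, so the comments restate the behaviour described above instead of verifying it.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not part of the runtime.
public static class PreferLocalDemo
{
    // Queues one task from the calling thread and one from inside the pool,
    // then returns how many callbacks actually ran.
    public static int Run()
    {
        int executed = 0;
        var done = new CountdownEvent(2);

        // If Run is called from a non-pool thread (such as the main thread),
        // the text says this item goes to the global queue whatever preferLocal is.
        ThreadPool.QueueUserWorkItem<string>(outer =>
        {
            Interlocked.Increment(ref executed);

            // This call is made on a thread pool thread with preferLocal: true,
            // so per the text the item goes to this worker's local queue.
            ThreadPool.QueueUserWorkItem<string>(inner =>
            {
                Interlocked.Increment(ref executed);
                done.Signal();
            }, "inner", preferLocal: true);

            done.Signal();
        }, "outer", preferLocal: true);

        done.Wait();
        return executed;
    }
}
```

Calling PreferLocalDemo.Run() from Main returns 2 once both callbacks have run, whichever queue each one travelled through.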

Basic dispatching unit

The element type of both the local queue and the global queue is object. The actual tasks fall into two types; after a task is taken from the queue system, its type is checked and the corresponding method is executed.

An instance of a class that implements IThreadPoolWorkItem:


/// <summary>Represents a work item that can be executed by the ThreadPool.</summary>
public interface IThreadPoolWorkItem
{
    void Execute();
}

Calling the Execute method runs the task.

IThreadPoolWorkItem has many concrete implementations. For example, the callback delegate passed to ThreadPool.QueueUserWorkItem(WaitCallback callBack) is wrapped in a QueueUserWorkItemCallback instance, and QueueUserWorkItemCallback is a class that implements IThreadPoolWorkItem.
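User code can also supply its own IThreadPoolWorkItem. The sketch below assumes the public ThreadPool.UnsafeQueueUserWorkItem(IThreadPoolWorkItem, bool) overload; the types SignalWorkItem and WorkItemDemo are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Threading;

// Hypothetical work item: records that it ran and signals the waiter.
public sealed class SignalWorkItem : IThreadPoolWorkItem
{
    private readonly ManualResetEventSlim _done;

    public SignalWorkItem(ManualResetEventSlim done) => _done = done;

    public bool Executed { get; private set; }

    // Invoked by a worker thread after it takes this object from the queue system.
    public void Execute()
    {
        Executed = true;
        _done.Set();
    }
}

// Hypothetical demo class; not part of the runtime.
public static class WorkItemDemo
{
    public static bool Run()
    {
        var done = new ManualResetEventSlim();
        var item = new SignalWorkItem(done);

        // Queue the object itself; no delegate wrapper is allocated.
        // preferLocal: false asks for the global queue.
        ThreadPool.UnsafeQueueUserWorkItem(item, preferLocal: false);

        done.Wait();
        return item.Executed;
    }
}
```

WorkItemDemo.Run() returns true once a worker thread has called Execute on the item.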

Task


class Task
{
    internal void InnerInvoke();
}

Calling InnerInvoke runs the delegate contained in the Task.

Global queue

The global queue is owned by ThreadPoolWorkQueue, which is also the entry point of the whole queue system and is referenced directly by ThreadPool.

public static class ThreadPool
{
    internal static readonly ThreadPoolWorkQueue s_workQueue = new ThreadPoolWorkQueue();

    public static bool QueueUserWorkItem(WaitCallback callBack, object state)
    {
        object tpcallBack = new QueueUserWorkItemCallback(callBack!, state);

        s_workQueue.Enqueue(tpcallBack, forceGlobal: true);

        return true;
    }
}

internal sealed class ThreadPoolWorkQueue
{
    //Global queue
    internal readonly ConcurrentQueue<object> workItems = new ConcurrentQueue<object>();

    // When forceGlobal is true, push to the global queue; otherwise push to the caller's local queue if it has one
    public void Enqueue(object callback, bool forceGlobal);
}

Local queue

Each thread in the thread pool is bound to a ThreadPoolWorkQueueThreadLocals instance, which stores the local queue in its workStealingQueue field.

internal sealed class ThreadPoolWorkQueueThreadLocals
{
    //Bind to thread in thread pool
    [ThreadStatic]
    public static ThreadPoolWorkQueueThreadLocals threadLocals;

    //Hold a reference to the global queue so that tasks can be transferred to the global queue when needed
    public readonly ThreadPoolWorkQueue workQueue;
    //Direct maintainer of local queue
    public readonly ThreadPoolWorkQueue.WorkStealingQueue workStealingQueue;
    public readonly Thread currentThread;

    public ThreadPoolWorkQueueThreadLocals(ThreadPoolWorkQueue tpq)
    {
        workQueue = tpq;
        workStealingQueue = new ThreadPoolWorkQueue.WorkStealingQueue();
        // WorkStealingQueueList centrally manages every WorkStealingQueue
        ThreadPoolWorkQueue.WorkStealingQueueList.Add(workStealingQueue);
        currentThread = Thread.CurrentThread;
    }

    // Transfers the tasks in the local queue to the global queue.
    // When ThreadPool decides, through the Hill Climbing algorithm introduced later, that the current thread is redundant,
    // this method is called to transfer its tasks.
    public void TransferLocalWork()
    {
        while (workStealingQueue.LocalPop() is object cb)
        {
            workQueue.Enqueue(cb, forceGlobal: true);
        }
    }

    ~ThreadPoolWorkQueueThreadLocals()
    {
        if (null != workStealingQueue)
        {
            // This is not the main use of TransferLocalWork; it is only fallback logic to ensure that tasks are not lost
            TransferLocalWork();
            ThreadPoolWorkQueue.WorkStealingQueueList.Remove(workStealingQueue);
        }
    }
}

Theft mechanism

Here is a question: why is the local queue named WorkStealingQueue?

The WorkStealingQueue of every worker thread is collected in WorkStealingQueueList, where it is visible to all other threads in the thread pool.

In its while(true) loop, a worker thread first takes tasks from its own WorkStealingQueue. If the local queue is empty, it fetches tasks from the global queue. For example, thread1 in the figure gets a task from the global queue.

Meanwhile, thread3 also has nothing to do, but the task in the global queue has already been taken by thread1. In that case, thread3 steals a task from the local queue of thread2.
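The lookup order described above can be sketched with a toy model. Everything here is an assumption made for illustration: ToyQueueSystem and its members are invented names, and the queues are plain ConcurrentQueue instances, whereas the runtime's WorkStealingQueue is a specialised structure with LocalPush, LocalPop and stealing operations. The sketch only shows the order "own local queue, then global queue, then other workers' local queues".

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Toy model with invented names; not the runtime's data structures.
public sealed class ToyQueueSystem
{
    private readonly ConcurrentQueue<string> _global = new ConcurrentQueue<string>();
    private readonly List<ConcurrentQueue<string>> _locals = new List<ConcurrentQueue<string>>();

    // Registers a worker and returns its index (0, 1, 2, ...).
    public int AddWorker()
    {
        _locals.Add(new ConcurrentQueue<string>());
        return _locals.Count - 1;
    }

    public void EnqueueGlobal(string task) => _global.Enqueue(task);

    public void EnqueueLocal(int worker, string task) => _locals[worker].Enqueue(task);

    // Returns a description of where the next task for this worker came from.
    public string Dequeue(int worker)
    {
        // 1. The worker's own local queue has priority.
        if (_locals[worker].TryDequeue(out var task)) return task + " (local)";

        // 2. Then the global queue.
        if (_global.TryDequeue(out task)) return task + " (global)";

        // 3. Finally, try to steal from the other workers' local queues.
        for (int i = 0; i < _locals.Count; i++)
        {
            if (i != worker && _locals[i].TryDequeue(out task))
                return task + " (stolen from worker " + i + ")";
        }

        return "nothing to do";
    }
}
```

With three workers (indexes 0, 1, 2 standing in for thread1, thread2, thread3), one task "A" in the global queue and one task "B" in worker 1's local queue, Dequeue(0) returns "A (global)" and a following Dequeue(2) returns "B (stolen from worker 1)", which mirrors the scenario in the text.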

Lifecycle management of worker threads

Next, we widen the view and shift the focus from the daily work of worker threads to their lifecycle management.

To make the thread management mechanism easier to explain, the following code is used for demonstration.
The code is adapted from https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/.

Thread injection experiment

Task.Run dispatches the task to the thread pool for execution. In the example code below, it is equivalent to ThreadPool.QueueUserWorkItem(WaitCallback callBack): the task is placed in the global queue of the queue system. (Incidentally, if Task.Run is executed on a thread pool thread, the task is scheduled to that thread's local queue.)

.NET 5 Experiment 1: default thread pool configuration


static void Main(string[] args)
{
    var sw = Stopwatch.StartNew();
    var tcs = new TaskCompletionSource();
    var tasks = new List<Task>();
    for (int i = 1; i <= Environment.ProcessorCount * 2; i++)
    {
        int id = i;
        Console.WriteLine($"Loop Id: {id:00}    | {sw.Elapsed.TotalSeconds:0.000} | Busy Threads: {GetBusyThreads()}");
        tasks.Add(Task.Run(() =>
        {
            Console.WriteLine($"Task Id: {id:00}    | {sw.Elapsed.TotalSeconds:0.000} | Busy Threads: {GetBusyThreads()}");
            tcs.Task.Wait();
        }));
    }

    tasks.Add(Task.Run(() =>
    {
        Console.WriteLine($"Task SetResult | {sw.Elapsed.TotalSeconds:0.000} | Busy Threads: {GetBusyThreads()}");
        tcs.SetResult();
    }));
    Task.WaitAll(tasks.ToArray());
    Console.WriteLine($"Done:          | {sw.Elapsed.TotalSeconds:0.000}");
}

static int GetBusyThreads()
{
    ThreadPool.GetAvailableThreads(out var available, out _);
    ThreadPool.GetMaxThreads(out var max, out _);
    return max - available;
}

First, run the code in a .NET 5 environment. Judging by the output (24 loop iterations for Environment.ProcessorCount * 2), the machine has 12 logical CPU cores.


Loop Id: 01    | 0.000 | Busy Threads: 0
Loop Id: 02    | 0.112 | Busy Threads: 1
Loop Id: 03    | 0.112 | Busy Threads: 2
Loop Id: 04    | 0.113 | Busy Threads: 4
Loop Id: 05    | 0.113 | Busy Threads: 7
Loop Id: 06    | 0.113 | Busy Threads: 10
Loop Id: 07    | 0.113 | Busy Threads: 10
Task Id: 01    | 0.113 | Busy Threads: 11
Task Id: 02    | 0.113 | Busy Threads: 12
Task Id: 03    | 0.113 | Busy Threads: 12
Task Id: 07    | 0.113 | Busy Threads: 12
Task Id: 04    | 0.113 | Busy Threads: 12
Task Id: 05    | 0.113 | Busy Threads: 12
Loop Id: 08    | 0.113 | Busy Threads: 10
Task Id: 08    | 0.113 | Busy Threads: 12
Loop Id: 09    | 0.113 | Busy Threads: 11
Loop Id: 10    | 0.113 | Busy Threads: 12
Loop Id: 11    | 0.114 | Busy Threads: 12
Loop Id: 12    | 0.114 | Busy Threads: 12
Loop Id: 13    | 0.114 | Busy Threads: 12
Loop Id: 14    | 0.114 | Busy Threads: 12
Loop Id: 15    | 0.114 | Busy Threads: 12
Loop Id: 16    | 0.114 | Busy Threads: 12
Loop Id: 17    | 0.114 | Busy Threads: 12
Loop Id: 18    | 0.114 | Busy Threads: 12
Loop Id: 19    | 0.114 | Busy Threads: 12
Loop Id: 20    | 0.114 | Busy Threads: 12
Loop Id: 21    | 0.114 | Busy Threads: 12
Loop Id: 22    | 0.114 | Busy Threads: 12
Loop Id: 23    | 0.114 | Busy Threads: 12
Loop Id: 24    | 0.114 | Busy Threads: 12
Task Id: 09    | 0.114 | Busy Threads: 12
Task Id: 06    | 0.114 | Busy Threads: 12
Task Id: 10    | 0.114 | Busy Threads: 12
Task Id: 11    | 0.114 | Busy Threads: 12
Task Id: 12    | 0.114 | Busy Threads: 12
Task Id: 13    | 1.091 | Busy Threads: 13
Task Id: 14    | 1.594 | Busy Threads: 14
Task Id: 15    | 2.099 | Busy Threads: 15
Task Id: 16    | 3.102 | Busy Threads: 16
Task Id: 17    | 3.603 | Busy Threads: 17
Task Id: 18    | 4.107 | Busy Threads: 18
Task Id: 19    | 4.611 | Busy Threads: 19
Task Id: 20    | 5.113 | Busy Threads: 20
Task Id: 21    | 5.617 | Busy Threads: 21
Task Id: 22    | 6.122 | Busy Threads: 22
Task Id: 23    | 7.128 | Busy Threads: 23
Task Id: 24    | 7.632 | Busy Threads: 24
Task SetResult | 8.135 | Busy Threads: 25
Done:          | 8.136

Task.Run schedules the tasks to the thread pool for execution, and the first 24 tasks stay blocked until the 25th task runs. Each log line prints the number of threads in the thread pool that are currently executing tasks (that is, the threads that have been created).

The following can be observed:

  • In the first few iterations, the number of threads grows with the number of tasks. After that, the number of threads stays at 12 until the loop ends.
  • Before the number of threads reaches 12, threads are added with practically no delay. The interval between the 12th and 13th threads is a little under 1 s, and after that a thread is added roughly every 500 ms.

.NET 5 Experiment 2: adjust ThreadPool settings

Add the following lines at the beginning of the code above and run it again in the .NET 5 environment.


ThreadPool.GetMinThreads(out int defaultMinThreads, out int completionPortThreads);
Console.WriteLine($"DefaultMinThreads: {defaultMinThreads}");
ThreadPool.SetMinThreads(14, completionPortThreads);

The results are as follows:


DefaultMinThreads: 12
Loop Id: 01    | 0.000 | Busy Threads: 0
Loop Id: 02    | 0.003 | Busy Threads: 1
Loop Id: 03    | 0.003 | Busy Threads: 2
Loop Id: 04    | 0.003 | Busy Threads: 5
Loop Id: 05    | 0.004 | Busy Threads: 8
Task Id: 01    | 0.004 | Busy Threads: 10
Task Id: 03    | 0.004 | Busy Threads: 10
Loop Id: 06    | 0.004 | Busy Threads: 10
Task Id: 02    | 0.004 | Busy Threads: 10
Task Id: 04    | 0.004 | Busy Threads: 10
Task Id: 05    | 0.004 | Busy Threads: 12
Loop Id: 07    | 0.004 | Busy Threads: 9
Loop Id: 08    | 0.004 | Busy Threads: 10
Loop Id: 09    | 0.004 | Busy Threads: 11
Loop Id: 10    | 0.004 | Busy Threads: 12
Task Id: 08    | 0.004 | Busy Threads: 14
Task Id: 06    | 0.004 | Busy Threads: 14
Task Id: 09    | 0.004 | Busy Threads: 14
Task Id: 10    | 0.004 | Busy Threads: 14
Loop Id: 11    | 0.004 | Busy Threads: 14
Loop Id: 12    | 0.004 | Busy Threads: 14
Loop Id: 13    | 0.004 | Busy Threads: 14
Loop Id: 14    | 0.004 | Busy Threads: 14
Loop Id: 15    | 0.004 | Busy Threads: 14
Loop Id: 16    | 0.004 | Busy Threads: 14
Loop Id: 17    | 0.004 | Busy Threads: 14
Loop Id: 18    | 0.004 | Busy Threads: 14
Loop Id: 19    | 0.004 | Busy Threads: 14
Loop Id: 20    | 0.004 | Busy Threads: 14
Loop Id: 21    | 0.004 | Busy Threads: 14
Loop Id: 22    | 0.004 | Busy Threads: 14
Task Id: 11    | 0.004 | Busy Threads: 14
Loop Id: 23    | 0.004 | Busy Threads: 14
Loop Id: 24    | 0.005 | Busy Threads: 14
Task Id: 07    | 0.005 | Busy Threads: 14
Task Id: 12    | 0.005 | Busy Threads: 14
Task Id: 13    | 0.005 | Busy Threads: 14
Task Id: 14    | 0.005 | Busy Threads: 14
Task Id: 15    | 0.982 | Busy Threads: 15
Task Id: 16    | 1.486 | Busy Threads: 16
Task Id: 17    | 1.991 | Busy Threads: 17
Task Id: 18    | 2.997 | Busy Threads: 18
Task Id: 19    | 3.501 | Busy Threads: 19
Task Id: 20    | 4.004 | Busy Threads: 20
Task Id: 21    | 4.509 | Busy Threads: 21
Task Id: 22    | 5.014 | Busy Threads: 22
Task Id: 23    | 5.517 | Busy Threads: 23
Task Id: 24    | 6.021 | Busy Threads: 24
Task SetResult | 6.522 | Busy Threads: 25
Done:          | 6.523

After the minimum number of threads in the thread pool is adjusted, the turning point of the thread injection speed moves from the 12th thread (the default min threads) to the 14th thread (the modified min threads).

The overall time also drops from about 8 s to about 6 s.

.NET 5 Experiment 3: change tcs.Task.Wait() to Thread.Sleep


static void Main(string[] args)
{
    var sw = Stopwatch.StartNew();
    var tasks = new List<Task>();
    for (int i = 1; i <= Environment.ProcessorCount * 2; i++)
    {
        int id = i;
        Console.WriteLine(
            $"Loop Id: {id:00}    | {sw.Elapsed.TotalSeconds:0.000} | Busy Threads: {GetBusyThreads()}");
        tasks.Add(Task.Run(() =>
        {
            Console.WriteLine(
                $"Task Id: {id:00}    | {sw.Elapsed.TotalSeconds:0.000} | Busy Threads: {GetBusyThreads()}");
            Thread.Sleep(Environment.ProcessorCount * 1000);
        }));
    }

    Task.WhenAll(tasks.ToArray()).ContinueWith(_ =>
    {
        Console.WriteLine($"Done:          | {sw.Elapsed.TotalSeconds:0.000}");
    });
    Console.ReadLine();
}

Loop Id: 01    | 0.000 | Busy Threads: 0
Loop Id: 02    | 0.027 | Busy Threads: 1
Loop Id: 03    | 0.027 | Busy Threads: 2
Loop Id: 04    | 0.027 | Busy Threads: 3
Loop Id: 05    | 0.028 | Busy Threads: 4
Loop Id: 06    | 0.028 | Busy Threads: 10
Loop Id: 07    | 0.028 | Busy Threads: 9
Loop Id: 08    | 0.028 | Busy Threads: 9
Loop Id: 09    | 0.028 | Busy Threads: 10
Loop Id: 10    | 0.028 | Busy Threads: 12
Loop Id: 11    | 0.028 | Busy Threads: 12
Loop Id: 12    | 0.028 | Busy Threads: 12
Loop Id: 13    | 0.028 | Busy Threads: 12
Loop Id: 14    | 0.028 | Busy Threads: 12
Loop Id: 15    | 0.028 | Busy Threads: 12
Loop Id: 16    | 0.028 | Busy Threads: 12
Loop Id: 17    | 0.028 | Busy Threads: 12
Loop Id: 18    | 0.028 | Busy Threads: 12
Loop Id: 19    | 0.028 | Busy Threads: 12
Loop Id: 20    | 0.028 | Busy Threads: 12
Loop Id: 21    | 0.028 | Busy Threads: 12
Loop Id: 22    | 0.028 | Busy Threads: 12
Loop Id: 23    | 0.028 | Busy Threads: 12
Loop Id: 24    | 0.028 | Busy Threads: 12
Task Id: 01    | 0.029 | Busy Threads: 12
Task Id: 05    | 0.029 | Busy Threads: 12
Task Id: 03    | 0.029 | Busy Threads: 12
Task Id: 08    | 0.029 | Busy Threads: 12
Task Id: 09    | 0.029 | Busy Threads: 12
Task Id: 10    | 0.029 | Busy Threads: 12
Task Id: 06    | 0.029 | Busy Threads: 12
Task Id: 11    | 0.029 | Busy Threads: 12
Task Id: 12    | 0.029 | Busy Threads: 12
Task Id: 04    | 0.029 | Busy Threads: 12
Task Id: 02    | 0.029 | Busy Threads: 12
Task Id: 07    | 0.029 | Busy Threads: 12
Task Id: 13    | 1.018 | Busy Threads: 13
Task Id: 14    | 1.522 | Busy Threads: 14
Task Id: 15    | 2.025 | Busy Threads: 15
Task Id: 16    | 2.530 | Busy Threads: 16
Task Id: 17    | 3.530 | Busy Threads: 17
Task Id: 18    | 4.035 | Busy Threads: 18
Task Id: 19    | 4.537 | Busy Threads: 19
Task Id: 20    | 5.040 | Busy Threads: 20
Task Id: 21    | 5.545 | Busy Threads: 21
Task Id: 22    | 6.048 | Busy Threads: 22
Task Id: 23    | 7.049 | Busy Threads: 23
Task Id: 24    | 8.056 | Busy Threads: 24
Done:          | 20.060

After min threads is reached (12 by default), thread injection slows down markedly; the shortest interval between new threads is about 500 ms.

.NET 6 Experiment 1: default ThreadPool settings

Run the code of .NET 5 Experiment 1 once on .NET 6.


Loop Id: 01    | 0.001 | Busy Threads: 0
Loop Id: 02    | 0.018 | Busy Threads: 1
Loop Id: 03    | 0.018 | Busy Threads: 3
Loop Id: 04    | 0.018 | Busy Threads: 6
Loop Id: 05    | 0.018 | Busy Threads: 4
Loop Id: 06    | 0.018 | Busy Threads: 5
Loop Id: 07    | 0.018 | Busy Threads: 6
Loop Id: 08    | 0.018 | Busy Threads: 8
Task Id: 01    | 0.018 | Busy Threads: 11
Task Id: 04    | 0.018 | Busy Threads: 11
Task Id: 03    | 0.018 | Busy Threads: 11
Task Id: 02    | 0.018 | Busy Threads: 11
Task Id: 05    | 0.018 | Busy Threads: 11
Loop Id: 09    | 0.018 | Busy Threads: 12
Loop Id: 10    | 0.018 | Busy Threads: 12
Loop Id: 11    | 0.018 | Busy Threads: 12
Loop Id: 12    | 0.018 | Busy Threads: 12
Loop Id: 13    | 0.018 | Busy Threads: 12
Task Id: 09    | 0.018 | Busy Threads: 12
Loop Id: 14    | 0.018 | Busy Threads: 12
Loop Id: 15    | 0.018 | Busy Threads: 12
Loop Id: 16    | 0.018 | Busy Threads: 12
Loop Id: 17    | 0.018 | Busy Threads: 12
Task Id: 06    | 0.018 | Busy Threads: 12
Loop Id: 18    | 0.018 | Busy Threads: 12
Loop Id: 19    | 0.018 | Busy Threads: 12
Loop Id: 20    | 0.018 | Busy Threads: 12
Loop Id: 21    | 0.018 | Busy Threads: 12
Loop Id: 22    | 0.018 | Busy Threads: 12
Loop Id: 23    | 0.018 | Busy Threads: 12
Loop Id: 24    | 0.018 | Busy Threads: 12
Task Id: 10    | 0.018 | Busy Threads: 12
Task Id: 07    | 0.019 | Busy Threads: 12
Task Id: 11    | 0.019 | Busy Threads: 12
Task Id: 08    | 0.019 | Busy Threads: 12
Task Id: 12    | 0.019 | Busy Threads: 12
Task Id: 13    | 0.020 | Busy Threads: 16
Task Id: 14    | 0.020 | Busy Threads: 17
Task Id: 15    | 0.020 | Busy Threads: 18
Task Id: 16    | 0.020 | Busy Threads: 19
Task Id: 17    | 0.020 | Busy Threads: 20
Task Id: 18    | 0.020 | Busy Threads: 21
Task Id: 19    | 0.020 | Busy Threads: 22
Task Id: 20    | 0.020 | Busy Threads: 23
Task Id: 21    | 0.020 | Busy Threads: 24
Task Id: 23    | 0.020 | Busy Threads: 24
Task Id: 22    | 0.020 | Busy Threads: 24
Task Id: 24    | 0.020 | Busy Threads: 24
Task SetResult | 0.045 | Busy Threads: 25
Done:          | 0.046

Compared with .NET 5 Experiment 1, the number of threads still stays at 12 for a moment, but then it grows almost immediately. The improvement .NET 6 made in this area is introduced later.

.NET 6 Experiment 2: adjust ThreadPool settings

Run the code of .NET 5 Experiment 2 once on .NET 6.


DefaultMinThreads: 12
Loop Id: 01    | 0.001 | Busy Threads: 0
Loop Id: 02    | 0.014 | Busy Threads: 1
Loop Id: 03    | 0.014 | Busy Threads: 2
Loop Id: 04    | 0.015 | Busy Threads: 5
Loop Id: 05    | 0.015 | Busy Threads: 4
Loop Id: 06    | 0.015 | Busy Threads: 5
Loop Id: 07    | 0.015 | Busy Threads: 7
Loop Id: 08    | 0.015 | Busy Threads: 8
Loop Id: 09    | 0.015 | Busy Threads: 11
Task Id: 06    | 0.015 | Busy Threads: 9
Task Id: 01    | 0.015 | Busy Threads: 9
Task Id: 02    | 0.015 | Busy Threads: 9
Task Id: 05    | 0.015 | Busy Threads: 9
Task Id: 03    | 0.015 | Busy Threads: 9
Task Id: 04    | 0.015 | Busy Threads: 9
Task Id: 07    | 0.015 | Busy Threads: 9
Task Id: 08    | 0.016 | Busy Threads: 9
Task Id: 09    | 0.016 | Busy Threads: 9
Loop Id: 10    | 0.016 | Busy Threads: 9
Loop Id: 11    | 0.016 | Busy Threads: 10
Loop Id: 12    | 0.016 | Busy Threads: 11
Loop Id: 13    | 0.016 | Busy Threads: 13
Task Id: 10    | 0.016 | Busy Threads: 14
Loop Id: 14    | 0.016 | Busy Threads: 14
Loop Id: 15    | 0.016 | Busy Threads: 14
Loop Id: 16    | 0.016 | Busy Threads: 14
Task Id: 11    | 0.016 | Busy Threads: 14
Loop Id: 17    | 0.016 | Busy Threads: 14
Loop Id: 18    | 0.016 | Busy Threads: 14
Loop Id: 19    | 0.016 | Busy Threads: 14
Loop Id: 20    | 0.016 | Busy Threads: 14
Loop Id: 21    | 0.016 | Busy Threads: 14
Loop Id: 22    | 0.016 | Busy Threads: 14
Loop Id: 23    | 0.016 | Busy Threads: 14
Loop Id: 24    | 0.016 | Busy Threads: 14
Task Id: 12    | 0.016 | Busy Threads: 14
Task Id: 13    | 0.016 | Busy Threads: 14
Task Id: 14    | 0.016 | Busy Threads: 14
Task Id: 15    | 0.017 | Busy Threads: 18
Task Id: 16    | 0.017 | Busy Threads: 19
Task Id: 17    | 0.017 | Busy Threads: 20
Task Id: 18    | 0.017 | Busy Threads: 21
Task Id: 19    | 0.017 | Busy Threads: 22
Task Id: 20    | 0.018 | Busy Threads: 23
Task Id: 21    | 0.018 | Busy Threads: 24
Task Id: 22    | 0.018 | Busy Threads: 25
Task Id: 23    | 0.018 | Busy Threads: 26
Task Id: 24    | 0.018 | Busy Threads: 26
Task SetResult | 0.018 | Busy Threads: 25
Done:          | 0.019

In the first half, some log lines are out of order. As in .NET 6 Experiment 1, the thread count holds at the minimum number of threads (14 here) for a short time and then starts growing immediately.

.NET 6 Experiment 3: change tcs.Task.Wait() to Thread.Sleep

Run the code of .NET 5 Experiment 3 once on .NET 6.


Loop Id: 01    | 0.003 | Busy Threads: 0
Loop Id: 02    | 0.024 | Busy Threads: 1
Loop Id: 03    | 0.025 | Busy Threads: 2
Loop Id: 04    | 0.025 | Busy Threads: 3
Loop Id: 05    | 0.025 | Busy Threads: 7
Loop Id: 06    | 0.025 | Busy Threads: 5
Loop Id: 07    | 0.025 | Busy Threads: 6
Loop Id: 08    | 0.025 | Busy Threads: 7
Loop Id: 09    | 0.025 | Busy Threads: 9
Loop Id: 10    | 0.025 | Busy Threads: 10
Loop Id: 11    | 0.026 | Busy Threads: 10
Loop Id: 12    | 0.026 | Busy Threads: 11
Loop Id: 13    | 0.026 | Busy Threads: 12
Loop Id: 14    | 0.026 | Busy Threads: 12
Loop Id: 15    | 0.026 | Busy Threads: 12
Loop Id: 16    | 0.026 | Busy Threads: 12
Loop Id: 17    | 0.026 | Busy Threads: 12
Loop Id: 18    | 0.026 | Busy Threads: 12
Loop Id: 19    | 0.026 | Busy Threads: 12
Loop Id: 20    | 0.026 | Busy Threads: 12
Loop Id: 21    | 0.026 | Busy Threads: 12
Loop Id: 22    | 0.026 | Busy Threads: 12
Loop Id: 23    | 0.026 | Busy Threads: 12
Loop Id: 24    | 0.026 | Busy Threads: 12
Task Id: 01    | 0.026 | Busy Threads: 12
Task Id: 02    | 0.026 | Busy Threads: 12
Task Id: 05    | 0.026 | Busy Threads: 12
Task Id: 04    | 0.026 | Busy Threads: 12
Task Id: 06    | 0.026 | Busy Threads: 12
Task Id: 08    | 0.026 | Busy Threads: 12
Task Id: 09    | 0.026 | Busy Threads: 12
Task Id: 03    | 0.026 | Busy Threads: 12
Task Id: 11    | 0.026 | Busy Threads: 12
Task Id: 10    | 0.026 | Busy Threads: 12
Task Id: 07    | 0.026 | Busy Threads: 12
Task Id: 12    | 0.026 | Busy Threads: 12
Task Id: 13    | 1.026 | Busy Threads: 13
Task Id: 14    | 2.027 | Busy Threads: 14
Task Id: 15    | 3.028 | Busy Threads: 15
Task Id: 16    | 4.030 | Busy Threads: 16
Task Id: 17    | 5.031 | Busy Threads: 17
Task Id: 18    | 6.032 | Busy Threads: 18
Task Id: 19    | 6.533 | Busy Threads: 19
Task Id: 20    | 7.035 | Busy Threads: 20
Task Id: 21    | 8.036 | Busy Threads: 21
Task Id: 22    | 8.537 | Busy Threads: 22
Task Id: 23    | 9.538 | Busy Threads: 23
Task Id: 24    | 10.039 | Busy Threads: 24
Done:          | 22.041

The results differ little from those of .NET 5 Experiment 3.

Thread injection

With the experimental results above in mind, we next use the C# implementation of ThreadPool in .NET 6 as material for understanding the stages of thread injection (the division into stages reflects my own understanding and is for reference only).

1. Appearance of the first thread

As tasks are scheduled onto the queue, the first thread is created.

The following is a summary of the code the thread pool runs when the first task is queued. Wherever counting and related processing are involved, the code uses a while(xxx) + Interlocked pattern for concurrency control, which can be understood as a form of optimistic locking. At this stage, we only need to pay attention to ThreadPoolWorkQueue.EnsureThreadRequested.
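The while + Interlocked pattern mentioned above can be sketched in isolation. This is an illustrative example under stated assumptions, not runtime code: CappedCounter and CappedCounterDemo are hypothetical names, and the cap stands in for a limit such as Environment.ProcessorCount.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical counter that never exceeds a cap, updated without locks.
public sealed class CappedCounter
{
    private readonly int _cap;
    private int _value;

    public CappedCounter(int cap) => _cap = cap;

    public int Value => Volatile.Read(ref _value);

    // Tries to add 1 without exceeding the cap. If another thread changes
    // the value first, CompareExchange returns the newer value and we retry.
    public bool TryIncrement()
    {
        int current = Volatile.Read(ref _value);
        while (current < _cap)
        {
            int previous = Interlocked.CompareExchange(ref _value, current + 1, current);
            if (previous == current)
            {
                return true; // our update won
            }
            current = previous; // lost the race; retry with the observed value
        }
        return false; // cap reached
    }
}

// Hypothetical demo class; not part of the runtime.
public static class CappedCounterDemo
{
    // Runs many concurrent attempts and reports how many succeeded.
    public static (int Successes, int FinalValue) Run(int cap, int attempts)
    {
        var counter = new CappedCounter(cap);
        int successes = 0;
        Parallel.For(0, attempts, i =>
        {
            if (counter.TryIncrement()) Interlocked.Increment(ref successes);
        });
        return (successes, counter.Value);
    }
}
```

CappedCounterDemo.Run(4, 1000) returns (4, 4): exactly as many increments succeed as the cap allows, no matter how the threads interleave.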

Rider's decompile-and-debug feature is a helpful tool for studying this code.

Here is the code execution path of the first Task.Run.

Note: at this stage the executing thread is the main thread.

public static class ThreadPool
{
    internal static readonly ThreadPoolWorkQueue s_workQueue = new ThreadPoolWorkQueue();

    public static bool QueueUserWorkItem(WaitCallback callBack, object state)
    {
        object tpcallBack = new QueueUserWorkItemCallback(callBack!, state);

        s_workQueue.Enqueue(tpcallBack, forceGlobal: true);

        return true;
    }
}

internal sealed class ThreadPoolWorkQueue
{
    [StructLayout(LayoutKind.Sequential)]
    private struct CacheLineSeparated
    {
        private readonly Internal.PaddingFor32 pad1;

        public volatile int numOutstandingThreadRequests;

        private readonly Internal.PaddingFor32 pad2;
    }

    private CacheLineSeparated _separated;

    public void Enqueue(object callback, bool forceGlobal)
    {
        // Two kinds of tasks are executed in the thread pool: IThreadPoolWorkItem and Task
        Debug.Assert((callback is IThreadPoolWorkItem) ^ (callback is Task));

        if (loggingEnabled && FrameworkEventSource.Log.IsEnabled())
            FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject(callback);

        ThreadPoolWorkQueueThreadLocals? tl = null;
        if (!forceGlobal)
            // Get the local queue. If the thread executing this code is not a thread pool thread,
            // nothing is obtained here, so even if forceGlobal is false
            // the task is still placed in the global queue.
            tl = ThreadPoolWorkQueueThreadLocals.threadLocals;

        if (null != tl)
        {
            //Put on local queue
            tl.workStealingQueue.LocalPush(callback);
        }
        else
        {
            // Put on the global queue
            workItems.Enqueue(callback);
        }

        EnsureThreadRequested();
    }

    internal void EnsureThreadRequested()
    {
        //
        // If we have not yet requested #procs threads, then request a new thread.
        //
        // CoreCLR: Note that there is a separate count in the VM which has already been incremented
        // by the VM by the time we reach this point.
        //
        int count = _separated.numOutstandingThreadRequests;
        while (count < Environment.ProcessorCount)
        {
            int prev = Interlocked.CompareExchange(ref _separated.numOutstandingThreadRequests, count + 1, count);
            if (prev == count)
            {
                ThreadPool.RequestWorkerThread();
                break;
            }
            count = prev;
        }
    }

    public static class ThreadPool
    {

        /// <summary>
        /// This method is called to request a new thread pool worker to handle pending work.
        /// </summary>
        internal static void RequestWorkerThread() => PortableThreadPool.ThreadPoolInstance.RequestWorker();
    }

    internal sealed class PortableThreadPool
    {
        public static readonly PortableThreadPool ThreadPoolInstance = new PortableThreadPool();

        internal void RequestWorker()
        {
            // The order of operations here is important. MaybeAddWorkingWorker() and EnsureRunning() use speculative checks to
            // do their work and the memory barrier from the interlocked operation is necessary in this case for correctness.
            Interlocked.Increment(ref _separated.numRequestedWorkers);
            WorkerThread.MaybeAddWorkingWorker(this);
            // Initialize the GateThread
            GateThread.EnsureRunning(this);
        }

        /// <summary>
        /// The worker thread infastructure for the CLR thread pool.
        /// </summary>
        private static class WorkerThread
        {
            internal static void MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
            {
                ThreadCounts counts = threadPoolInstance._separated.counts;
                short numExistingThreads, numProcessingWork, newNumExistingThreads, newNumProcessingWork;
                // This while (true) loop ensures that the number of threads to create is calculated correctly
                while (true)
                {
                    numProcessingWork = counts.NumProcessingWork;
                    if (numProcessingWork >= counts.NumThreadsGoal)
                    {
                        return;
                    }

                    newNumProcessingWork = (short)(numProcessingWork + 1);
                    numExistingThreads = counts.NumExistingThreads;
                    newNumExistingThreads = Math.Max(numExistingThreads, newNumProcessingWork);

                    ThreadCounts newCounts = counts;
                    newCounts.NumProcessingWork = newNumProcessingWork;
                    newCounts.NumExistingThreads = newNumExistingThreads;

                    ThreadCounts oldCounts = threadPoolInstance._separated.counts.InterlockedCompareExchange(newCounts, counts);

                    if (oldCounts == counts)
                    {
                        break;
                    }

                    counts = oldCounts;
                }

                int toCreate = newNumExistingThreads - numExistingThreads;
                int toRelease = newNumProcessingWork - numProcessingWork;

                if (toRelease > 0)
                {
                    s_semaphore.Release(toRelease);
                }

                while (toCreate > 0)
                {
                    if (TryCreateWorkerThread())
                    {
                        toCreate--;
                        continue;
                    }

                    counts = threadPoolInstance._separated.counts;
                    while (true)
                    {
                        ThreadCounts newCounts = counts;
                        newCounts.SubtractNumProcessingWork((short)toCreate);
                        newCounts.SubtractNumExistingThreads((short)toCreate);

                        ThreadCounts oldCounts = threadPoolInstance._separated.counts.InterlockedCompareExchange(newCounts, counts);
                        if (oldCounts == counts)
                        {
                            break;
                        }
                        counts = oldCounts;
                    }
                    break;
                }
            }

            private static bool TryCreateWorkerThread()
            {
                try
                {
                    // Thread pool threads must start in the default execution context without transferring the context, so
                    // using UnsafeStart() instead of Start()
                    Thread workerThread = new Thread(s_workerThreadStart);
                    workerThread.IsThreadPoolThread = true;
                    workerThread.IsBackground = true;
                    // thread name will be set in thread proc
                    workerThread.UnsafeStart();
                }
                catch (ThreadStartException)
                {
                    return false;
                }
                catch (OutOfMemoryException)
                {
                    return false;
                }

                return true;
            }
        }
    }
}

2. Increase in the number of threads before reaching min threads

Careful readers will notice that the EnsureThreadRequested method in the code above has a termination condition: _separated.numOutstandingThreadRequests == Environment.ProcessorCount. Each thread request increments this value by 1, so it may seem that the maximum number of worker threads that can be created is Environment.ProcessorCount.

In fact, the numOutstandingThreadRequests value maintained by ThreadPoolWorkQueue is decremented by 1 in ThreadPoolWorkQueue.Dispatch, once a thread pool thread has actually started running. In other words, as soon as one thread is actually running, one more request can be made, which allows thread number Environment.ProcessorCount + 1 to be created. Of course, with 12 cores for example, it does not matter if the 13th worker thread cannot be created when the 13th task is added to ThreadPoolWorkQueue: the task is already queued and will be picked up by a running worker thread.
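The capped counter described above can be modeled in isolation. The following is only a sketch under stated assumptions, not the runtime's code: the class name ThreadRequestCounter and its members are hypothetical, and the limit passed to the constructor stands in for Environment.ProcessorCount.

```csharp
using System.Threading;

// Hypothetical, simplified model of the outstanding-thread-request counter.
public sealed class ThreadRequestCounter
{
    private readonly int _limit;   // stands in for Environment.ProcessorCount
    private int _outstanding;      // stands in for numOutstandingThreadRequests

    public ThreadRequestCounter(int limit) => _limit = limit;

    public int Outstanding => Volatile.Read(ref _outstanding);

    // Returns true if a new thread request was recorded,
    // false if the number of outstanding requests already reached the limit.
    public bool TryRequestThread()
    {
        int count = Volatile.Read(ref _outstanding);
        while (count < _limit)
        {
            int prev = Interlocked.CompareExchange(ref _outstanding, count + 1, count);
            if (prev == count)
            {
                return true;
            }
            count = prev; // another thread changed the value; retry with the fresh one
        }
        return false;
    }

    // Called when a worker thread actually starts dispatching work:
    // one outstanding request has been satisfied, so one more may be made.
    public void MarkRequestSatisfied() => Interlocked.Decrement(ref _outstanding);
}
```

With a limit of 2, two calls to TryRequestThread succeed and a third is refused until MarkRequestSatisfied has been called once.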

The initial value of min threads is the number of CPU cores in the running environment. It can be changed through ThreadPool.SetMinThreads; the valid range of the parameter is [1, max threads].

PortableThreadPool maintains a counter, PortableThreadPool.ThreadPoolInstance._separated.counts, which records three values related to worker threads:

  • NumProcessingWork: the number of worker threads currently executing tasks.
  • NumExistingThreads: the number of worker threads that actually exist in the thread pool.
  • NumThreadsGoal: the maximum number of worker threads currently allowed to be created. The initial value is min threads.

    internal class PortableThreadPool
    {

        public static readonly PortableThreadPool ThreadPoolInstance = new PortableThreadPool();

        private CacheLineSeparated _separated;

        private struct CacheLineSeparated
        {
            public ThreadCounts counts;
        }

        /// <summary>
        /// Tracks information on the number of threads we want/have in different states in our thread pool.
        /// </summary>
        private struct ThreadCounts
        {
            /// <summary>
            /// Number of threads processing work items.
            /// </summary>
            public short NumProcessingWork { get; set; }

            /// <summary>
            /// Number of thread pool threads that currently exist.
            /// </summary>
            public short NumExistingThreads { get; set; }

            /// <summary>
            /// Max possible thread pool threads we want to have.
            /// </summary>
            public short NumThreadsGoal { get; set; }
        }
    }
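The excerpts above update all the counters with a single InterlockedCompareExchange call. One way to make that possible is to pack the three 16-bit values into one 64-bit word. The sketch below illustrates the idea; the names PackedCounts and PackedCountsDemo and the bit layout are assumptions made for this example, not the runtime's actual definition.

```csharp
using System.Threading;

// Hypothetical sketch: three 16-bit counters packed into one 64-bit word,
// so that all of them can be replaced by a single compare-and-swap.
public struct PackedCounts
{
    private long _data;

    private short Get(int shift) => (short)(_data >> shift);

    private void Set(int shift, short value)
    {
        // Clear the 16 bits at 'shift', then store the new value there.
        _data = (_data & ~(0xFFFFL << shift)) | ((long)(ushort)value << shift);
    }

    public short NumProcessingWork { get => Get(0); set => Set(0, value); }
    public short NumExistingThreads { get => Get(16); set => Set(16, value); }
    public short NumThreadsGoal { get => Get(32); set => Set(32, value); }

    // Atomically replaces 'location' with 'newCounts' if it still equals 'expected'.
    // Returns the value that was found in 'location'.
    public static PackedCounts CompareExchange(
        ref PackedCounts location, PackedCounts newCounts, PackedCounts expected)
    {
        long found = Interlocked.CompareExchange(
            ref location._data, newCounts._data, expected._data);
        return new PackedCounts { _data = found };
    }

    public bool SameAs(PackedCounts other) => _data == other._data;
}

public static class PackedCountsDemo
{
    // Retry loop in the style of the excerpts: raise NumProcessingWork by one
    // unless it has already reached NumThreadsGoal.
    public static bool TryAddWorkingWorker(ref PackedCounts shared)
    {
        PackedCounts counts = shared;
        while (true)
        {
            if (counts.NumProcessingWork >= counts.NumThreadsGoal)
            {
                return false;
            }

            PackedCounts updated = counts;
            updated.NumProcessingWork++;

            PackedCounts old = PackedCounts.CompareExchange(ref shared, updated, counts);
            if (old.SameAs(counts))
            {
                return true;
            }
            counts = old; // lost the race; retry against the value another thread wrote
        }
    }
}
```

Because the whole struct fits in one word, a reader never observes a NumProcessingWork value that belongs to a different update than the NumThreadsGoal it is compared with.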

3. Starvation avoidance

As described above, as tasks enter the queue system, the number of worker threads grows until it reaches NumThreadsGoal.

Suppose NumThreadsGoal is 12 and the first 12 threads are all blocked. The 13th task added to the queue system then cannot be picked up and executed by any of those 12 threads.

In this case, the starvation avoidance mechanism of the thread pool comes into play.

In the first stage described above, besides the creation of the first thread in the thread pool, GateThread is also initialized. The code excerpt from the first stage shows the initialization of GateThread.

internal sealed class PortableThreadPool
{
    public static readonly PortableThreadPool ThreadPoolInstance = new PortableThreadPool();

    internal void RequestWorker()
    {
        Interlocked.Increment(ref _separated.numRequestedWorkers);
        WorkerThread.MaybeAddWorkingWorker(this);
        // Initialize GateThread
        GateThread.EnsureRunning(this);
    }
}

GateThread is an independent thread that performs a check every 500 ms. If NumProcessingWork >= NumThreadsGoal (the condition under which WorkerThread.MaybeAddWorkingWorker does not add a worker thread), it sets the new NumThreadsGoal = NumProcessingWork + 1 and calls WorkerThread.MaybeAddWorkingWorker, so that a new worker thread can be created by WorkerThread.MaybeAddWorkingWorker.

That explains why, in the earlier experiments, once the number of threads reaches min threads (the default value), it grows by one thread every 500 ms.
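The rule applied on each check can be written as a small pure function. This is a hedged, single-threaded model: StarvationModel and its methods are hypothetical names, the maxThreads cap is an assumption added for the example, and the real gate thread looks at more conditions than the single comparison modeled here.

```csharp
using System;

// Hypothetical, single-threaded model of one GateThread check.
public static class StarvationModel
{
    // Returns the thread goal after one check.
    // The maxThreads cap is an assumption of this sketch.
    public static int NextThreadsGoal(int numProcessingWork, int numThreadsGoal, int maxThreads)
    {
        if (numProcessingWork >= numThreadsGoal && numThreadsGoal < maxThreads)
        {
            return Math.Min(numProcessingWork + 1, maxThreads);
        }
        return numThreadsGoal;
    }

    // Simulates 'checks' consecutive checks while every worker stays blocked,
    // assuming each raised goal immediately results in one more busy worker.
    public static int GoalAfter(int checks, int initialGoal, int maxThreads)
    {
        int goal = initialGoal;
        int busy = initialGoal;
        for (int i = 0; i < checks; i++)
        {
            goal = NextThreadsGoal(busy, goal, maxThreads);
            busy = goal;
        }
        return goal;
    }
}
```

Under these assumptions, StarvationModel.GoalAfter(4, 12, 100) returns 16: four checks at 500 ms intervals (two seconds) add only four threads when all workers remain blocked.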

Because threads grow relatively slowly in the third stage, experienced developers set a larger min threads value when the application starts, so that the thread pool enters the third stage later or not at all.
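Raising min threads at startup might look like the following sketch. ThreadPool.GetMinThreads and ThreadPool.SetMinThreads are the public APIs; the helper name MinThreadsConfig.RaiseMinWorkerThreads is hypothetical, and a suitable value depends on the workload.

```csharp
using System.Threading;

public static class MinThreadsConfig
{
    // Raises the worker-thread minimum to at least 'desiredWorkerMin' while leaving
    // the completion-port minimum unchanged. Returns whether the setting is in effect.
    public static bool RaiseMinWorkerThreads(int desiredWorkerMin)
    {
        ThreadPool.GetMinThreads(out int workerMin, out int completionPortMin);
        if (desiredWorkerMin <= workerMin)
        {
            return true; // already high enough, nothing to change
        }

        // SetMinThreads returns false when the request is rejected,
        // for example when the value exceeds max threads.
        return ThreadPool.SetMinThreads(desiredWorkerMin, completionPortMin);
    }
}
```

Calling MinThreadsConfig.RaiseMinWorkerThreads once at the start of the program is enough; the return value should be checked because a rejected request leaves the previous setting in place.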

Thread injection in .NET 6

Comparing experiment 2 on .NET 6 with experiment 2 on .NET 5, the growth speed of threads after reaching min threads differs significantly, whereas experiment 3 shows little difference between the two.

.NET 6 optimizes the scenario in which thread pool threads are blocked by Task.Wait. If threads run short for any other reason, the starvation avoidance strategy still applies.

The new ThreadPool provides an internal interface, ThreadPool.NotifyThreadBlocked, which calls GateThread.Wake to wake up GateThread so that it runs the logic that would otherwise execute only once every 500 ms. The 500 ms interval is implemented with an AutoResetEvent, so GateThread.Wake is also very simple.

Key code sketch (not the real code):

internal class PortableThreadPool
{
    public bool NotifyThreadBlocked()
    {
        // ...
        GateThread.Wake(this);
        return true;
    }

    private static class GateThread
    {
        private static readonly AutoResetEvent DelayEvent = new AutoResetEvent(initialState: false);

        // GateThread entry method
        private static void GateThreadStart()
        {
            while(true)
            {
                DelayEvent.WaitOne(500);
                // ...
            }
        }

        public static void Wake(PortableThreadPool threadPoolInstance)
        {
            DelayEvent.Set();
            EnsureRunning(threadPoolInstance);
        }
    }
}
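The mechanism the sketch relies on is that AutoResetEvent.WaitOne(int) returns as soon as the event is set, or after the timeout, whichever comes first. A minimal stand-alone illustration follows; the PeriodicChecker type is hypothetical and is not part of the thread pool.

```csharp
using System;
using System.Threading;

// Hypothetical stand-alone model of a periodic loop that can be woken early.
public sealed class PeriodicChecker : IDisposable
{
    private readonly AutoResetEvent _delayEvent = new AutoResetEvent(initialState: false);

    // Waits for up to 'periodMs' milliseconds. Returns true if Wake() ended
    // the wait early, false if the full period elapsed.
    public bool WaitForNextCheck(int periodMs) => _delayEvent.WaitOne(periodMs);

    // Cuts the current wait short, or the next one if no wait is in progress.
    public void Wake() => _delayEvent.Set();

    public void Dispose() => _delayEvent.Dispose();
}
```

Because the event resets automatically after releasing one waiter, a single Wake call shortens exactly one wait; the following wait runs for the full period again.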

Hill climbing algorithm

In addition to the thread injection mechanism described above, starting from CLR 4.0 the thread pool also implements a hill climbing algorithm, which derives the optimal number of threads from the collected throughput data of the thread pool (recorded each time a task completes).

The algorithm is implemented in HillClimbing.ThreadPoolHillClimber.Update; interested readers can take a look.


public (int newThreadCount, int newSampleMs) Update(int currentThreadCount, double sampleDurationSeconds, int numCompletions)
  • currentThreadCount: the current number of threads
  • sampleDurationSeconds: the sampling interval
  • numCompletions: the number of tasks completed during this sampling interval
  • newThreadCount: the new number of threads
  • newSampleMs: the new sampling interval
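To make the inputs and outputs concrete, here is a deliberately naive hill climber. It is not the algorithm in HillClimbing.ThreadPoolHillClimber.Update. It only illustrates the basic feedback idea: measure throughput, move the thread count one step, and reverse direction when throughput drops. The class NaiveHillClimber is hypothetical, and unlike the real method it returns no new sampling interval.

```csharp
using System;

// Hypothetical, greatly simplified hill climber; NOT the runtime's algorithm.
public sealed class NaiveHillClimber
{
    private readonly int _minThreads;
    private readonly int _maxThreads;
    private double _lastThroughput = -1; // completions per second in the previous sample
    private int _direction = 1;          // +1: try more threads, -1: try fewer

    public NaiveHillClimber(int minThreads, int maxThreads)
    {
        _minThreads = minThreads;
        _maxThreads = maxThreads;
    }

    // Returns the thread count to try during the next sampling interval.
    public int Update(int currentThreadCount, double sampleDurationSeconds, int numCompletions)
    {
        double throughput = numCompletions / sampleDurationSeconds;

        // If the previous move made throughput worse, reverse direction.
        if (_lastThroughput >= 0 && throughput < _lastThroughput)
        {
            _direction = -_direction;
        }
        _lastThroughput = throughput;

        return Math.Clamp(currentThreadCount + _direction, _minThreads, _maxThreads);
    }
}
```

A climber this simple is easily misled by noisy measurements, which is one reason a production implementation needs considerably more machinery than this sketch.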

Destruction of unnecessary threads

If a thread's local queue still contains tasks when the thread is about to be removed, those tasks are transferred to the global queue.
The thread pool destroys unnecessary threads in the following scenarios. The list is limited to the author's current understanding and is not necessarily complete.

  • When a thread cannot retrieve a task from the queue system.
  • When the hill climbing algorithm determines that the current thread is redundant.
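The retirement path can be sketched as follows. This is a hedged model, not the runtime's code: RetiringWorker is a hypothetical type, string items stand in for work items, a SemaphoreSlim stands in for whatever signal the real worker waits on, and the idle timeout is supplied by the caller.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical model of a worker that retires after an idle timeout.
public sealed class RetiringWorker
{
    private readonly ConcurrentQueue<string> _globalQueue;
    private readonly Queue<string> _localQueue = new Queue<string>();
    private readonly SemaphoreSlim _workAvailable;

    public RetiringWorker(ConcurrentQueue<string> globalQueue, SemaphoreSlim workAvailable)
    {
        _globalQueue = globalQueue;
        _workAvailable = workAvailable;
    }

    public void EnqueueLocal(string item) => _localQueue.Enqueue(item);

    // Waits for a work signal. Returns false when the idle timeout elapsed,
    // which in this model means the worker should exit.
    public bool WaitForWork(int idleTimeoutMs) => _workAvailable.Wait(idleTimeoutMs);

    // Called right before the worker exits: hands any leftover local items
    // to the global queue so other workers can still run them.
    public int TransferLocalWork()
    {
        int moved = 0;
        while (_localQueue.Count > 0)
        {
            _globalQueue.Enqueue(_localQueue.Dequeue());
            moved++;
        }
        return moved;
    }
}
```

A worker loop built on this model would call WaitForWork whenever it finds no task, and call TransferLocalWork before returning once that wait times out.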

References

https://www.codeproject.com/Articles/3813/NET-s-ThreadPool-Class-Behind-The-Scenes
https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/
https://mattwarren.org/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/
https://docs.microsoft.com/zh-CN/previous-versions/msp-n-p/ff963549(v=pandp.10)?redirectedfrom=MSDN
