Thread II

Time:2021-5-10
Thread pool
Why use thread pools?

Creating and destroying threads is an expensive, time-consuming operation, and too many threads also waste memory. Because the operating system must schedule the runnable threads and perform context switches, having too many threads hurts performance as well. The thread pool improves this situation.

What is a thread pool?

We can think of the thread pool as a collection of threads available to the application. There is one thread pool per CLR, shared by all AppDomains controlled by that CLR. If multiple CLRs are loaded into a process, each has its own thread pool.

How does thread pool work?

When the CLR initializes, the thread pool contains no threads. Internally, the pool maintains a queue of operation requests. When an application wants to perform an asynchronous operation, it calls a method that appends an entry to this queue. The thread pool code extracts entries from the queue and dispatches each one to a thread pool thread; if the pool contains no threads at that point, a new thread is created (with some performance cost). When a thread finishes its task, it is not destroyed: it returns to the pool and sits idle, waiting for the next request. Because threads are not continually created and destroyed, there is no extra performance loss.

If an application makes many requests to the thread pool, the pool tries to service all of them with just the one existing thread. If requests arrive faster than that thread can process them, additional threads are created.

When the application stops sending requests to the thread pool, the pool's threads have nothing to do and waste memory. So there is a mechanism: when a thread pool thread has been idle for some time (the exact duration varies across CLR versions), the thread wakes itself up and terminates to free resources.

Characteristics

The thread pool can hold a small number of threads to avoid wasting resources, or create many threads to take full advantage of multiprocessor, hyper-threaded, and multi-core machines. In other words, the thread pool is heuristic: if the application has many tasks to perform and CPUs are available, the pool creates more threads.

Asynchronous programming with the thread pool
private static void SomeMethod(object state)
{
    //Method is executed by thread pool thread
    Console.WriteLine("state = {0}", state);
    Thread.Sleep(10000);

    //After the method returns, the thread returns to the thread pool and waits for the next request
}

static void Main()
{
    ThreadPool.QueueUserWorkItem(SomeMethod, 1);
    Console.ReadKey();
}
Execution context

Each thread is associated with an execution context data structure, which includes security settings, host settings, and logical call context data. Normally, whenever a thread (the initial thread) uses another thread (a helper thread) to execute a task, the former's execution context should be copied to the helper thread. This ensures that any operation on the helper thread uses the same security settings and host settings, and that the initial thread's logical call context is available on the helper thread. By default, the initial thread's execution context flows to any helper thread, but because the context contains a fair amount of information, copying it affects performance.

Controlling execution context flow
static void Main(string[] args)
{
    //Put the data into the logical call context of the main thread
    CallContext.LogicalSetData("Name", "DoubleJ");

    //Thread pool thread can access logic call context data
    ThreadPool.QueueUserWorkItem(state => {
        Console.WriteLine("state = {0}, name = {1}", state, CallContext.LogicalGetData("Name"));
    }, 1);

    //Prevent execution context flow of main thread
    ExecutionContext.SuppressFlow();

    //Thread pool threads will not be able to access logical call context data
    ThreadPool.QueueUserWorkItem(state => {
        Console.WriteLine("state = {0}, name = {1}", state, CallContext.LogicalGetData("Name"));
    }, 2);

    //Resuming execution context flow of main thread
    ExecutionContext.RestoreFlow();

    //Thread pool threads can access logic call context data again
    ThreadPool.QueueUserWorkItem(state => {
        Console.WriteLine("state = {0}, name = {1}", state, CallContext.LogicalGetData("Name"));
    }, 3);

    Console.ReadKey();
}
Running results


Task

Although ThreadPool.QueueUserWorkItem is very simple to use, it provides no way to know when the operation completes or to obtain a return value. Task makes up for these shortcomings.

Wait for the task to complete and get the returned result
static void Main()
{
    Task<int> t = new Task<int>(() =>
    {
        int sum = 0;
        for (int i = 0; i < 10000; i++)
            sum += i;
        return sum;
    });

    //Start task
    t.Start();

    //Wait for the task to complete
    t.Wait();

    //View returned results
    Console.WriteLine("result = {0}", t.Result);
    Console.ReadKey();
}

When a thread calls the Wait method, the system checks whether the task being waited on has started executing. If it has, the calling thread blocks until the task finishes running. If the task has not started yet, the system may execute it using the thread that called Wait; in that case the calling thread does not block: it executes the task and then returns immediately. If a thread holds a thread synchronization lock when it calls Wait and the task attempts to acquire that same lock, the threads deadlock.
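The timeout overload of Wait makes this lock interaction easy to observe. Below is a minimal sketch (the WaitDeadlockDemo class and the 500 ms timeout are illustrative choices, not from the original text): the main thread holds a lock that the task needs, so a bounded Wait times out; an unbounded Wait() at that point would deadlock forever.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class WaitDeadlockDemo
{
    static readonly object s_Lock = new object();

    // Returns whether the task finished while the lock was still held,
    // and whether it finished after the lock was released
    public static (bool insideLock, bool afterRelease) Demo()
    {
        var started = new ManualResetEventSlim(false);
        Task t;
        bool insideLock;

        lock (s_Lock)
        {
            // The task needs the lock this thread is already holding
            t = Task.Run(() => { started.Set(); lock (s_Lock) { } });

            // Ensure the task is already running on a pool thread, so
            // Wait cannot execute it inline on this lock-holding thread
            started.Wait();

            // Times out: the task is blocked on the lock we hold.
            // An unbounded Wait() here would never return.
            insideLock = t.Wait(500);
        }

        // Lock released: the task can now acquire it and complete
        bool afterRelease = t.Wait(5000);
        return (insideLock, afterRelease);
    }

    static void Main()
    {
        var (insideLock, afterRelease) = Demo();
        Console.WriteLine("finished while lock held: {0}", insideLock);  // False
        Console.WriteLine("finished after release: {0}", afterRelease); // True
    }
}
```

Because Monitor locks are reentrant, the deadlock only occurs when the task actually runs on a different thread, which is why the sketch waits for the task to start before calling Wait.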

Cancel the task
static void Main()
{
    var cts = new CancellationTokenSource();
    var t = new Task<int>(() =>
    {
        int sum = 0;
        for (int i = 0; i < 10000; i++)
        {
            //If canceled, an exception is thrown
            cts.Token.ThrowIfCancellationRequested();
            sum += i;
        }
        return sum;
    }, cts.Token);

    t.Start();
    //Asynchronous request, task may have completed
    cts.Cancel();

    try
    {
        //If the task is canceled, Result throws an AggregateException
        Console.WriteLine("result = {0}", t.Result);
    }
    catch (AggregateException exp)
    {
        exp.Handle(e => e is OperationCanceledException);
        Console.WriteLine("Task canceled");
    }
    Console.ReadKey();
}

When a task is created, a CancellationToken can be passed to the Task constructor to associate the token with the task. If the CancellationToken is canceled before the task is scheduled, the task will never execute.

Running results


Automatically start a new task when a task is finished
private static int Sum(int n)
{
    n += 1;
    Console.WriteLine("n = {0}", n);
    return n;
}

static void Main()
{
    Task<int> t = new Task<int>(n => Sum((int)n), 0);
    t.Start();
    t.ContinueWith(task => Sum(task.Result));
    Console.ReadKey();
}

When task t in the code above completes, another task is started automatically. The thread executing this code does not block waiting for either task to complete; it can continue executing other code.

Running results


Parent task and child task
private static int Sum(int n)
{
    n += 1;
    Console.WriteLine("n = {0}", n);
    return n;
}

static void Main()
{
    Task<int[]> parent = new Task<int[]>(() =>
    {
        var result = new int[3];
        new Task(() => result[0] = Sum(0), TaskCreationOptions.AttachedToParent).Start();
        new Task(() => { Thread.Sleep(5000); result[1] = Sum(1); }, TaskCreationOptions.AttachedToParent).Start();
        new Task(() => result[2] = Sum(2), TaskCreationOptions.AttachedToParent).Start();
        return result;
    });
    parent.ContinueWith(parentTask => Array.ForEach(parentTask.Result, Console.WriteLine));
    parent.Start();
    Console.ReadKey();
}
Running results


Now change a line of code as follows:

new Task(() => { Thread.Sleep(5000); result[1] = Sum(1); }, TaskCreationOptions.AttachedToParent).Start();

//Change the previous code to
new Task(() => { Thread.Sleep(5000); result[1] = Sum(1); }).Start();
Running results


conclusion
By default, a Task object created by another task is a top-level task with no association to the task that created it. With the TaskCreationOptions.AttachedToParent flag, however, a task becomes associated with the task that created it. The parent task is then not considered finished until all of its child tasks, and their child tasks in turn, have finished.

Task factory
private static int Sum(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
    {
        checked
        {
            sum += i;
        }
    }
    return sum;
}

static void Main()
{
    Task parent = new Task(() => {
        var cts = new CancellationTokenSource();
        var tf = new TaskFactory<int>(
            cts.Token,
            TaskCreationOptions.AttachedToParent,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default
        );

        //Create and start subtasks
        var childTask = new[]
        {
            tf.StartNew(() => Sum(1000)),
            tf.StartNew(() => Sum(10000)),
            tf.StartNew(() => Sum(100000))
        };

        //Any subtask that throws an exception cancels the rest of the subtasks
        for (int i = 0; i < childTask.Length; i++)
            childTask[i].ContinueWith(t => cts.Cancel(), TaskContinuationOptions.OnlyOnFaulted);

        //After completing all subtasks
        tf.ContinueWhenAll(
            childTask,
            completedTask => completedTask.Where(t => !t.IsFaulted && !t.IsCanceled).Max(t => t.Result),
            CancellationToken.None
        ).ContinueWith(
            t => Console.WriteLine("max result is : {0}", t.Result
        ), TaskContinuationOptions.ExecuteSynchronously);
    });

    //Display exceptions if the parent task (or any attached child) faults
    parent.ContinueWith(p =>
    {
        foreach (var e in p.Exception.Flatten().InnerExceptions)
            Console.WriteLine("Exception : {0}", e.Message);
    }, TaskContinuationOptions.OnlyOnFaulted);
    parent.Start();
    Console.ReadKey();
}
Running results


Using the Parallel class
Parallel's static For method
//Single-threaded execution (not recommended)
for (int i = 0; i < 100000; i++)
    DoSomething(i);

//Parallel execution (recommended)
Parallel.For(0, 100000, i => DoSomething(i));
Parallel's static ForEach method
var collection = new int[] { 1, 2, 3 };
//Not recommended
foreach (var item in collection)
    DoSomething(item);

//Recommended
Parallel.ForEach(collection, item => DoSomething(item));

*If either For or ForEach could be used, the For method is faster than the ForEach method.

Parallel's static Invoke method
//One thread executes the methods sequentially
Method1();
Method2();
Method3();

//Thread pool threads execute the methods in parallel
Parallel.Invoke(
    () => Method1(),
    () => Method2(),
    () => Method3()
);

The premise for using Parallel is that the work items are safe to execute in parallel; if they must execute sequentially, do not use it. All Parallel methods let the calling thread participate in the processing. If the calling thread finishes its share of the work before the thread pool threads finish theirs, it suspends until all the work is complete.

Caveats

Parallel is easy to use, but it has overhead: delegate objects must be allocated, and the delegate is invoked once per work item. If there are a large number of work items that can be processed by multiple threads, or each work item involves substantial work, the cost of invoking the delegate is negligible and performance may improve. But if there are only a few work items, or each one completes very quickly, the Parallel methods can actually hurt performance.
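One way to keep the per-item delegate cost down when aggregating a result is the Parallel.For overload with thread-local state: each thread accumulates into its own subtotal and touches the shared total only once. A sketch (ParallelSumDemo is an illustrative name, not from the original text):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelSumDemo
{
    // Sums 0..n-1 with Parallel.For, using a per-thread subtotal so the
    // shared total is updated once per thread rather than once per item
    public static long Sum(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                             // localInit: per-thread subtotal
            (i, state, subtotal) => subtotal + i, // body: no shared state touched
            subtotal => Interlocked.Add(ref total, subtotal)); // localFinally
        return total;
    }

    static void Main()
    {
        Console.WriteLine("sum = {0}", Sum(100000)); // 4999950000
    }
}
```

The Interlocked.Add in localFinally is still needed because several threads may finish their partitions at the same time.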

The For, ForEach, and Invoke methods accept an optional ParallelOptions object:
  • CancellationToken: allows the operation to be canceled; defaults to CancellationToken.None
  • MaxDegreeOfParallelism: the maximum number of work items that may execute concurrently
  • TaskScheduler: which TaskScheduler to use; defaults to TaskScheduler.Default
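These options can be combined in one call. The following is a hedged sketch (the loop body, iteration count, and ParallelOptionsDemo name are illustrative): a ParallelOptions with a cancellation token and a capped degree of parallelism, where Parallel.For throws OperationCanceledException once the token is canceled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelOptionsDemo
{
    // Returns true if the loop observed the cancellation request
    public static bool Run()
    {
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions
        {
            CancellationToken = cts.Token,
            MaxDegreeOfParallelism = 2 // at most 2 concurrent work items
        };

        try
        {
            Parallel.For(0, 100000, options, i =>
            {
                if (i == 100) cts.Cancel(); // request cancellation mid-run
                // Parallel.For checks options.CancellationToken between
                // iterations and stops scheduling new ones once canceled
            });
        }
        catch (OperationCanceledException)
        {
            return true;
        }
        return false;
    }

    static void Main() => Console.WriteLine("canceled: {0}", Run());
}
```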
Performing tasks periodically

The following code creates a timer that executes the SomeMethod method immediately, and then executes it again every second:

private static System.Threading.Timer s_Timer;

static void Main()
{
    s_Timer = new Timer(SomeMethod, 6, 0, Timeout.Infinite);
    Console.ReadKey();
}

private static void SomeMethod(object state)
{
    Console.WriteLine("state = {0}", state);

    //Let timer call this method after 1 second
    s_Timer.Change(1000, Timeout.Infinite);
}

Internally, the thread pool uses only one thread for all Timer objects. That thread knows when the next Timer object is due to fire; when it fires, the thread wakes up and internally calls ThreadPool.QueueUserWorkItem to add a work item to the pool's request queue so the callback method gets invoked. If the callback method takes a long time to execute, the timer may fire again before it returns, causing multiple thread pool threads to execute the callback at the same time. To solve this problem, construct the Timer with the period parameter set to Timeout.Infinite, so the timer fires only once; to make it repeat, call the Change method inside the callback, as in the code above.

Running results

You can see that the console immediately outputs state = 6, and then outputs state = 6 again every second.

Not recommended practice
private static System.Threading.Timer s_Timer;

static void Main()
{
    s_Timer = new Timer(SomeMethod, 6, 0, 1000);
    Console.ReadKey();
}

private static void SomeMethod(object state)
{
    Console.WriteLine("state = {0}", state);
}

Although this approach produces the same output as the previous code, if the callback method ever takes longer to execute than the interval specified by the period parameter, multiple threads may execute the callback at the same time, which is usually not the desired result.

CLR thread pool
Data structure diagram


Managing worker threads

ThreadPool.QueueUserWorkItem and the Timer class always put work items into the global queue. Worker threads take work items out of this queue using a first-in-first-out (FIFO) algorithm. Because multiple worker threads may remove items from the global queue at the same time, all of them contend for a single thread synchronization lock.

Each worker thread also has its own local queue. When code running on a worker thread schedules a Task, that Task is added to the calling thread's local queue. When a worker thread is ready to process a work item, it always checks its local queue first; if a Task is there, the thread removes it using a last-in-first-out (LIFO) algorithm and processes it. Because each worker thread's local queue is normally accessed only by that thread, no thread synchronization lock is needed.

When a worker thread finds its local queue empty, it tries to steal a work item from the tail of another worker thread's local queue, which requires taking a synchronization lock (with some impact on performance). If all the local queues are empty, the worker thread uses the FIFO algorithm to extract a work item from the global queue, taking its lock. If the global queue is also empty, the worker thread goes to sleep, waiting for work to arrive. If it sleeps too long, it wakes up and destroys itself.

The thread pool quickly creates worker threads until their number equals the value passed to ThreadPool.SetMinThreads; if that method is never called, the default equals the number of CPUs the process is allowed to use. Processes are usually allowed to use all the CPUs on the machine, so the pool's worker-thread count quickly reaches the machine's CPU count. After that, the pool monitors how quickly work items complete: if they take too long, it creates more worker threads; if work items complete quickly, idle worker threads are destroyed.
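As a small sketch of where work items land (QueueDemo is an illustrative name; which queue is used is an internal scheduler detail and is not asserted here): a Task started from a non-pool thread goes to the global queue, a Task started from a pool thread goes to that thread's local queue, and TaskCreationOptions.PreferFairness asks the scheduler to use the global FIFO queue instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class QueueDemo
{
    public static int Run()
    {
        int count = 0;
        // Task.Run from a non-pool thread puts this item in the global queue
        Task parent = Task.Run(() =>
        {
            // Scheduled from a pool thread: normally lands in that
            // thread's local queue (dequeued LIFO by the same thread)
            Task local = Task.Factory.StartNew(
                () => Interlocked.Increment(ref count));

            // PreferFairness requests the global FIFO queue instead
            Task global = Task.Factory.StartNew(
                () => Interlocked.Increment(ref count),
                TaskCreationOptions.PreferFairness);

            Task.WaitAll(local, global);
        });
        parent.Wait();
        return count;
    }

    static void Main() => Console.WriteLine(Run()); // 2
}
```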

CPU cache lines and false sharing
Cache lines

To speed up memory access, the CPU logically divides all memory into cache lines. A cache line is a power-of-two number of consecutive bytes; the most common size is 64 bytes, so the CPU fetches and stores 64-byte blocks from RAM. For example, if an application reads an Int32 value, the CPU fetches the 64 bytes containing that value. Fetching the extra bytes usually improves performance, because most applications go on to access data adjacent to what they just accessed; since the adjacent data is already in the CPU cache, a slow trip to RAM is avoided.

However, if two or more cores access bytes in the same cache line, the cores must communicate with each other and pass the cache line between them. As a result, multiple cores cannot process adjacent bytes simultaneously, which seriously hurts performance.

Code testing
private const int COUNT = 100000000;

private static int s_OperationCount = 2;
private static long s_StartTime;

class SomeType
{
    public int Field1;
    public int Field2;
}

private static void AccessField(SomeType type, int field)
{
    //Each thread accesses the field in type
    for (int i = 0; i < COUNT; i++)
    {
        if (field == 0)
            type.Field1++;
        else
            type.Field2++;
    }

    //Display the time spent after the last thread finishes
    if (Interlocked.Decrement(ref s_OperationCount) == 0)
        Console.WriteLine("Time spent: {0} ms", (Stopwatch.GetTimestamp() - s_StartTime) / (Stopwatch.Frequency / 1000));
}

static void Main()
{
    var type = new SomeType();
    s_StartTime = Stopwatch.GetTimestamp();

    //Two threads access fields in an object
    ThreadPool.QueueUserWorkItem(o => AccessField(type, 0));
    ThreadPool.QueueUserWorkItem(o => AccessField(type, 1));

    Console.ReadKey();
}

The SomeType object in the code above contains two fields, Field1 and Field2, which most likely fall in the same cache line. Two threads then run the AccessField method, one incrementing Field1 and the other incrementing Field2; each thread decrements s_OperationCount when it finishes, and the last thread to finish displays the total time the two threads took to complete the work.

Running results


Next, modify the SomeType class to look like this:

[StructLayout(LayoutKind.Explicit)]
class SomeType
{
    [FieldOffset(0)]
    public int Field1;

    [FieldOffset(64)]
    public int Field2;
}

The modified SomeType class uses explicit field offsets to put Field1 and Field2 in different cache lines. In the first version, the two fields shared a cache line, causing the CPUs to pass the line back and forth: from the program's perspective the two threads process different data, but from the cache-line perspective the CPUs are processing the same data. This is called false sharing. In the modified code, the fields occupy different cache lines, so each CPU can work independently, with no sharing.

Running the test again, the speed is significantly improved.


Accessing arrays

An array's length information is stored at the beginning of the array's memory, adjacent to the first few elements, so it can share a cache line with them. When an element is accessed, the CLR verifies that the index is within the array's length, so every element access also touches the length. Therefore, to avoid extra false sharing, one thread should not access the first few elements of an array while another thread accesses the array's other elements.
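Building on this, a common mitigation is to give each thread a contiguous chunk of the array and a cache-line-spaced slot for its result. A hedged sketch, assuming 64-byte cache lines (ArrayPartitionDemo and the chunking scheme are illustrative, not from the original text):

```csharp
using System;
using System.Threading.Tasks;

static class ArrayPartitionDemo
{
    // Sums an array with one contiguous chunk per thread, so threads
    // mostly touch distinct cache lines, and writes each subtotal into
    // its own cache-line-spaced slot to avoid false sharing
    public static long PartitionedSum(int[] data, int threads)
    {
        const int LongsPerCacheLine = 8; // 64-byte line / 8-byte long
        long[] subtotals = new long[threads * LongsPerCacheLine];

        int chunk = (data.Length + threads - 1) / threads;
        Parallel.For(0, threads, t =>
        {
            long sum = 0; // accumulate in a local variable first
            int start = t * chunk;
            int end = Math.Min(start + chunk, data.Length);
            for (int i = start; i < end; i++)
                sum += data[i];
            subtotals[t * LongsPerCacheLine] = sum; // one padded slot per thread
        });

        long total = 0;
        for (int t = 0; t < threads; t++)
            total += subtotals[t * LongsPerCacheLine];
        return total;
    }

    static void Main()
    {
        var data = new int[1000];
        for (int i = 0; i < data.Length; i++) data[i] = i;
        Console.WriteLine(PartitionedSum(data, 4)); // 499500
    }
}
```

Accumulating into a local variable inside the loop also keeps the hot writes out of shared memory entirely; the padded slots only matter for the final per-thread store.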