GMP principles and scheduling analysis of the Golang scheduler

Time: 2020-12-02

This article explains the process and principles of the goroutine scheduler in detail, so that you can get a clear picture of how the Go scheduler actually schedules goroutines. It took four days to produce the 30+ diagrams (worth bookmarking). The article covers the following chapters.

Chapter one: The origin of the Golang scheduler

Chapter two: The GMP model and design ideas of the goroutine scheduler

Chapter three: A full walkthrough of goroutine scheduling scenarios

1. The origin of the Golang "scheduler"

(1) The single-process era did not need a scheduler

We know that all software runs on top of an operating system, and it is the CPU that actually does the work (the computation). In early operating systems each program was a single process, and the next process could not start until the current one finished; this was the "single-process era".

All programs could only run serially.
(figure)

Early single-process operating systems faced two problems:

1. With a single flow of execution, the computer can only handle one task at a time.

2. CPU time is wasted whenever the process blocks, for example while waiting on I/O.

Can we have multiple processes to perform multiple tasks together?

So operating systems gained their earliest concurrency capability: multi-process concurrency. When one process blocks, the CPU switches to another process that is waiting to run; this keeps the CPU as busy as possible and avoids wasting its time.

(2) The multi-process / multi-thread era needs a scheduler

(figure)

In a multi-process / multi-threaded operating system the blocking problem is solved: when a process blocks, the CPU can immediately switch to another process, and the CPU scheduling algorithm guarantees that every runnable process gets a time slice. From a macro point of view it looks as if many processes are running at the same time.

However, new problems appear. A process holds a lot of resources, so creating, switching and destroying processes is expensive. The CPU is kept busy, but with too many processes a large share of CPU time is spent on scheduling itself.

How to improve CPU utilization?

Engineers then introduced threads, which are lighter than processes; but from the Linux kernel's point of view, the CPU treats a process and a thread in much the same way (both are schedulable tasks).
(figure)

The CPU scheduler still switches between processes and threads. Threads look attractive, but multi-threaded development and design become more complex: many problems such as synchronization, races, locks and contention have to be considered.

(3) Using coroutines to improve CPU utilization

Multi-processing and multi-threading improved the system's concurrency. But in today's high-concurrency Internet scenarios, creating a thread per task is unrealistic because it consumes too much memory (a process's virtual address space is 4GB on a 32-bit OS, and each thread also needs about 4MB).

A large number of processes / threads brings new problems:

  • High memory consumption
  • High CPU consumption for scheduling

Engineers then realized that a thread can actually be split into a "kernel-space" thread and a "user-space" thread.

A "user-space thread" must be bound to a "kernel-space thread", but the CPU knows nothing about user-space threads; it only knows it is running a kernel-space thread (Linux's PCB, the process control block).

(figure)

Refining the naming, we call the kernel-space thread simply a "thread" and the user-space thread a "coroutine".

(figure)

At this point a question naturally arises: if one coroutine can be bound to one thread, can multiple coroutines be bound to one or even several threads?

From there, three mapping relationships between coroutines and threads emerge:

N:1 relationship

N coroutines are bound to one thread. The advantage is that coroutine switching happens entirely in user space without trapping into the kernel, so switches are very light and fast. The big disadvantage is that all the coroutines of the process are tied to a single thread.

Disadvantages:

  • The program cannot use the multi-core acceleration capability of the hardware.
  • Once one coroutine blocks, the thread blocks with it, so none of the other coroutines of the process can run and all concurrency is lost.

(figure)

1:1 relationship

One coroutine is bound to one thread. This is the easiest to implement: coroutine scheduling is done by the CPU (the OS scheduler), and the defects of the N:1 model disappear.

Disadvantages:

  • The cost of creating, deleting and switching coroutines is borne by the CPU (as thread operations), which is relatively expensive.

(figure)

M:N relationship

M coroutines are bound to N threads, a combination of the N:1 and 1:1 types. It overcomes the shortcomings of both models above, but it is also the most complex to implement.

(figure)

Coroutines differ from threads: threads are scheduled preemptively by the CPU, while coroutine scheduling in user space is cooperative, meaning the next coroutine only runs after the current one gives up the CPU.

(4) Go's coroutine: the goroutine

Go uses goroutines and channels to provide an easier-to-use form of concurrency. Goroutines come from the coroutine concept: a group of reusable functions runs on top of a group of threads. Even if one goroutine blocks, the runtime can schedule the other goroutines of that thread and move them to other runnable threads. Most importantly, programmers do not see these low-level details, which lowers the difficulty of programming and makes concurrency easier.

In Go, coroutines are called goroutines and they are very lightweight: a goroutine takes only a few KB, and that is enough for it to run. This makes it possible to support a huge number of goroutines in limited memory, and therefore more concurrency. Although a goroutine's stack starts at only a few KB, it can grow; if more space is needed, the runtime allocates it to the goroutine automatically.

Goroutine features:

  • Less memory (several KB)
  • More flexible scheduling (runtime scheduling)
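
As a minimal illustration (a sketch added here, not one of the article's figures), the go keyword plus a channel is all it takes to start a goroutine and wait for its result:

package main

import "fmt"

func main() {
    done := make(chan string)

    // The go keyword starts a new goroutine; it begins with a stack of only a
    // few KB, and it is the runtime, not the OS, that schedules it onto a thread.
    go func() {
        done <- "hello from a goroutine"
    }()

    // Channels let goroutines communicate results instead of sharing memory.
    fmt.Println(<-done)
}
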
(5) The abandoned goroutine scheduler

Now that we understand the relationship between coroutines and threads, the key question becomes how the scheduler that schedules these coroutines is implemented.

The scheduler Go uses today was redesigned in 2012; the previous scheduler was abandoned after about four years of use because of performance problems. Let's first look at how that abandoned scheduler worked.

Most articles use G to represent a goroutine and M to represent a thread, so we will use the same notation here.

(figure)

Let's take a look at how this abandoned Golang scheduler was implemented.

(figure)

Whenever an M wants to execute a G or put one back, it must access the global G queue; and since there are multiple Ms (threads), access must be locked to guarantee mutual exclusion / synchronization. The global G queue is therefore protected by a mutex.

The old scheduler had several disadvantages:

  1. Creating, destroying and scheduling a G requires every M to acquire the lock, which causes fierce lock contention.
  2. Moving Gs between Ms causes latency and extra system overhead. For example, when G creates a new goroutine, the M running G creates G'; but to keep executing G, it has to hand G' to another M' for execution. This gives poor locality: G' and G are related, so G' would be better executed on the same M rather than on some other M'.
  3. System calls (and the CPU switching between Ms that they cause) lead to frequent thread blocking and unblocking, which increases system overhead.

2. The GMP model and design ideas of the goroutine scheduler

Facing the problems of the old scheduler, Go designed a new one.

In the new scheduler, besides M (thread) and G (goroutine), a new component P (processor) is introduced.

(figure)

P (the processor) holds the resources needed to run goroutines. If a thread wants to run goroutines, it must first acquire a P; P also contains a queue of runnable Gs.

(1) GMP model

In Go, the thread is the entity that actually runs goroutines; the scheduler's job is to assign runnable goroutines to worker threads.

(figure)

  1. Global queue: stores the Gs that are waiting to run.
  2. P's local queue: like the global queue, it stores Gs waiting to run, but its capacity is limited (no more than 256 Gs). When a new G' is created, it goes into the local queue of the creating P first; if that queue is full, half of the Gs in the local queue are moved to the global queue.
  3. P list: all Ps are created at program startup and kept in an array; there are at most GOMAXPROCS of them (configurable).
  4. M: a thread that wants to run tasks must first acquire a P, then take a G from that P's local queue. When the local queue is empty, M tries to take a batch of Gs from the global queue into P's local queue, or to steal half of the Gs from another P's local queue into its own. M runs the G; when the G finishes, M takes the next G from P, and so on.

The goroutine scheduler and the OS scheduler are connected through M: each M represents a kernel thread, and the OS scheduler is responsible for assigning kernel threads to CPU cores for execution.
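
To make these relationships concrete, here is a highly simplified sketch. These are not the real runtime definitions (the actual g, m and p structs live in runtime/runtime2.go); every name below is illustrative only:

package main

// Illustrative only: simplified stand-ins for the runtime's g, m and p structs.

type G struct {
    stack  []byte // each goroutine has its own small, growable stack
    status int    // runnable, running, waiting, ...
}

type P struct {
    localRunq []*G // bounded local run queue (at most 256 Gs)
    m         *M   // the M currently bound to this P, if any
}

type M struct {
    g0   *G // per-M scheduling goroutine with its own stack
    curG *G // the G currently running on this M
    p    *P // the P this M holds (nil while blocked in a system call)
}

// The scheduler also keeps a global run queue and the list of all Ps.
var (
    globalRunq []*G
    allP       []*P
)

func main() {} // definitions only; nothing to execute
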

On the number of P and M

1. Number of P

  • Determined by the environment variable $GOMAXPROCS at startup, or by the runtime function GOMAXPROCS(). This means that at any moment during execution, at most $GOMAXPROCS goroutines are running simultaneously.

2. Number of M:

  • Go's own limit: when a Go program starts, it sets a maximum number of M, which is 10000 by default. The kernel can hardly support that many threads anyway, so this limit can usually be ignored.
  • The SetMaxThreads function in runtime/debug can change the maximum number of M.
  • When an M blocks, a new M may be created.

There is no fixed relationship between the number of M and the number of P. If an M blocks, its P creates a new M or switches to an existing one, so even if the default number of P is 1, many Ms may be created.
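
As a small example (a sketch added here, not from the original text), the number of P and the cap on M can be inspected and changed from code using runtime.NumCPU, runtime.GOMAXPROCS and runtime/debug.SetMaxThreads:

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func main() {
    // Number of P: defaults to the number of CPU cores.
    // GOMAXPROCS(0) only queries the current value without changing it.
    fmt.Println("NumCPU:     ", runtime.NumCPU())
    fmt.Println("GOMAXPROCS: ", runtime.GOMAXPROCS(0))

    // Example: use only half of the cores for parallel execution
    // (the "GOMAXPROCS = cores / 2" case mentioned later in this article).
    if n := runtime.NumCPU() / 2; n >= 1 {
        runtime.GOMAXPROCS(n)
    }

    // Cap on the number of M (OS threads): 10000 by default.
    // SetMaxThreads returns the previous setting.
    prev := debug.SetMaxThreads(20000)
    fmt.Println("previous max threads:", prev)
}
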

When are P and M created?

1. When P is created: once the maximum number of P, n, is determined, the runtime creates n Ps.

2. When M is created: when there are not enough Ms to bind to Ps and run their runnable Gs. For example, if all Ms are blocked while Ps still have many ready tasks, the runtime looks for an idle M and, finding none, creates a new one.

(2) Design strategies of the scheduler

Reuse threads: avoid frequently creating and destroying threads; reuse existing threads instead.

1) Work stealing mechanism

When a thread has no runnable G, it tries to steal Gs from the P bound to another thread instead of destroying itself.

2) Hand off mechanism

When a thread blocks because its G makes a system call, the thread releases its bound P and hands the P off to another idle thread.

Exploiting parallelism: GOMAXPROCS sets the number of P, so at most GOMAXPROCS threads are spread over multiple CPUs and run at the same time. GOMAXPROCS also limits the degree of parallelism; for example, with GOMAXPROCS = cores/2, at most half of the CPU cores are used in parallel.

Preemption: with classic coroutines, the next coroutine runs only after the current one gives up the CPU voluntarily. In Go, a goroutine can occupy the CPU for at most about 10ms before it may be preempted, which keeps other goroutines from starving. This is one difference between goroutines and ordinary coroutines.
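
A minimal sketch of this preemption, assuming Go 1.14 or newer (where asynchronous preemption can interrupt even a tight loop with no function calls); on older versions this program could hang:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // a single P, so all goroutines share one running thread

    // A tight loop with no function calls or channel operations: before Go 1.14
    // this could monopolize the P forever; with asynchronous preemption the
    // scheduler can still interrupt it.
    go func() {
        for {
        }
    }()

    time.Sleep(20 * time.Millisecond)
    fmt.Println("main goroutine still gets CPU time: the busy goroutine was preempted")
}
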

Global G queue: the new scheduler still has a global G queue, but its role is weakened. When an M fails to steal Gs from other Ps while work stealing, it can take Gs from the global G queue.

(3) Go func() scheduling process

(figure: the go func() scheduling process)

From the figure above we can draw several conclusions:

1. We create a goroutine through go func();

2. There are two kinds of queues that store Gs: the local queue of each P and the global G queue. A newly created G is saved in the local queue of the current P; if that local queue is full, it is saved in the global queue;

3. A G can only run on an M, and an M must hold a P (M and P are in a 1:1 relationship). M pops a runnable G from its P's local queue and executes it; if that local queue is empty, it steals a runnable G from another M-P combination;

4. M schedules and runs Gs in a loop;

5. When an M blocks while executing a G, for example in a syscall or another blocking operation, the runtime detaches that M from its P and then creates a new OS thread (or reuses an idle one) to serve the P, so the remaining Gs can still run;

6. When M's system call ends, its G tries to acquire an idle P and is put into that P's local queue. If no P is available, the thread M goes dormant and joins the idle-thread list, and the G is put into the global queue.

(4) Life cycle of scheduler

(figure: the scheduler's life cycle)

Special M0 and G0

M0

M0 is the main thread, numbered 0, started when the program launches. The corresponding M instance lives in the global variable runtime.m0 and does not need to be allocated on the heap. M0 performs the initialization work and starts the first G; after that, M0 behaves like any other M.

G0

G0 is the first goroutine created whenever an M starts. G0 is used only for scheduling other Gs; it does not point to any executable user function. Each M has its own G0, whose stack space is used during scheduling and system calls. The global variable g0 is the G0 of M0.

Let's trace through a piece of code:

package main

import "fmt"

func main() {
    fmt.Println("Hello world")
}

Next, we analyze how the scheduler handles the code above.

It will also go through the process as shown in the figure above:

  1. The runtime creates the initial thread M0 and goroutine G0 and associates the two.
  2. Scheduler initialization: initialize M0, the stacks and garbage collection, and create and initialize the list of GOMAXPROCS Ps.
  3. The main function in the sample code is main.main; the runtime also has its own main function, runtime.main, and after compilation runtime.main calls main.main. When the program starts, a goroutine is created for runtime.main, called the main goroutine, and added to P's local queue.
  4. M0 starts and is bound to a P; it takes a G, here the main goroutine, from P's local queue.
  5. The G owns its stack, and M sets up the running environment based on the stack and scheduling information in the G.
  6. M runs the G.
  7. When the G exits, M goes back to P to get another runnable G, and this repeats until main.main exits; runtime.main then performs defer and panic handling and calls runtime.exit to terminate the program.

The scheduler's life cycle covers almost the whole life of a Go program: everything before the runtime.main goroutine runs is preparation for the scheduler, running the runtime.main goroutine is the real start of the scheduler, and the scheduler ends when runtime.main ends.

(5) Visualizing GMP scheduling

There are two ways to view GMP data of a program.

Method 1: go tool trace

Trace records the runtime information and provides a visual web page.

Simple test code: main starts a trace (which runs in its own goroutine), prints "Hello World" and exits.

trace.go

package main

import (
    "fmt"
    "os"
    "runtime/trace"
)

func main() {

    // Create the trace output file
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }

    defer f.Close()

    // Start the trace goroutine
    err = trace.Start(f)
    if err != nil {
        panic(err)
    }
    defer trace.Stop()

    // main logic
    fmt.Println("Hello World")
}

Run the program

$ go run trace.go 
Hello World

This produces a trace.out file, which we can then open with a tool to analyze.

$ go tool trace trace.out 
2020/02/23 10:44:11 Parsing trace...
2020/02/23 10:44:11 Splitting trace...
2020/02/23 10:44:11 Opening browser. Trace viewer is listening on http://127.0.0.1:33479

We can open http://127.0.0.1:33479 in a browser and click "view trace" to see the visualized scheduling process.

(figure: trace view)

G information

Click on the visual data bar in the goroutines line and we’ll see some details.

(figure)

There are two Gs in the program. One is the special G0, the scheduling G that every M must have; we don't need to discuss it here.

G1 is the main goroutine; it becomes runnable and then runs for a period of time.

M information

Click on the visual data bar in the thread line, and we will see some detailed information.

(figure)

There are two Ms in the program. One is the special M0, which is used for initialization; we don't need to discuss it here.

P information

(figure)

G1 calls main.main, which creates the trace goroutine G18. G1 runs on P1, and G18 runs on P0.

There are two Ps. We know that a P must be bound to an M in order to schedule Gs.

Look again at the M information above.

(figure)

We find that while G18 is running on P0, there is indeed an extra M entry in the Threads row. Clicking it shows the following:

(figure)

The extra M2 should be a thread created dynamically by P0 in order to execute G18.

Method 2: debug trace

trace2.go

package main

import (
    "fmt"
    "time"
)

func main() {
    for i := 0; i < 5; i++ {
        time.Sleep(time.Second)
        fmt.Println("Hello World")
    }
}

Compile

$ go build trace2.go

Run in debug mode

$ GODEBUG=schedtrace=1000 ./trace2 
SCHED 0ms: gomaxprocs=2 idleprocs=0 threads=4 spinningthreads=1 idlethreads=1 runqueue=0 [0 0]
Hello World
SCHED 1003ms: gomaxprocs=2 idleprocs=2 threads=4 spinningthreads=0 idlethreads=2 runqueue=0 [0 0]
Hello World
SCHED 2014ms: gomaxprocs=2 idleprocs=2 threads=4 spinningthreads=0 idlethreads=2 runqueue=0 [0 0]
Hello World
SCHED 3015ms: gomaxprocs=2 idleprocs=2 threads=4 spinningthreads=0 idlethreads=2 runqueue=0 [0 0]
Hello World
SCHED 4023ms: gomaxprocs=2 idleprocs=2 threads=4 spinningthreads=0 idlethreads=2 runqueue=0 [0 0]
Hello World
  • SCHED: the flag indicating that this line is scheduler debug output;
  • 0ms: the time from program start to when this log line was printed;
  • gomaxprocs: the number of P. In this example there are 2 Ps, because by default the number of P equals the number of CPU cores; it can also be set with GOMAXPROCS;
  • idleprocs: the number of Ps in the idle state; gomaxprocs minus idleprocs gives the number of Ps currently executing Go code;
  • threads: the number of OS threads (M), including the Ms used by the scheduler plus runtime threads such as sysmon;
  • spinningthreads: the number of OS threads in the spinning state;
  • idlethreads: the number of OS threads in the idle state;
  • runqueue=0: the number of Gs in the scheduler's global queue;
  • [0 0]: the number of Gs in the local queues of the 2 Ps, respectively.
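
In addition to schedtrace, GODEBUG also accepts scheddetail=1, which prints per-P, per-M and per-G state at every interval (the output is much more verbose):

$ GODEBUG=schedtrace=1000,scheddetail=1 ./trace2
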

Next, we analyze a number of GMP scheduling scenarios in detail.

3. Full walkthrough of Go scheduler scheduling scenarios

(1) Scenario 1

P owns G1. M1 acquires P and starts running G1. G1 then uses go func() to create G2, and for locality G2 is added to P1's local queue first.
(figure)


(2) Scenario 2

After G1 finishes running (function: goexit), the goroutine running on M switches to G0, which is responsible for the goroutine switch during scheduling (function: schedule). G0 takes G2 from P's local queue, switches from G0 to G2, and starts running G2 (function: execute). This achieves reuse of thread M1.

(figure)


(3) Scenario 3

Suppose each P's local queue can hold only three Gs, and G2 needs to create six Gs. The first three (G3, G4 and G5) have joined P1's local queue, which is now full.

(figure)


(4) Scenario 4

When G2 creates G7, it finds that P1's local queue is full and load balancing has to be performed: the first half of P1's local queue, together with the newly created G, is transferred to the global queue.

(In the implementation it is not necessarily the new G that is moved: if the new G is to be executed right after G2, it stays in the local queue and an older G is moved to the global queue in its place.)

(figure)

When the Gs are transferred to the global queue, their order is shuffled, so G3, G4 and G7 end up in the global queue.


(5) Scenario 5

When G2 creates G8, P1’s local queue is not full, so G8 will be added to P1’s local queue.

(figure)

The reason G8 joins P1's local queue is that P1 is bound to M1 at this moment, and M1 is executing G2; a new G created by G2 is placed first on the P bound to its own M.


(6) Scenario 6

Rule: when a new G is created, the running G tries to wake up another idle P and M combination to do work.

(figure)

Suppose G2 wakes up M2. M2 binds P2 and runs G0, but P2's local queue has no G, so M2 is now a spinning thread (a thread that is running but has no G, constantly looking for Gs).


(7) Scenario 7

M2 tries to take a batch of Gs from the global queue (abbreviated "GQ") and put them into P2's local queue (function: findrunnable()). The number of Gs that M2 fetches from the global queue follows this formula:

n = min(len(GQ)/GOMAXPROCS + 1, len(GQ)/2)

At least one G is taken from the global queue, but not too many Gs are moved from the global queue into a local P queue at once, leaving some for the other Ps. This is load balancing from the global queue to the local queues.

(figure)

Suppose there are four Ps in our scenario (GOMAXPROCS is set to 4, so Ms can use at most 4 Ps). Therefore M2 takes only one G (G3) from the global queue into P2's local queue, then switches from G0 to G3 and starts running G3.
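
As a rough sketch of that batch calculation (modelled on the shape of the runtime's globrunqget, whose exact details vary by Go version; note that the cap here is half of P's local queue capacity, which is how the runtime actually bounds the batch, and the function and parameter names below are illustrative):

package main

import "fmt"

// batchFromGlobal sketches how many Gs an M moves from the global queue into
// P's local queue at once: spread the global queue evenly over all Ps (plus
// one), but never take more than the global queue holds or more than half of
// P's local queue capacity.
func batchFromGlobal(globalLen, gomaxprocs, localCap int) int {
    n := globalLen/gomaxprocs + 1
    if n > globalLen {
        n = globalLen
    }
    if n > localCap/2 {
        n = localCap / 2
    }
    return n
}

func main() {
    // Scenario 7: three Gs in the global queue, GOMAXPROCS=4, local capacity 256.
    fmt.Println(batchFromGlobal(3, 4, 256)) // prints 1, matching "M2 takes only G3"
}
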


(8) Scenario 8

Suppose G2 has been running on M1 the whole time. After two rounds, M2 has fetched G7 and G4 from the global queue into P2's local queue and finished running them. Both the global queue and P2's local queue are now empty, as shown in the left half of the scenario 8 figure.

(figure)

When the global queue has no G, M has to perform work stealing: it steals half of the Gs from another P that has Gs and puts them into its own P's local queue. Here P2 takes half of the Gs from the tail of P1's local queue; in this case half is just one G, G8, which is put into P2's local queue and executed.
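
A toy sketch of the "steal half" idea (the type and function names are made up for illustration; the real logic lives in runtime.runqsteal and differs in its details):

package main

import "fmt"

type g struct{ id int }

// stealHalf moves half of the victim P's local queue (at least one G if any
// exist) to the thief; it returns the victim's remaining queue and the loot.
func stealHalf(victim []*g) (remaining, stolen []*g) {
    n := len(victim) / 2
    if n == 0 && len(victim) > 0 {
        n = 1
    }
    return victim[:len(victim)-n], victim[len(victim)-n:]
}

func main() {
    // Scenario 8: P1's local queue holds only G8, so P2 ends up stealing that one G.
    p1 := []*g{{id: 8}}
    p1, p2 := stealHalf(p1)
    fmt.Println("P1 left with", len(p1), "G; P2 stole", len(p2), "G")
}
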


(9) Scenario 9

G5 and G6 in P1's local queue have already been stolen and run by other Ms. Currently M1 and M2 are running G2 and G8 respectively, while M3 and M4 have no goroutines to run; M3 and M4 are in the spinning state, continuously looking for goroutines.

(figure)

Why let M3 and M4 spin? Spinning is essentially running: the thread runs without executing any G, which wastes CPU. So why not destroy them to save CPU? Because creating and destroying threads also costs time, and we want an M to be available to run a newly created goroutine immediately; destroying and recreating threads would add latency and reduce efficiency. On the other hand, too many spinning threads also waste CPU, so at most GOMAXPROCS spinning threads are allowed (in the current example GOMAXPROCS=4, so there are 4 Ps in total); any extra idle threads are put to sleep.


(10) Scenario 10

Suppose M3 and M4 are spinning threads and M5 and M6 are idle threads without a bound P (note that here we can have at most 4 Ps, so normally M >= P; most of the time it is the Ms that compete for the Ps that need to run). G8 creates G9, and then G8 makes a blocking system call, so M2 and P2 are unbound immediately. P2 then makes the following decision: if P2's local queue has a G, or the global queue has a G, or there is an idle M, P2 immediately wakes up an M and binds to it; otherwise P2 joins the free P list and waits for an M to acquire it. In this scenario, P2's local queue has G9, so it can be bound to another idle thread, M5.

(figure)

(11) Scenario 11

G8 creates G9, but this time G8 makes a non-blocking system call.
(figure)

M2 and P2 are unbound, but M2 remembers P2, and G8 and M2 enter the system-call state. When G8 and M2 exit the system call, they try to reacquire P2; if that fails they try to get an idle P; if that also fails, G8 is marked as runnable and added to the global queue, and M2, having no bound P, goes to sleep (a long sleep, waiting to be reclaimed and destroyed).


4. Summary

In summary, the Go scheduler is lightweight and simple enough to handle goroutine scheduling and give Go native, powerful concurrency. The essence of Go scheduling is to distribute a large number of goroutines over a small number of threads and to exploit multi-core parallelism for even stronger concurrency.


About the author:

Author: Aceld (Liu Danbing)

mail: [email protected]
github: github.com/aceld
Original book (GitBook): legacy.gitbook.com/@aceld

Creating content is not easy; let's learn and improve together. Feel free to follow the author and reply "zinx".



This work is licensed under a CC license; when reprinting, the author and a link to this article must be credited.

1、 Overview of Dubbo layered overall design Let’s start with the following figure to briefly introduce the Dubbo layered design concept: (quoted from duboo Development Guide – framework design document) As shown in the figure, the RPC implemented by Dubbo is divided into 10 layers: service, config, proxy, registry, cluster, monitor, protocol, exchange, transport and […]