Process scheduling is essential for an operating system to run multiple processes.
It is often said that process scheduling is the most important part of an operating system. I think this statement is too absolute. It is like the common claim that “function X is N times more efficient than function Y”: divorced from the actual environment, such conclusions are one-sided.
How important is process scheduling? First, we need to be clear that process scheduling deals with processes in the TASK_RUNNING state (see “Linux Process State Analysis”). A process that is not executable (sleeping or otherwise) has little to do with process scheduling.
Therefore, if your system load is very low and an executable process shows up only once in a blue moon, process scheduling matters little: whichever process is executable simply runs, and there is nothing to decide.
Conversely, if the system load is very high and N processes are in the executable state all the time, waiting to be scheduled, the scheduler must do a lot of work to coordinate them. If it coordinates them poorly, system performance drops sharply. In this situation, process scheduling is very important.
Although most of the computers we deal with (desktop systems, network servers, etc.) run at low load, Linux, as a general-purpose operating system, cannot assume that load is low and must be carefully designed to handle scheduling under high load.
Of course, these designs are of little use in low-load environments without real-time requirements. In the extreme case, if the CPU load is always 0 or 1 (there is never more than one process to run on the CPU), these designs are essentially wasted.
To coordinate the “simultaneous” running of multiple processes, the most basic tool an operating system has is process priority. Once priorities are defined, if several processes are executable at the same time, the one with the higher priority runs. There is nothing to agonize over.
So how is a process's priority determined? There are two ways: it is specified by the user program, or it is adjusted dynamically by the kernel scheduler (more on this later).
The Linux kernel divides processes into two classes: ordinary processes and real-time processes. Real-time processes have higher priority than ordinary processes, and the two classes are scheduled with different policies.
Real-time process scheduling
Real-time originally means “a given operation must be completed within a bounded time.” The point is not how fast the operation is handled, but that its timing is controllable: even in the worst case, the deadline must not be exceeded.
This kind of real-time is called “hard real-time” and is typically required by very precise systems such as rockets and missiles. Hard real-time systems are generally quite specialized.
A general-purpose operating system such as Linux clearly cannot meet such requirements. Mechanisms such as interrupt handling and virtual memory make processing time highly unpredictable, and hardware caches, disk seeks, and bus contention add further uncertainty.
For example, consider the statement “i++;”. Most of the time it is fast, but in extreme cases the following can happen:
1. The memory for i has not been allocated yet, so the CPU raises a page fault. When Linux tries to allocate memory in the page-fault handler, the allocation may fail because system memory is short, and the process is put to sleep;
2. While the code is executing, a hardware interrupt arrives, and Linux enters the interrupt handler and sets the current process aside. While that interrupt is being handled, new hardware interrupts may arrive, and in the extreme case interrupts keep nesting without end;
Therefore, a general-purpose operating system like Linux that claims to be “real-time” only achieves “soft real-time”, that is, it meets the real-time needs of processes as far as possible.
If a process has real-time requirements (it is a real-time process), the kernel lets it run whenever it is executable, satisfying its need for the CPU as far as possible, until it has finished its work and then sleeps or exits (becomes non-executable).
If more than one real-time process is executable, the kernel first satisfies the CPU needs of the highest-priority one until it becomes non-executable.
Therefore, as long as a high-priority real-time process stays executable, lower-priority real-time processes cannot get the CPU; and as long as any real-time process is executable, ordinary processes cannot get the CPU.
So what happens if several real-time processes with the same priority are executable? There are two scheduling policies to choose from:
1、SCHED_FIFO: first in, first out. Later processes are scheduled only after the process that ran first becomes non-executable. Under this policy, the running process can call sched_yield to give up the CPU voluntarily and let the processes behind it run;
2、SCHED_RR: round-robin scheduling. The kernel allocates time slices to real-time processes, and when a process's time slice runs out, the next process gets the CPU;
To emphasize: these two policies, and the sched_yield system call, only matter when several real-time processes with the same priority are executable at the same time.
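To make the same-priority case concrete, here is a small Python simulation of two real-time processes, A and B, with equal priority, each needing 3 ticks of CPU time. It is a sketch, not kernel code: the function name simulate, the tick unit, and the 2-tick RR quantum are illustrative assumptions.

```python
from collections import deque


def simulate(policy, jobs, quantum=2):
    """Return which same-priority job holds the CPU on each tick.

    policy: "FIFO" or "RR"; jobs: dict of name -> ticks of CPU needed,
    in arrival order. The quantum is an illustrative assumption.
    """
    queue = deque(jobs.items())          # arrival order is queue order
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        # SCHED_FIFO: run the head until it is done (no time slice).
        # SCHED_RR: run the head for at most one quantum.
        run = remaining if policy == "FIFO" else min(quantum, remaining)
        timeline.extend([name] * run)
        remaining -= run
        if remaining:
            queue.append((name, remaining))  # RR: back to the tail
    return timeline


print("".join(simulate("FIFO", {"A": 3, "B": 3})))  # AAABBB
print("".join(simulate("RR", {"A": 3, "B": 3})))    # AABBAB
```

Under FIFO, B waits until A finishes (or calls sched_yield); under RR the two alternate. On Linux the real RR quantum of a process can be queried with the sched_rr_get_interval system call.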
In Linux, user programs can use the sched_setscheduler system call to set a process's scheduling policy and related scheduling parameters, while the sched_setparam system call only sets the scheduling parameters. Both calls require the caller to have the capability to set process priority (CAP_SYS_NICE, which generally means root privileges; see the article about capabilities).
Setting a process's policy to SCHED_FIFO or SCHED_RR makes it a real-time process. Its priority is specified in the scheduling parameters passed to these two system calls.
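Python's os module wraps these system calls on Linux, so the sequence can be sketched without writing C. The helper make_realtime is a hypothetical name; the os functions it uses (sched_setscheduler, sched_setparam, sched_param, sched_get_priority_min/max, sched_getscheduler, sched_getparam) are the standard-library wrappers. If the caller is not permitted to use a real-time policy, the kernel answers EPERM, which Python raises as PermissionError.

```python
import os


def make_realtime(pid=0, policy=os.SCHED_FIFO, priority=10):
    """Try to switch `pid` (0 = calling process) to a real-time policy.

    Returns False if the kernel refuses (no CAP_SYS_NICE and no
    sufficient RLIMIT_RTPRIO allowance).
    """
    lo = os.sched_get_priority_min(policy)   # 1 for FIFO/RR on Linux
    hi = os.sched_get_priority_max(policy)   # 99 on Linux
    if not lo <= priority <= hi:
        raise ValueError(f"priority must be in [{lo}, {hi}]")
    try:
        # sched_setscheduler sets the policy and its parameters together.
        os.sched_setscheduler(pid, policy, os.sched_param(priority))
    except PermissionError:
        return False
    return True


if make_realtime(policy=os.SCHED_RR, priority=20):
    # sched_setparam changes only the parameters; the policy stays SCHED_RR.
    os.sched_setparam(0, os.sched_param(30))
    print(os.sched_getscheduler(0) == os.SCHED_RR,
          os.sched_getparam(0).sched_priority)
    # Return to the normal time-sharing policy (its priority must be 0).
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
else:
    print("not permitted: needs root or CAP_SYS_NICE")
```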
The kernel does not try to adjust the priority of real-time processes. Is a process real-time or not? How strict are its timing needs? These questions depend on the application scenario of the user program. Only the user can answer them; the kernel cannot guess.
In summary, real-time scheduling is very simple. The user decides each process's priority and scheduling policy, and the kernel only needs to always pick the highest-priority real-time process to run. The only slight complication is applying the two policies when choosing among real-time processes of the same priority.
Scheduling of ordinary processes
The central idea of real-time scheduling is to let the highest-priority executable real-time process occupy the CPU as much as possible, because it has real-time requirements. Ordinary processes are assumed to have no real-time requirements, so the scheduler tries to let all executable ordinary processes share the CPU peacefully, so that users feel these processes are running at the same time.
Scheduling ordinary processes is much more complex than scheduling real-time processes. The kernel needs to consider two things:
1、 Dynamically adjust the priority of the process
Based on their behavior, processes can be divided into “interactive processes” and “batch processes”:
The main task of an interactive process (such as a desktop program or a server) is to interact with the outside world. Such processes should have higher priority: they spend most of their time sleeping while waiting for external input, and when input arrives and the kernel wakes them up, they should be scheduled quickly so they can respond. For example, if a desktop program still hasn't responded half a second after a click, users will feel the system is “stuck”;
The main task of a batch process (such as a compiler) is continuous computation, so it stays executable for long periods. Such a process generally does not need high priority. For example, most users will not mind much if the compiler runs a few seconds longer;
If the user clearly knows what priority a process should have, it can be set with the nice and setpriority system calls. (Raising a process's priority requires the CAP_SYS_NICE capability.)
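For ordinary processes, this can be sketched with os.getpriority and os.setpriority, the standard-library wrappers around getpriority(2) and setpriority(2). The sketch lowers this process's priority by 5 nice levels, which any process may do, then tries to restore it, which normally needs CAP_SYS_NICE (an RLIMIT_NICE allowance can also permit it).

```python
import os

# Current nice value of this process (PRIO_PROCESS with who=0 means "self").
before = os.getpriority(os.PRIO_PROCESS, 0)

# Lowering priority (raising the nice value, capped at 19) is always allowed.
os.setpriority(os.PRIO_PROCESS, 0, min(before + 5, 19))
after = os.getpriority(os.PRIO_PROCESS, 0)
print(before, "->", after)

# Raising priority again (lowering the nice value) is a privileged operation.
try:
    os.setpriority(os.PRIO_PROCESS, 0, before)
    print("restored to", before)
except PermissionError:
    print("not permitted to lower the nice value back to", before)
```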
However, applications are not always as clear-cut as desktop programs and compilers. A program may behave in many ways, sometimes like an interactive process and sometimes like a batch process, which makes it hard for users to choose an appropriate priority.
Moreover, even when users clearly know whether a process is interactive or batch, they usually don't set its priority, because of permissions or simply out of laziness. Have you ever set a priority for a program?
In the end, the task of distinguishing interactive processes from batch processes falls to the kernel scheduler.
The scheduler watches how a process has behaved recently (mainly its sleep time and run time) and uses empirical formulas to judge whether it is interactive or batch, and to what degree, and then adjusts its priority accordingly.
With dynamic priority adjustment, a process has two priorities:
1. The priority set by the user program (or the default value, if none is set), called the static priority. It is the baseline of the process's priority and usually stays unchanged while the process runs;
2. The priority actually in effect after dynamic adjustment. This value may change all the time;
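The two priorities can be illustrated with a simplified model loosely based on the early Linux 2.6 (O(1)) scheduler: nice -20..19 maps to an internal static priority 100..139 (lower number = higher priority), and a bonus of up to 5 levels either way, derived from the task's average sleep time, gives the effective priority. The constants MAX_BONUS and MAX_SLEEP_AVG and the linear formula are assumptions for illustration; the real heuristics were more involved.

```python
# Assumed constants, loosely modeled on the early 2.6 (O(1)) scheduler.
MAX_BONUS = 10         # effective priority may move up to 5 levels either way
MAX_SLEEP_AVG = 1000   # ms of average sleep that earns the full bonus


def static_prio(nice):
    """Map a nice value (-20..19) to the internal static priority 100..139."""
    return 120 + nice


def effective_prio(nice, sleep_avg_ms):
    """Static priority adjusted by recent sleep behavior (lower = better)."""
    sleep_avg_ms = max(0, min(sleep_avg_ms, MAX_SLEEP_AVG))
    # 0 ms asleep -> -5 (penalty), MAX_SLEEP_AVG asleep -> +5 (boost).
    bonus = sleep_avg_ms * MAX_BONUS // MAX_SLEEP_AVG - MAX_BONUS // 2
    # Clamp to the ordinary-process range so it never becomes "real-time".
    return max(100, min(139, static_prio(nice) - bonus))


print(effective_prio(0, 1000))  # mostly sleeping (interactive): 115
print(effective_prio(0, 0))     # never sleeping (batch): 125
```

The static priority (120 for nice 0) never moves; only the effective priority drifts with behavior, which is exactly the split described in the two points above.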
2、 Fairness of scheduling
In a system that supports multiple processes, each process should ideally get a fair share of the CPU according to its priority, without uncontrollable situations such as “whoever is lucky gets the most”.
Linux has used two ways to achieve fair scheduling:
1. Executable processes are allocated time slices (according to priority), and processes that use up their time slice are moved to an “expired queue”. When all executable processes have expired, time slices are allocated again;
2. The priority of each process is adjusted dynamically: as a process runs on the CPU, its priority keeps dropping, so that other processes with lower priority get a chance to run;
The latter method has finer scheduling granularity and combines “fairness” with “dynamic priority adjustment”, which greatly simplifies the kernel scheduler's code. It has therefore become the kernel scheduler's new favorite.
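The second approach can be sketched with a per-task “virtual runtime”: running on the CPU increases a task's virtual runtime (lowering its effective priority), and the scheduler always picks the task with the smallest value. This is the idea behind the CFS scheduler of later 2.6 kernels, but the code is a simplified illustration: the weight formula (each nice level scales the weight by 1.25, with nice 0 = 1024) approximates CFS's table rather than reproducing it, and the names weight and run are hypothetical.

```python
NICE_0_WEIGHT = 1024


def weight(nice):
    # Assumption: each nice step scales the weight by 1.25 (roughly 10% of
    # CPU share), approximating CFS's nice-to-weight table; nice 0 -> 1024.
    return NICE_0_WEIGHT / (1.25 ** nice)


def run(tasks, ticks):
    """tasks: name -> nice. Returns how many ticks each task received."""
    vruntime = {name: 0.0 for name in tasks}
    used = {name: 0 for name in tasks}
    for _ in range(ticks):
        # "Highest priority" = smallest virtual runtime (first one on ties).
        current = min(vruntime, key=vruntime.get)
        used[current] += 1
        # Running costs more virtual time for low-weight (high-nice) tasks,
        # so their effective priority drops faster.
        vruntime[current] += NICE_0_WEIGHT / weight(tasks[current])
    return used


print(run({"A": 0, "B": 0}, 10))   # equal nice: equal shares
print(run({"A": 0, "B": 5}, 100))  # A gets roughly three times B's share
```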
To emphasize: the two points above apply only to ordinary processes. For real-time processes, the kernel does not adjust priority dynamically, and there is no notion of fairness.
The concrete scheduling algorithm for ordinary processes is very complex and keeps changing as the Linux kernel evolves (with substantial changes, not just small adjustments), so this article will not go into more depth.
Efficiency of the scheduler
“Priority” determines which process should be scheduled, but the scheduler must also care about efficiency. Like many kernel routines, the scheduler runs very frequently; if it is inefficient, it wastes a lot of CPU time and degrades system performance.
In Linux 2.4, executable processes are kept in a single linked list. At every scheduling decision, the scheduler scans the whole list to find the best process to run, so the complexity is O(n);
In early Linux 2.6, executable processes are kept in n (n = 140) linked lists, one for each priority level the system supports. At every scheduling decision, the scheduler simply takes the process at the head of the first non-empty list. This greatly improves the scheduler's efficiency; the complexity is O(1);
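The O(1) lookup can be sketched in Python: one FIFO queue per priority (0 to 139, lower number meaning higher priority) plus a bitmap recording which queues are non-empty. Finding the lowest set bit gives the highest non-empty priority in time bounded by the 140 levels, not by the number of runnable processes. The class name O1RunQueue is hypothetical; the kernel performed the same search with a find-first-bit helper.

```python
from collections import deque

NUM_PRIOS = 140  # 0..99 real-time, 100..139 ordinary; lower = higher priority


class O1RunQueue:
    """One FIFO list per priority plus a bitmap of non-empty lists."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0  # bit p is set <=> queues[p] is non-empty

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        """Remove and return the head of the highest-priority list."""
        if not self.bitmap:
            return None
        # Isolate the lowest set bit: that is the first non-empty list.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)  # list became empty: clear its bit
        return task


rq = O1RunQueue()
rq.enqueue("compiler", 125)
rq.enqueue("editor", 115)
rq.enqueue("rt-audio", 10)
print([rq.pick_next() for _ in range(4)])
# ['rt-audio', 'editor', 'compiler', None]
```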
In later Linux 2.6 versions, executable processes are kept in a red-black tree (think of it as a balanced binary tree) ordered by priority. At every scheduling decision, the scheduler finds the highest-priority process in the tree. The complexity is O(log n).
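Python's standard library has no red-black tree, so the sketch below substitutes a binary heap (heapq). It shares the property that matters here: inserting a task and removing the one with the smallest key both cost O(log n), and the key (for example a 64-bit value) is not limited to a small range. TreeRunQueue is a hypothetical name.

```python
import heapq
import itertools


class TreeRunQueue:
    """Runnable tasks ordered by an unbounded key (smaller = runs first).

    A binary heap stands in for the kernel's red-black tree.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal keys stay FIFO

    def insert(self, key, task):
        heapq.heappush(self._heap, (key, next(self._seq), task))  # O(log n)

    def pick_next(self):
        if not self._heap:
            return None
        key, _, task = heapq.heappop(self._heap)                  # O(log n)
        return key, task


rq = TreeRunQueue()
for key, name in [(2**40 + 7, "batch"), (3, "shell"), (2**40, "worker")]:
    rq.insert(key, name)
print([rq.pick_next()[1] for _ in range(3)])  # ['shell', 'worker', 'batch']
```

A heap cannot cheaply remove an arbitrary task (for example one that goes to sleep while still queued), whereas a balanced tree can; that is one reason a tree fits the kernel better.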
So why did the complexity of selecting a process increase from early to later Linux 2.6?
Because over the same period, the scheduler's approach to fairness changed from the first method described above to the second (dynamically adjusting priority). The O(1) algorithm relies on a small number of linked lists, which, as I understand it, means the range of priority values is very small (low resolution) and cannot meet the needs of fairness. A red-black tree places no restriction on the range of priority values (32-bit, 64-bit, or wider values can be used), and O(log n) is still very efficient.
When scheduling is triggered
Scheduling is mainly triggered in the following situations:
1. The current process (the one running on the CPU) becomes non-executable:
The process voluntarily becomes non-executable through a system call, for example calling nanosleep to sleep or exit to exit;
The resource the process requests cannot be provided, so it is forced to sleep. For example, when it calls read and the required data is not in the disk cache, it sleeps while waiting for disk I/O;
The process becomes non-executable in response to a signal, for example stopping on SIGSTOP or exiting on SIGKILL;
2. Preemption: the running process loses the CPU involuntarily. There are two cases: the process has used up its time slice, or a higher-priority process has appeared:
A higher-priority process is woken up by the process running on the CPU, for example by a signal it sends, or by the release of a mutex it was waiting for (such as a lock being released);
While handling the clock interrupt, the kernel finds that the current process has used up its time slice;
While handling an interrupt, the kernel finds that an external resource a higher-priority process was waiting for has become available, and wakes that process up. For example, when the CPU receives a network card interrupt, the kernel handles it, finds a socket readable, and wakes up the process waiting to read that socket; or, while handling the clock interrupt, the kernel fires a timer that wakes up a process sleeping in the nanosleep system call.
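The trigger points above can be summarized in a toy model in which each trigger sets a “need reschedule” flag. In the real kernel a blocking process calls the scheduler directly, while preemption sets a flag that is acted on at the next safe point; the model flattens both into the flag. The Task and CPU classes are hypothetical, and lower numbers mean higher priority.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int            # lower number = higher priority
    slice_left: int = 0  # ticks left in the current time slice
    runnable: bool = True


class CPU:
    """Hypothetical model of the events that request a reschedule."""

    def __init__(self, current):
        self.current = current
        self.need_resched = False

    def block(self):
        # Case 1: the current task becomes non-executable (sleep, exit, stop).
        self.current.runnable = False
        self.need_resched = True

    def timer_tick(self):
        # Case 2: the clock interrupt finds the time slice used up.
        self.current.slice_left -= 1
        if self.current.slice_left <= 0:
            self.need_resched = True

    def wake_up(self, task):
        # Case 2: a wakeup (signal, unlock, I/O completion, timer) makes a
        # task runnable; only a higher-priority one forces a reschedule.
        task.runnable = True
        if task.prio < self.current.prio:
            self.need_resched = True


cpu = CPU(Task("batch", prio=120, slice_left=2))
cpu.wake_up(Task("shell", prio=125, runnable=False))
print(cpu.need_resched)  # False: the woken task has lower priority
cpu.wake_up(Task("rt", prio=50, runnable=False))
print(cpu.need_resched)  # True
```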
When all tasks use the Linux time-sharing scheduling policy:
1. When a task is created, it is given the time-sharing policy and a nice value (-20 to 19).
2. The task's execution time (time slice) on the CPU is determined by its nice value.
3. If the task is not waiting for any resource, it is added to the ready queue.
4. The scheduler traverses the ready queue, computes a dynamic-priority weight for each task (counter + 20 - nice), and selects the task with the largest result to run. When its time slice is used up (counter drops to 0) or it gives up the CPU voluntarily, the task is placed at the end of the ready queue (time slice used up) or in the wait queue (CPU given up to wait for a resource).
5. The scheduler then repeats the calculation, going back to step 4.
6. When the scheduler finds that the weights of all ready tasks are not greater than 0, it goes back to step 2.
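These steps resemble the goodness()-based scheduler of Linux 2.4, and can be sketched in Python as follows. Assumptions: HZ=100, so nice_to_ticks follows 2.4's ((20 - nice) >> 2) + 1; the weight uses only counter + 20 - nice and omits 2.4's small bonuses for shared memory maps and CPU affinity; the function names are illustrative.

```python
def nice_to_ticks(nice):
    # Assumption: Linux 2.4's NICE_TO_TICKS with HZ=100.
    return ((20 - nice) >> 2) + 1


def goodness(task):
    # Step 4 weight: counter + 20 - nice; a task with no slice left scores 0.
    if task["counter"] == 0:
        return 0
    return task["counter"] + 20 - task["nice"]


def schedule(ready, ticks):
    """Run the ready tasks for `ticks` clock ticks; return who ran when."""
    timeline = []
    for _ in range(ticks):
        best = max(ready, key=goodness)  # first of equal weights wins
        if goodness(best) <= 0:
            # Step 6: every ready task has used up its slice, so hand out
            # new slices (leftover counter is halved and carried over).
            for t in ready:
                t["counter"] = (t["counter"] >> 1) + nice_to_ticks(t["nice"])
            best = max(ready, key=goodness)
        best["counter"] -= 1             # one tick of CPU time
        timeline.append(best["name"])
    return timeline


tasks = [{"name": "A", "nice": 0, "counter": 0},
         {"name": "B", "nice": 10, "counter": 0}]
print("".join(schedule(tasks, 10)))  # AAAAAABBBA
```

A (nice 0) gets a 6-tick slice and B (nice 10) a 3-tick slice; once both are spent, a new round starts. The real 2.4 code recalculates every task, including sleeping ones, so a task that slept keeps half of its unused counter; that is how interactive tasks gained an edge.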
When all tasks use FIFO:
1. FIFO is specified when the process is created, and the real-time priority rt_priority (1 to 99) is set.
2. If the task is not waiting for any resource, it is added to the ready queue.
3. The scheduler traverses the ready queue, computes the scheduling weight from the real-time priority (1000 + rt_priority), and selects the task with the highest weight. The FIFO task then occupies the CPU until a higher-priority task becomes ready (a task with the same priority cannot preempt it) or it gives up the CPU (to wait for a resource).
4. When the scheduler finds that a higher-priority task has arrived (it may have been woken by an interrupt or a timer, by the currently running task, etc.), it immediately saves the current CPU register contents on the current task's stack and loads the higher-priority task's register contents from its stack into the CPU. The higher-priority task then starts running. Repeat step 3.
5. If the current task voluntarily gives up the CPU to wait for a resource, it is removed from the ready queue and added to the wait queue. Repeat step 3.
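Steps 3 to 5 can be sketched as a tiny event-driven model. The class FifoScheduler and its methods are hypothetical; the weight 1000 + rt_priority comes from step 3. One detail is encoded deliberately: a task preempted by a higher-priority task goes back to the head of the queue (Linux documents this for SCHED_FIFO in sched(7)), while a newly ready task of equal priority goes to the tail and does not preempt.

```python
from collections import deque


def rt_weight(rt_priority):
    # Step 3: weight = 1000 + rt_priority, so any real-time task outranks
    # any time-sharing task, whose weight is counter + 20 - nice.
    return 1000 + rt_priority


class FifoScheduler:
    """Hypothetical model of SCHED_FIFO tasks on one CPU."""

    def __init__(self):
        self.ready = deque()  # (name, rt_priority), head first
        self.current = None   # task holding the CPU

    def make_ready(self, task):
        if self.current is None:
            self.current = task
        elif rt_weight(task[1]) > rt_weight(self.current[1]):
            # Step 4: strictly higher priority preempts. The preempted task
            # returns to the head of the queue, so it resumes first later.
            self.ready.appendleft(self.current)
            self.current = task
        else:
            # Equal or lower priority: wait at the tail, no preemption.
            self.ready.append(task)

    def block_current(self):
        # Step 5: the running task waits for a resource; step 3 then picks
        # the highest weight, earliest-queued first among equals.
        self.current = None
        if self.ready:
            best = max(self.ready, key=lambda t: rt_weight(t[1]))
            self.ready.remove(best)
            self.current = best


s = FifoScheduler()
s.make_ready(("logger", 10))
s.make_ready(("peer", 10))   # same priority: no preemption
s.make_ready(("alarm", 50))  # higher priority: preempts logger
s.block_current()            # alarm waits for I/O
print(s.current[0])          # logger
```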
When all tasks use the RR scheduling policy:
1. When a task is created, RR is specified as its scheduling policy, and its real-time priority and nice value are set (the nice value is converted into the length of the task's time slice).
2. If the task is not waiting for any resource, it is added to the ready queue.
3. The scheduler traverses the ready queue, computes the scheduling weight from the real-time priority (1000 + rt_priority), and selects the task with the highest weight to use the CPU.
4. When the time slice of the running RR task reaches 0, its time slice is reset according to its nice value and the task is moved to the end of the ready queue. Repeat step 3.
5. If the current task gives up the CPU voluntarily to wait for a resource, it joins the wait queue. Repeat step 3.
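The RR steps can be sketched by adding a per-task time slice to the same selection rule. Assumptions: the nice-to-slice conversion reuses the 2.4-style formula with HZ=100 (reflecting the older behavior these steps describe), ties between equal real-time priorities go to the task nearest the head of the queue, tasks never block, and run_rr is a hypothetical name.

```python
from collections import deque


def nice_to_ticks(nice):
    # Assumption: the same 2.4-style conversion (HZ=100) of nice to a slice.
    return ((20 - nice) >> 2) + 1


def run_rr(tasks, ticks):
    """tasks: (name, rt_priority, nice) tuples with unique names, all
    SCHED_RR and never blocking. Returns the name that ran on each tick."""
    ready = deque({"name": n, "prio": p, "nice": nc, "slice": nice_to_ticks(nc)}
                  for n, p, nc in tasks)
    timeline = []
    for _ in range(ticks):
        # Step 3: highest 1000 + rt_priority wins; ties go to the queue head.
        cur = max(ready, key=lambda t: 1000 + t["prio"])
        timeline.append(cur["name"])
        cur["slice"] -= 1
        if cur["slice"] == 0:
            # Step 4: refill the slice from nice and move to the tail.
            cur["slice"] = nice_to_ticks(cur["nice"])
            ready.remove(cur)
            ready.append(cur)
    return timeline


print("".join(run_rr([("A", 10, 0), ("B", 10, 10), ("low", 5, 0)], 12)))
# AAAAAABBBAAA: "low" never runs while A and B stay runnable
```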
When time-sharing, round-robin, and FIFO scheduling coexist in the system:
1. Processes scheduled with RR and FIFO are real-time processes, while time-sharing processes are non-real-time processes.
2. When a real-time process becomes ready while the CPU is running a non-real-time process, the real-time process immediately preempts it.
3. RR and FIFO processes both use real-time priority as the scheduling weight, and RR is an extension of FIFO. Under FIFO, if two processes have the same priority, which one runs is determined by their position in the queue, which is not known in advance; this leads to some unfairness (the priority is the same, so why should one process keep running?). If the policy of two same-priority tasks is set to RR, the two tasks take turns and run fairly.
Ingo Molnar's real-time patch
To get merged into the mainline kernel, Ingo Molnar's real-time patch takes a very flexible approach and supports four preemption modes:
1. No Forced Preemption (Server): equivalent to the standard kernel without the preemption option; mainly suited to server environments such as scientific computing.
2. Voluntary Kernel Preemption (Desktop): this mode enables voluntary preemption but not the preemptible-kernel option. It reduces preemption latency by adding preemption points, so it suits environments that need better responsiveness, such as desktops. This responsiveness comes at the cost of some throughput.
3. Preemptible Kernel (Low-Latency Desktop): this mode includes both voluntary preemption and the preemptible-kernel option, so it has good response latency and in fact achieves soft real-time to a certain extent. It is mainly suited to desktops and some embedded systems, but its throughput is lower than that of mode 2.
4. Complete Preemption (Real-Time): this mode enables all real-time features, so it fully meets soft real-time requirements. It suits real-time systems with latency requirements of 100 microseconds or less.
Real-time performance comes at the cost of system throughput: the better the real-time performance, the lower the throughput.