A detailed explanation of the kernel preemption mechanism in Linux


1. Overview of kernel preemption

The new preemptible kernel in Linux 2.6 supports kernel preemption: when a process is running in kernel space and a higher-priority task becomes runnable, the current task can be suspended and the higher-priority process run instead, provided the kernel currently allows preemption.

Before version 2.5.4, the Linux kernel was not preemptible: a high-priority process could not interrupt a low-priority process running in the kernel and take over the CPU. Once a process entered kernel mode (for example, when a user process made a system call), it kept running until it finished its work and left the kernel, unless it voluntarily gave up the CPU. A preemptible Linux kernel, by contrast, can be preempted just as user space can. When a high-priority process becomes runnable, it is scheduled to run regardless of whether the current process is in user mode or kernel mode, as long as preemption is currently allowed.

2. User preemption

When the kernel is about to return to user space, it checks the need_resched flag; if the flag is set, schedule() is called and user preemption occurs. The kernel knows it is in a safe state at this point: if it is safe to continue running the current process, it is equally safe to pick another one. Therefore, whether it is returning from an interrupt handler or from a system call, the kernel checks need_resched, and if the flag is set it selects another (more suitable) process to run.

In short, user preemption occurs when:

Returning to user space from a system call.

Returning to user space from an interrupt handler.
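The check on the return path can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct task, schedule_stub and return_to_user are hypothetical names invented for illustration, and the real check lives in architecture-specific kernel entry code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct task {
    bool need_resched;    /* set when a more suitable task should run */
    int  times_scheduled; /* counts calls to the scheduler stub */
};

/* Stand-in for schedule(): choosing another task clears the flag. */
static void schedule_stub(struct task *t)
{
    t->need_resched = false;
    t->times_scheduled++;
}

/*
 * Modeled common exit path for system calls and interrupt handlers:
 * reschedule for as long as the flag is set, then return to user code.
 */
static void return_to_user(struct task *t)
{
    while (t->need_resched)
        schedule_stub(t);
    assert(!t->need_resched);
    /* ...restore user registers and resume user code... */
}
```

In this model a task whose flag is clear passes straight through the exit path, while a task whose flag is set goes through the scheduler stub exactly once before returning.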

3. Characteristics of a non-preemptive kernel

In a kernel without kernel preemption, kernel code runs until it completes. The scheduler has no way to reschedule a kernel-level task while it is executing: tasks in the kernel are scheduled cooperatively, not preemptively. A process running in kernel mode can, of course, give up the CPU voluntarily; for example, kernel code in a system call service routine may give up the CPU while waiting for a resource. This is called a planned process switch. Kernel code therefore runs until it finishes (returns to user space) or explicitly blocks.

On a single CPU, this arrangement greatly simplifies the synchronization and protection mechanisms of the kernel. It can be analyzed in two steps:

First, ignore the case where a process voluntarily gives up the CPU inside the kernel (that is, assume there is no process switching in the kernel). Once a process enters the kernel, it runs until it finishes or leaves the kernel, and no other process can enter the kernel before then. Execution in the kernel is therefore serial: multiple processes can never be running in the kernel at the same time, so kernel code does not have to be designed around the concurrency problems that simultaneous execution would cause. Kernel developers do not have to deal with complex concurrent execution or mutually exclusive access to critical resources, and a process that reads or modifies a kernel data structure does not need a lock to keep other processes out of the critical section.

Only interrupts need to be considered. If an interrupt handler might access the data structure the process is working on, the process simply disables interrupts before entering the critical section and re-enables them on leaving it.

Second, consider the case where a process voluntarily gives up the CPU. Because giving up the CPU is voluntary and deliberate, process switches in the kernel are known in advance and never happen unexpectedly. Concurrency between processes therefore has to be considered only at the points where a process switch can occur, not throughout the whole kernel.

4. Why kernel preemption?

Making the kernel preemptible is very important for Linux. First, it is necessary for using Linux in real-time systems, where response time is strictly bounded: when a real-time process is woken by a hardware interrupt from a real-time device, it must be scheduled to run within a limited time. A non-preemptible Linux kernel cannot meet this requirement, because the time the system spends inside the kernel cannot be bounded. When the kernel executes a long system call, a real-time process cannot be scheduled until the process running in the kernel leaves it. The resulting response delay can reach 100 ms on today’s hardware.

This is unacceptable for systems that need fast real-time response. A preemptible kernel matters not only for real-time use of Linux; it also addresses the poor support Linux has had for low-latency applications such as video and audio.

Because of this importance, kernel preemption was merged in Linux 2.5.4 and, like SMP, became a standard optional kernel configuration.

5. When is kernel preemption not allowed?

There are several situations in which the Linux kernel must not be preempted. Outside these situations, the kernel can be preempted at any point. The cases are as follows:

The kernel is handling an interrupt. In the Linux kernel a process cannot preempt an interrupt (an interrupt can be stopped and preempted only by another interrupt), and process scheduling is not allowed inside an interrupt routine. The scheduling function schedule() checks for this and prints an error message if it is called from interrupt context.

The kernel is running a bottom half in interrupt context. A softirq is executed before the hardware interrupt returns, so it is still in interrupt context.

The kernel code is holding a spinlock, read lock, write lock or similar lock and is inside the region those locks protect. These locks exist to guarantee the correctness of short critical sections executed concurrently by processes on different CPUs in an SMP system. The kernel must not be preempted while holding one; otherwise other CPUs could be unable to acquire the lock for a long time.

The kernel is executing the scheduler. Preemption exists to trigger a new scheduling decision, so there is no reason to preempt the scheduler only to run the scheduler again.

The kernel is operating on per-CPU data structures. On SMP, per-CPU data structures are not protected by spinlocks because they are protected implicitly: each CPU has its own copy, and a process running on one CPU does not use the per-CPU data of another. If preemption were allowed, however, a preempted process could be rescheduled onto a different CPU, at which point the per-CPU variable it was using would no longer be the right one. Preemption must therefore be disabled here.

To guarantee that the kernel is not preempted in the situations above, the preemptible kernel uses a variable called preempt_count, known as the kernel preemption lock. This variable is kept in the PCB structure of the process, task_struct (in 2.6 kernels it is a field of thread_info). Whenever the kernel enters one of these states, preempt_count is incremented by 1, indicating that preemption is not allowed. Whenever the kernel leaves one of these states, preempt_count is decremented by 1, and a preemption check and, if needed, a reschedule are performed at the same time.

When returning from an interrupt to kernel space, the kernel checks the values of need_resched and preempt_count. If need_resched is set and preempt_count is 0, a more important task may be waiting and preemption is safe, so the scheduler is called. If preempt_count is not 0, the kernel is in a non-preemptible state and cannot reschedule; the interrupt returns directly to the currently executing process as usual. Once the current process has released all the locks it holds, preempt_count returns to 0. At that point the code releasing the lock checks whether need_resched is set and, if so, calls the scheduler.
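The counting rule described above can be tried out in a small user-space model. This is a sketch, not kernel code: struct task_model and every function name ending in _model or _stub are hypothetical, and the real kernel performs these steps with macros and entry-code assembly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; the field names follow the text. */
struct task_model {
    int  preempt_count;  /* greater than 0 means preemption is forbidden */
    bool need_resched;   /* a reschedule has been requested */
    int  preemptions;    /* how many times the scheduler stub ran */
};

static void schedule_stub(struct task_model *t)
{
    t->need_resched = false;
    t->preemptions++;
}

/* Preempt only if a reschedule is pending and the count is zero. */
static void preempt_check(struct task_model *t)
{
    if (t->need_resched && t->preempt_count == 0)
        schedule_stub(t);
}

/* Entering a non-preemptible state (lock taken, interrupt entry, ...). */
static void preempt_disable_model(struct task_model *t)
{
    t->preempt_count++;
}

/* Leaving it: drop the count, then re-check for a pending reschedule. */
static void preempt_enable_model(struct task_model *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
    preempt_check(t);
}

/* Returning from an interrupt to kernel space performs the same check. */
static void irq_return_to_kernel(struct task_model *t)
{
    preempt_check(t);
}
```

With two nested disables, a reschedule requested in the middle is deferred past the interrupt return and past the first enable, and only takes effect when the second enable brings the count back to 0.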

6. Kernel preemption

Kernel preemption was introduced in the 2.6 kernel: the kernel can now preempt the executing task at any time, as long as rescheduling is safe.

So when is rescheduling safe? The kernel can preempt whenever preempt_count is 0. In general, locks and interrupts mark the non-preemptible regions. Because the kernel is SMP-safe, any code that holds no lock is reentrant and can therefore be preempted.

If a process in the kernel blocks, or explicitly calls schedule(), kernel preemption also takes place, explicitly. This form of kernel preemption has always been supported (it is really just giving up the CPU voluntarily), because no extra logic is needed to make sure the kernel can be preempted safely: code that explicitly calls schedule() is assumed to know that rescheduling is safe.

Kernel preemption may occur in:

When an interrupt handler finishes and is about to return to kernel space.

When kernel code becomes preemptible again, for example after releasing a lock or re-enabling softirqs.

When a task in the kernel explicitly calls schedule().

When a task in the kernel blocks (which also results in a call to schedule()).

7. How to support kernel preemption

The preemptible Linux kernel required two main modifications. The first is to the interrupt entry and return code. At interrupt entry, the kernel preemption lock preempt_count is incremented by 1 to forbid kernel preemption; at interrupt return, preempt_count is decremented by 1, making it possible for the kernel to be preempted again.

When we say that a preemptible Linux kernel can be preempted at any point, the main reason is that an interrupt can occur at any point. Whenever an interrupt occurs, the preemptible kernel makes a preemption decision on the interrupt return path. If the kernel is currently in a state that allows preemption, it reschedules and selects a high-priority process to run. This differs from the non-preemptible kernel, where rescheduling on return from a hardware interrupt happens only if the interrupted process was running in user mode. If the interrupted process was running in kernel mode, no scheduling takes place and the interrupted process simply resumes.

The second basic modification is the redefinition of spinlocks and read/write locks so that the lock operations also update preempt_count. Acquiring one of these locks increments preempt_count by 1 to forbid kernel preemption; releasing it decrements preempt_count by 1 and, if the conditions for kernel preemption are met and a reschedule is needed, performs a preemptive reschedule.
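How a lock operation can carry the preempt_count bookkeeping is sketched below as a user-space model. It assumes a C11 atomic_flag as the lock word; spinlock_model_t, spin_lock_model, spin_unlock_model and the global variables are hypothetical names, and real kernel spinlocks are implemented quite differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int  preempt_count_model; /* models preempt_count of the current task */
static bool need_resched_model;  /* models the need_resched flag */
static int  reschedules;         /* counts calls to the scheduler stub */

static void schedule_stub(void)
{
    need_resched_model = false;
    reschedules++;
}

typedef struct {
    atomic_flag locked;
} spinlock_model_t;

static void spin_lock_model(spinlock_model_t *lock)
{
    preempt_count_model++;   /* forbid preemption before spinning */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ;                    /* busy-wait until the holder releases */
}

static void spin_unlock_model(spinlock_model_t *lock)
{
    assert(preempt_count_model > 0);
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
    preempt_count_model--;
    /* Preemption point: reschedule if one is pending and now allowed. */
    if (preempt_count_model == 0 && need_resched_model)
        schedule_stub();
}
```

Because the count is raised once per lock, a task holding two locks becomes preemptible again only after releasing both, which is the point at which a deferred reschedule runs in this model.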

An alternative way to implement a preemptible kernel is to insert preemption points into kernel code. In this scheme, one first finds the code paths in the kernel that cause long delays and then inserts a preemption point at a suitable position in each of them, so that the system does not have to wait for the whole path to finish before rescheduling. This lets the system schedule the servicing process onto the CPU as quickly as possible for events that need a fast response. A preemption point is simply a conditional call to the process scheduling function: it tests the need_resched flag and calls schedule() if the flag is set.

Such a code path is usually a loop body. Inserting a preemption point means repeatedly testing need_resched inside the loop and, when necessary, calling schedule() to make the current process give up the CPU.
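The original listing for the preemption point is not reproduced here. As an illustration only, a preemption point inside a long loop might look like the following user-space model; preemption_point, schedule_stub and sum_with_preemption_point are hypothetical names, and the request_at parameter merely simulates an interrupt that asks for a reschedule partway through the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool need_resched_flag; /* modeled need_resched */
static int  yields;            /* how often the loop gave up the CPU */

static void schedule_stub(void)
{
    need_resched_flag = false;
    yields++;
}

/* The preemption point itself: a conditional call to the scheduler. */
static void preemption_point(void)
{
    if (need_resched_flag)
        schedule_stub();
}

/*
 * A long-running loop with a preemption point in its body.
 * request_at simulates an interrupt that requests a reschedule
 * while item number request_at is being processed.
 */
static long sum_with_preemption_point(const int *items, size_t n,
                                      size_t request_at)
{
    long sum = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum += items[i];
        if (i == request_at)
            need_resched_flag = true; /* simulated wake-up of a better task */
        preemption_point();
    }
    return sum;
}
```

The loop still computes its full result, but the pending reschedule is honored in the same iteration in which it was requested rather than after the loop ends. Later kernels offer cond_resched() for this purpose.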

8. When is rescheduling needed?

The kernel must know when to call schedule(). If calling schedule() were left entirely to program code, a process could keep running forever. Instead, the kernel provides a need_resched flag to indicate whether a reschedule is needed. When a process uses up its time slice, scheduler_tick() sets this flag; when a higher-priority process becomes runnable, try_to_wake_up() sets it as well.

set_tsk_need_resched(): sets the need_resched flag in the specified process

clear_tsk_need_resched(): clears the need_resched flag in the specified process

need_resched(): checks the value of the need_resched flag; returns true if it is set, false otherwise
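These helpers amount to setting, clearing and testing one bit. The sketch below models them in user space, assuming the flag is stored as one bit of a per-thread flags word as in 2.6. The struct, the function names ending in _model and the bit position are assumptions made for illustration; the real bit number is architecture-specific, and the time-slice handling is reduced to a bare counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit position chosen for this sketch only. */
#define TIF_NEED_RESCHED_MODEL 3

struct thread_info_model {
    unsigned long flags; /* one bit per thread flag */
    int time_slice;      /* remaining timer ticks in this sketch */
};

static void set_need_resched_model(struct thread_info_model *ti)
{
    ti->flags |= 1UL << TIF_NEED_RESCHED_MODEL;
}

static void clear_need_resched_model(struct thread_info_model *ti)
{
    ti->flags &= ~(1UL << TIF_NEED_RESCHED_MODEL);
}

static bool need_resched_model(const struct thread_info_model *ti)
{
    return (ti->flags >> TIF_NEED_RESCHED_MODEL) & 1UL;
}

/* Modeled timer tick: request a reschedule once the slice is used up. */
static void scheduler_tick_model(struct thread_info_model *ti)
{
    assert(ti->time_slice >= 0);
    if (ti->time_slice > 0 && --ti->time_slice == 0)
        set_need_resched_model(ti);
}
```

A task that starts with a time slice of two ticks has the flag clear after the first tick and set after the second.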

Semaphores, wait queues, completions and similar mechanisms are all built on wait queues. The wake-up function of a wait queue is default_wake_function, which calls try_to_wake_up to make the process runnable and to set the reschedule-pending flag.

The kernel also checks the need_resched flag when returning to user space and when returning from an interrupt. If the flag is set, the kernel invokes the scheduler before continuing.

Each process has its own need_resched flag because accessing a value in the process descriptor is faster than accessing a global variable (the current macro is very fast and the descriptor is usually in the cache). In kernels before 2.2, the flag was a global variable. In 2.2 and 2.4 it was a field of task_struct. In 2.6 it was moved into the thread_info structure, where it is a single bit in a special flags variable. Kernel developers keep refining the design.

9. Avoid kernel preemption

Once a process has called schedule(), there are two ways it can get to run again. 1. If its state is TASK_RUNNING it stays on the run queue and will certainly get another chance to run. 2. If it is on a sleep queue, it will run once its condition is met and it is woken up. What about a process that is preempted and is no longer on the run queue? It can never run again. To avoid this situation, a process that is preempted while not in TASK_RUNNING state must not be removed from the run queue. This is what the following code in schedule() ensures:

    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        /* voluntary switch: prev is not runnable and was not preempted */
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                     unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;   /* pending signal: stay runnable */
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);    /* remove prev from the run queue */
        }
    }

Some people may ask how a process that is not in TASK_RUNNING state can be preempted at all. The question is hard to answer briefly, but remember that the state of a process and the queue it is on are independent of each other. There is always a window between setting the process state and calling schedule(), and preemption can happen inside it. Consider the following code:

    for (;;) { \
        /* 1 */ prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
        /* 2 */ if (condition) \
        /* 3 */     break; \
        /* 4 */ schedule(); \
    } \

Suppose the process is preempted at line 1, just after its state has been set to TASK_UNINTERRUPTIBLE. It was about to test whether the condition holds, but instead it is effectively put to sleep. Without PREEMPT_ACTIVE, schedule() would remove it from the run queue. If that was its only opportunity to be woken, the process would never wake up.

If the process does return from that schedule() and the condition turns out not to hold, it will be removed from the run queue by the explicit schedule() call at line 4. That removal is not the job of preemption. If preemption insisted on doing it anyway, dequeue_task would go wrong: the array->queue entry would already be empty, and dequeuing a second time would dereference a null pointer. (In practice this cannot happen, because a process that has come back from schedule() is always in TASK_RUNNING state; it is only an illustration.) A process must therefore actually be on the run queue when it is removed from it, otherwise things go badly wrong.

The purpose of PREEMPT_ACTIVE, in fact, is to prevent a process that is not in TASK_RUNNING state, and is not yet on any sleep queue, from being removed from the run queue. In short, a process must always be either on a queue or able to be woken. A preempted process cannot be woken, so if it is not on the run queue it will never run again. How, then, does PREEMPT_ACTIVE ensure that a preempted process is not removed from the run queue? The answer is in preempt_schedule, which sets the flag in preempt_count around its call to schedule():

    asmlinkage void __sched preempt_schedule(void)
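Only the signature of preempt_schedule survives in the listing above. As a rough user-space model of the mechanism, and not the kernel implementation, the interaction between the preemption path and the dequeue decision in schedule() can be sketched as follows. All names ending in _model or _M are hypothetical, and the constant values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_RUNNING_M          0
#define TASK_UNINTERRUPTIBLE_M  2
#define PREEMPT_ACTIVE_M        0x10000000 /* illustrative bit value */

struct task_model {
    long state;         /* 0 means runnable, as in the listing above */
    int  preempt_count;
    bool on_runqueue;
};

/* Only the dequeue decision of schedule() is modeled here. */
static void schedule_model(struct task_model *prev)
{
    if (prev->state && !(prev->preempt_count & PREEMPT_ACTIVE_M))
        prev->on_runqueue = false; /* voluntary sleep: leave the run queue */
    /* ...pick the next task and switch to it... */
}

/* Preemption path: mark the call as involuntary around schedule(). */
static void preempt_schedule_model(struct task_model *t)
{
    bool was_queued = t->on_runqueue;

    if (t->preempt_count != 0)
        return;                    /* preemption is currently forbidden */
    t->preempt_count = PREEMPT_ACTIVE_M;
    schedule_model(t);
    t->preempt_count = 0;
    assert(t->on_runqueue == was_queued); /* preemption never dequeues */
}
```

In this model a task preempted while in the uninterruptible state stays on the run queue, whereas the same task calling schedule_model directly is removed from it.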

In addition, in earlier kernels the path that preempts on return from an interrupt to kernel space also set PREEMPT_ACTIVE, in entry.S.

Now another question: why is wait_event implemented this way, and why does it need a loop? My answer is this. A process can be woken only because it has joined a sleep queue. It is not safe to assume the condition holds just because schedule() has returned, since the return is not necessarily caused by the condition becoming true. If two or more processes are woken at the same time, it is quite likely that the condition no longer holds for one of them. If a process is preempted before it has joined the sleep queue, it has no chance of being woken. PREEMPT_ACTIVE guarantees that the process does not leave the run queue, but the intent of the program is lost: the intent was for a wake-up on the queue to make the process run, whereas now when it runs depends entirely on priority. Even when the condition becomes true, the process is not on the sleep queue and will not be woken, and the system ends up in a mess.

In fact it is simple. The condition must be tested after the process has been added to the sleep queue, so that no wake-up notification is missed: if another process wakes the queue before this one has joined it, this process misses the wake-up. The reason for the loop is that more than one process may be woken at once, and the woken processes then compete. The loop exists to handle that competition: it guarantees that every process leaving the loop does so with the condition actually true.
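The ordering argument can be checked with a deterministic, single-threaded simulation. This is a sketch under simplifying assumptions: there is one waiter, and the waker is called by hand at the exact point where the race would occur. struct waiter_model and the function names are hypothetical and do not correspond to kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter_model {
    bool on_wait_queue; /* has the waiter joined the sleep queue? */
    bool sleeping;      /* blocked and in need of a wake-up */
};

static bool condition_flag; /* the condition being waited for */

/* The waker makes the condition true and wakes whoever is queued. */
static void wake_up_model(struct waiter_model *w)
{
    assert(w != NULL);
    condition_flag = true;
    if (w->on_wait_queue)
        w->sleeping = false;
}

/* Wrong order: test first, join the queue afterwards. True means stuck. */
static bool check_then_enqueue(struct waiter_model *w)
{
    bool saw_condition = condition_flag; /* sees false */

    wake_up_model(w);                    /* waker runs in the gap */
    if (!saw_condition) {
        w->on_wait_queue = true;
        w->sleeping = true;              /* the wake-up has already passed */
    }
    return w->sleeping;
}

/* Right order: join the queue first, then test. True means stuck. */
static bool enqueue_then_check(struct waiter_model *w)
{
    w->on_wait_queue = true;
    wake_up_model(w);                    /* waker runs in the same gap */
    if (!condition_flag)
        w->sleeping = true;
    return w->sleeping;
}
```

With the waker placed in the same gap in both versions, the first ordering leaves the waiter asleep with nobody left to wake it, while the second sees the condition and never sleeps.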

On the subject of TASK_RUNNING, someone asked why the page-fault path sets the process state to TASK_RUNNING: is the process not already in TASK_RUNNING before the page fault? In most cases it is, but the Linux kernel cannot guarantee that. handle_mm_fault sets the process state to TASK_RUNNING so that the process can still run again if it sleeps during page-fault handling. For example, in select, after the process has been put into a state other than TASK_RUNNING, there is a copy_from_user, which may cause a page fault. If the state were not set back to TASK_RUNNING and schedule() were called during the page fault, the process would be removed from the run queue and never come back. There are two ways to prevent this: one is to check the state wherever schedule() is called and handle it there, as PREEMPT_ACTIVE does; the other, as in handle_mm_fault, is to make sure the process enters schedule() in TASK_RUNNING state. That said, I don’t think this should be removed now. Even if the page-fault path did not set the process to the running state, the state would already have been set to running before any scheduling that had to happen.

Purpose of PREEMPT_ACTIVE: to prevent a process that is not in the running state from being preempted before it has joined a sleep queue and then removed from the run queue, in which case it would never come back. This situation is rare, though, because code generally adds the process to the sleep queue first and sets the state afterwards.
