On the process stack, thread stack, kernel stack and interrupt stack in Linux


What is a stack? What does a stack do?

First, a stack is a sequential data structure. It is characterized by last in, first out (LIFO): data can only be pushed and popped at one end of the sequence, called the top of the stack. Given these characteristics, it is natural to implement a stack with an array. This article, however, is not about the software stack but about the hardware stack.

Most processor architectures implement a hardware stack: a dedicated stack pointer register plus specific instructions for pushing and popping. For example, on the ARM architecture, R13 (SP) is the stack pointer register, PUSH is the assembly instruction that pushes onto the stack, and POP is the instruction that pops from it.

Let's see what the stack does. Its role shows up in two areas: function calls and multitasking support.

1、 Function call

We know that a function call involves the following three basic steps:
- Passing in the call parameters
- Managing space for local variables
- Returning from the function

Function calls must be efficient, and storing data in CPU general-purpose registers or in RAM is undoubtedly the best choice. For passing call parameters, we can use CPU general-purpose registers. However, the number of general-purpose registers is limited, and when function calls are nested, the child function will inevitably conflict with its caller when it reuses the same registers. So, if registers are used to pass parameters, their original values must be saved before the child function is called and restored when the child function exits.

The number of call parameters is generally small, so general-purpose registers can meet the need to some extent. Local variables, however, are more numerous and take more space, and a limited set of registers cannot hold them. We can therefore use a region of RAM to store local variables. But where should that region be? It must not cause conflicts between nested calls, and it must still be efficient.

Here the stack provides a good solution. First, for the conflict over general-purpose registers used for parameter passing, we can push the registers onto the stack before calling the child function; after the call, the saved registers are popped and restored. Second, to allocate space for local variables, we only need to move the stack pointer down; moving the stack pointer back releases that space. Third, for the function return, we only need to push the return address onto the stack before calling the child function; when the child function finishes, the return address is popped into the PC, which completes the return of the function call.

Therefore, the three basic steps of a function call reduce to keeping track of a stack pointer. Each function call gets its own stack pointer value. Even when function calls are nested repeatedly, as long as each call's stack pointer is different, there is no conflict.

2、 Multitasking support

However, the stack matters for more than function calls: its existence also makes it possible to build the multitasking mode of an operating system. Take a main function as an example. The main function contains an infinite loop; in the loop body it calls function A, which in turn calls function B.

    void B(void) { return; }

    void A(void) { B(); }

    int main(void)
    {
        while (1)
            A();
    }

Imagine that, on a single processor, the program stays in this main function forever. Even if another task is waiting, the program cannot jump from main to that task, because anything reached through a function call still belongs to the main function's task and does not count as a task switch. At this point, the main function's task is effectively bound to its stack: no matter how deeply function calls are nested, the stack pointer moves only within the range of that stack.

It can be seen that a task can be characterized by the following information:
1. Main function body code
2. Main function stack pointer
3. Current CPU register information

If we can save this information, we can force the CPU to go and handle other tasks; whenever we want to continue the main task later, we restore the information. With this precondition in place, multitasking has a foundation, and a second meaning of the stack becomes visible. In multitasking mode, when the scheduler decides a task switch is needed, it only has to save the task's information (the three items above); restoring it later resumes the task from its last running state.

It follows that each task has its own stack space. Because the stack space is independent, different tasks can even share the same function body for the sake of code reuse: for example, one main function can have two task instances. At this point the framework of an operating system has taken shape. A task can give up the CPU voluntarily, for example by calling sleep() to wait, or, in a time-sharing operating system, it is forced to give up the CPU when its time slice is used up. Either way, switching the task's context is a matter of switching the stack.

How many kinds of stacks are there in Linux, and where in memory does each one live?

The kernel divides the stack into four types:

  • Process stack
  • Thread stack
  • Kernel stack
  • Interrupt stack

1、 Process stack

The process stack is a user-mode stack and is closely tied to the process's virtual address space. First, what is a virtual address space? On a 32-bit machine, the virtual address space is 4 GB. Virtual addresses are mapped to physical memory through page tables, which are maintained by the operating system and consulted by the processor's memory management unit (MMU). Each process has its own set of page tables, so each process appears to have the whole virtual address space to itself.

The Linux kernel divides this 4 GB space into two parts. The highest 1 GB (0xC0000000-0xFFFFFFFF) is used by the kernel and is called kernel space. The lower 3 GB (0x00000000-0xBFFFFFFF) is used by each process and is called user space. Every process can enter kernel mode through system calls, so kernel space is shared by all processes. Although the kernel and user-mode processes each own such a large range of addresses, that does not mean they use that much physical memory, only that they have that much address space at their disposal; physical memory is mapped into the virtual address space on demand.

Linux has a standard layout for the process address space. The address space is made up of different memory segments; the main ones are:
- Text segment: the memory mapping of the executable code
- Data segment: the memory mapping of the executable's initialized global variables
- BSS segment: uninitialized global or static variables (initialized with zero pages)
- Heap: dynamically allocated memory, backed by anonymous memory mappings
- Stack: the process's user-space stack, allocated and released automatically by compiler-generated code, which stores function parameter values and local variables
- Memory mapping segment: any memory-mapped files

The stack segment in the process address space described above is what we call the process stack. The initial size of the process stack is calculated by the compiler and linker, but its size at run time is not fixed. The Linux kernel grows the stack area dynamically as the stack is used (in effect, by adding new page table entries). That does not mean the stack area can grow without bound: it has an upper limit, RLIMIT_STACK (generally 8 MB), and the value of RLIMIT_STACK can be viewed or changed with ulimit.

How to confirm the size of the process stack

To find the size of the stack, we need its start and end addresses. The stack start address is easy to obtain: read the stack pointer ESP with an inline assembly instruction. The stack end address is harder to get. We use a recursive function to overflow the stack, and then print the stack pointer ESP in GDB at the moment the stack overflows. The code is as follows:

    /* file name: stacksize.c */
    void *orig_stack_pointer;
    void blow_stack() {
        blow_stack();
    }
    int main() {
        __asm__("movl %esp, orig_stack_pointer");
        blow_stack();
        return 0;
    }

    $ g++ -g stacksize.c -o ./stacksize
    $ gdb ./stacksize
    (gdb) r
    Starting program: /home/home/misc-code/setrlimit
    Program received signal SIGSEGV, Segmentation fault.
    blow_stack () at setrlimit.c:4
    4       blow_stack();
    (gdb) print (void *)$esp
    $1 = (void *) 0xffffffffff7ff000
    (gdb) print (void *)orig_stack_pointer
    $2 = (void *) 0xffffc800
    (gdb) print 0xffffc800-0xff7ff000
    $3 = 8378368    // Current Process Stack Size is 8M

The sections above gave an overall view of the process address space. Now let's look at how that memory layout is represented inside the Linux kernel. The kernel uses a memory descriptor to represent a process's address space; it holds all of the address space information for the process. The memory descriptor is the mm_struct structure. The fields of the structure are described below; read them alongside the segment layout given earlier:

    struct mm_struct {
        struct vm_area_struct *mmap;          /* linked list of memory areas */
        struct rb_root mm_rb;                 /* red-black tree of VMAs */
        ...
        struct list_head mmlist;              /* linked list of all mm_structs */
        ...
        unsigned long total_vm;               /* total number of pages */
        unsigned long locked_vm;              /* number of locked pages */
        unsigned long pinned_vm;              /* Refcount permanently increased */
        unsigned long shared_vm;              /* number of shared pages (files) */
        unsigned long exec_vm;                /* number of executable pages: VM_EXEC & ~VM_WRITE */
        unsigned long stack_vm;               /* number of pages in the stack area: VM_GROWSUP/DOWN */
        unsigned long def_flags;
        unsigned long start_code, end_code, start_data, end_data;  /* start and end addresses of the code and data segments */
        unsigned long start_brk, brk, start_stack;                 /* start and end addresses of the heap, start address of the stack */
        unsigned long arg_start, arg_end, env_start, env_end;      /* start and end addresses of command-line arguments and environment variables */
        ...
        /* Architecture-specific MM context */
        mm_context_t context;                 /* architecture-specific data */

        /* Must use atomic bitops to access the bits */
        unsigned long flags;                  /* status flags */
        ...
        /* Coredumping and NUMA and HugePage related structures */
    };

[Further reading]: how dynamic growth of the process stack is implemented

While running, a process keeps pushing data onto the stack. When it goes beyond the memory currently mapped for the stack area, a page fault is triggered. After the exception traps into kernel mode, it is handled by the kernel's expand_stack() function, which in turn calls acct_stack_growth() to check whether there is still room for the stack to grow.

If the stack size is below RLIMIT_STACK (usually 8 MB), the stack is normally extended and the program continues without noticing anything. This is the normal mechanism for growing the stack to the required size. If the maximum stack size has been reached, however, a stack overflow occurs and the process receives a segmentation fault signal (SIGSEGV).

Dynamic stack growth is the only case in which access to an unmapped memory area is allowed. Any other access to an unmapped area triggers a page fault that results in a segmentation fault. Some mapped areas are read-only, so attempting to write to them also results in a segmentation fault.

2、 Thread stack

From the Linux kernel's point of view, there is no separate concept of a thread. Linux implements all threads as processes and unifies both in task_struct. A thread is simply a process that shares certain resources with other processes, and whether the address space is shared is almost the only difference between a process and a so-called thread in Linux. When a thread is created, the CLONE_VM flag is passed, so the thread's memory descriptor points directly to the parent process's memory descriptor:

    if (clone_flags & CLONE_VM) {
        /*
         * current is the parent process, while tsk is the child
         * process being created (sharing the mm) during fork()
         */
        atomic_inc(&current->mm->mm_users);
        tsk->mm = current->mm;
    }

Although a thread has the same address space as its process, it treats the stack within that address space differently. For a Linux process, or the main thread, the stack is created at fork time: in effect it copies the parent's stack address space, which is then copy-on-write (COW) and grows dynamically. For a child thread created by the main thread, however, the stack no longer works this way. It is fixed in advance, allocated with the mmap system call, and does not carry the VM_STACK_FLAGS flag. This can be seen in the allocate_stack() function in glibc's nptl/allocatestack.c:

    mem = mmap (NULL, size, prot,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

Because the thread's mm->start_stack address is the same as that of the process it belongs to, the starting address of the thread stack is not stored in task_struct; presumably pthread_attr_t is used to initialize task_struct->thread->sp (sp points to a struct pt_regs object, which holds the saved register state of the user process or thread). That detail is not important. What matters is that the thread stack cannot grow dynamically: once it is used up, it is gone. This is different from the stack of a process created by fork. Since the thread stack is a memory area mapped from the process's address space, it is private to the thread in principle. However, when a thread of the same process is created, many fields of the creator's task_struct are shallow-copied, including all VMAs, so other threads can still access that stack if they want to. This requires care.

3、 Process kernel stack

Over its lifetime, every process is bound to trap into the kernel through system calls. Once a system call has entered the kernel, the kernel code no longer uses the stack in the process's user space but a separate stack in kernel space, called the process kernel stack. When a process is created, its kernel stack is allocated by the slab allocator from the thread_info_cache cache pool. Its size is THREAD_SIZE, which is typically one or two pages (4 KB or 8 KB on 32-bit systems, depending on the architecture and kernel configuration):

    union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
    };

The process kernel stack (thread_union) and the process descriptor (task_struct) are closely related. Because the kernel needs to access task_struct frequently, getting the descriptor of the current process efficiently is very important. The kernel therefore reserves a region at the head of the process kernel stack for the thread_info structure, which records the descriptor of the corresponding process. The relationship between the two is shown in the following figure (the corresponding kernel function is dup_task_struct()):

[Figure: relationship between the process kernel stack (thread_union) and the process descriptor (task_struct)]

With this association in place, the kernel can first read the stack pointer ESP and then derive the thread_info address from it. There is a small trick here: ANDing the ESP address with ~(THREAD_SIZE - 1) yields the address of thread_info directly. This works because the thread_union structure is allocated from the thread_info_cache slab cache, and thread_info_cache is created with kmem_cache_create in a way that guarantees THREAD_SIZE-aligned addresses. So aligning the stack pointer down to THREAD_SIZE gives the address of thread_union, which is also the address of thread_info. Once thread_info has been obtained, its task member gives the task_struct directly. This is in fact how the current macro is implemented:

    register unsigned long current_stack_pointer asm ("sp");

    static inline struct thread_info *current_thread_info(void)
    {
        return (struct thread_info *)
            (current_stack_pointer & ~(THREAD_SIZE - 1));
    }

    #define get_current() (current_thread_info()->task)
    #define current get_current()

4、 Interrupt stack

When a process traps into kernel mode, it needs the kernel stack to support kernel function calls. The same is true for interrupts: when the system receives an interrupt and handles it, it needs an interrupt stack to support function calls. Since the system is in kernel mode while an interrupt is being handled, the interrupt stack can be shared with the kernel stack. Whether it actually is shared depends on the processor architecture.

On x86, the interrupt stack is independent of the kernel stack. The memory for the independent interrupt stack is allocated in the irq_ctx_init() function in arch/x86/kernel/irq_32.c (on a multiprocessor system, each processor has its own interrupt stack). The function uses __alloc_pages to allocate 2 physical pages, i.e. 8 KB, in the low memory area. Interestingly, the function also allocates a separate stack of the same size for softirq. As a result, softirq does not run on the hardirq interrupt stack but in its own context.

On ARM, the interrupt stack and the kernel stack are shared. Sharing has a downside: if interrupts are nested, the stack may overflow and corrupt important data on the kernel stack. As a result, stack space can sometimes run short.

Why does Linux need to distinguish these stacks?

Why these stacks need to be distinguished is really a design question. Here are some of the views I have come across, offered for discussion:

  1. Why is a separate process kernel stack needed? While running, any process may enter kernel mode through a system call and continue executing there. Suppose the first process, A, is executing in kernel mode, needs to wait for data from the network card, and voluntarily calls schedule() to give up the CPU. The scheduler then wakes up another process, B, which also happens to make a system call and enter kernel mode. That is the problem: if there were only one kernel stack, the pushes performed when process B enters kernel mode would inevitably destroy process A's existing kernel stack data, and once that data is damaged, process A would probably be unable to return correctly from kernel mode to user mode.
  2. Why is a separate thread stack needed? The Linux scheduler does not distinguish between threads and processes. When the scheduler wakes up a "process", it must restore the process context, that is, the process stack. A thread, however, shares the same address space with its parent process. If they also shared one stack, the following problem would arise. Suppose the initial value of the process's stack pointer is 0x7ffc80000000. The parent process A runs first and calls some functions, so that the stack pointer ESP becomes 0x7ffc8000ff00, and then the parent process voluntarily sleeps. The scheduler then wakes up the child thread A1:
    If A1's stack pointer ESP starts from the initial value 0x7ffc80000000, thread A1 will destroy the data the parent process A has already pushed onto the stack as soon as it makes a function call. If instead A1's stack pointer matches the parent's last updated value, 0x7ffc8000ff00, then after thread A1 makes some function calls its ESP moves to 0x7ffc8000ffff, and thread A1 goes to sleep. When the scheduler switches back to the parent process A, should the parent's stack pointer be 0x7ffc8000ff00 or 0x7ffc8000ffff? Whichever value is chosen, something goes wrong.
  3. Do processes and threads share a kernel stack? No. Both thread and process creation call dup_task_struct to create the task-related structures, and the kernel stack is also allocated in that function, by alloc_thread_info_node. So although a thread and its process share one address space (mm_struct), they do not share a kernel stack.
  4. Why is a separate interrupt stack needed? The premise of this question does not always hold: the ARM architecture has no separate interrupt stack.