The love-hate entanglement between Linux processes and threads

Time:2020-10-24

Original address: http://embed.21ic.com/softwar…

When a program starts executing, its running instance in memory, from the moment execution begins until it ends, is called a process.

Linux is a multitasking operating system: multiple processes can appear to run at the same time. In fact, a single-CPU computer can only execute one process at any given moment. So how does Linux run multiple processes "at the same time"? It uses a method called process scheduling. First, each process is assigned a certain amount of running time, called a time slice, which is usually very short, on the order of milliseconds. Then, according to certain rules, the scheduler selects one of the many processes to run while the others wait. When the running process uses up its time slice, exits after finishing, or stops for some other reason, Linux reschedules and selects another process to run. Because each process only occupies the CPU for a short time slice, from the user's point of view it looks as if multiple processes are running simultaneously.
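The time-slice rotation described above can be sketched as a toy simulation. This is only an illustration of the idea, not how the Linux scheduler is implemented; the function name `round_robin`, the job names, and the millisecond figures are all made up for the example.

```python
from collections import deque

def round_robin(jobs, quantum_ms):
    """Simulate time-slice rotation on one CPU.

    jobs maps a (hypothetical) process name to the CPU time it needs in ms.
    Returns the timeline as a list of (name, ms_run) tuples.
    """
    ready = deque(jobs.items())          # processes waiting for the CPU
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum_ms, remaining)  # run for one slice, or less if done
        timeline.append((name, ran))
        if remaining > ran:
            # Slice used up but work remains: go to the back of the queue.
            ready.append((name, remaining - ran))
    return timeline

if __name__ == "__main__":
    for name, ran in round_robin({"A": 5, "B": 2, "C": 4}, quantum_ms=2):
        print(f"{name} runs for {ran} ms")
```

With a short enough quantum, every process gets the CPU often enough that all of them seem to make progress at once.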

In Linux, each process is assigned a data structure called a process control block (PCB) when it is created. The PCB holds a lot of important information used for scheduling and executing the process. The most important item is the process ID (PID), also called the process identifier. It is a non-negative integer that uniquely identifies a process in the Linux operating system. On the commonly used i386 architecture, the default range of this integer is 0 to 32767, so that is the range of PIDs a process may be given.
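As a small illustration, a process can ask for its own PID and its parent's PID. The sketch below uses Python's standard `os` module; the helper names are hypothetical, and reading `/proc/sys/kernel/pid_max` assumes a Linux system with `/proc` mounted.

```python
import os

def describe_current_process():
    """Return (pid, parent_pid) for the process running this code."""
    return os.getpid(), os.getppid()

def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the configured PID upper limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    pid, ppid = describe_current_process()
    print(f"pid={pid} parent={ppid} pid_max={read_pid_max()}")
```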

How zombie processes arise

A zombie process is a process that has ended but has not yet been removed from the process table. A zombie holds almost no system resources, but it still occupies an entry in the process table. If too many zombies accumulate, the process table fills up and the system can no longer create new processes.

Among the process states, the zombie state is a special one. A zombie has given up almost all of its memory, has no executable code, and cannot be scheduled. It only keeps a slot in the process list that records its exit status and other information for another process to collect. To disappear, it needs its parent process to collect that information. If the parent does not install a SIGCHLD signal handler that calls wait() or waitpid() for the child, and does not explicitly ignore the signal, the child stays in the zombie state. If the parent process ends, the init process automatically takes over the child and reaps it, so it can still be cleared. But if the parent runs in a loop and never ends, the child remains a zombie the whole time.

Causes of zombie processes:

Each Linux process has an entry in the process table, and all the information the kernel uses when running the process is stored in that entry. When you use the ps command to view process information, you are looking at data from the process table. When the fork system call creates a new process, the kernel assigns it an entry in the process table and stores the relevant information there. One piece of this information is the ID of the parent process. When a process reaches the end of its life cycle, it executes the exit() system call. At that point the data in its process table entry is replaced by the process's exit code, the CPU time it used, and similar data, which is kept until the system hands it to the parent process. In other words, a zombie process exists in the period after the child has terminated but before the parent has read this data.
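The life cycle above can be observed directly. The following is a minimal sketch, assuming a Linux system with `/proc` mounted: the child exits at once, the parent deliberately delays calling waitpid, and the child's state is read from `/proc/<pid>/stat`. The function names are made up for the example.

```python
import os
import time

def child_state(pid):
    """Return the one-letter state code from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def make_zombie_then_reap(exit_code=7, timeout=5.0):
    """Fork a child that exits immediately, observe it as a zombie, then reap it.

    Returns (state_before_wait, exit_code_collected, entry_still_exists).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child: terminate right away

    # Parent: do NOT wait yet; poll until the kernel reports the child as 'Z'.
    deadline = time.monotonic() + timeout
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    # Collecting the exit status removes the process table entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status), os.path.exists(f"/proc/{pid}")

if __name__ == "__main__":
    print(make_zombie_then_reap())
```

The exit code survives in the process table entry until the parent reads it, which is exactly why the entry cannot be freed earlier.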

How to avoid zombie processes

1. The parent process waits for the child to finish using functions such as wait() and waitpid(). The drawback is that the parent blocks while it waits.

2. If the parent process is busy, it can use the signal() function to install a handler for SIGCHLD. The parent receives this signal when a child finishes, and the handler can call wait() to reap the child.

3. If the parent process does not care when the child ends, it can call signal(SIGCHLD, SIG_IGN) to tell the kernel that it is not interested in the child's termination. When the child finishes, the kernel reclaims it and no longer sends the signal to the parent.

4. There is also a trick: call fork() twice. The parent forks a child and then continues working. The child forks a grandchild and exits immediately. The grandchild is then taken over by init, and when it finishes, init reaps it. The parent still has to reap the short-lived child itself.
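Method 2 can be sketched as follows. This is one possible arrangement rather than a definitive implementation: the function name is hypothetical, the handler loops with WNOHANG because several children may exit before the handler runs once, and in Python a signal handler can only be installed from the main thread.

```python
import os
import signal
import time

def run_children_with_sigchld_reaper(exit_codes, timeout=5.0):
    """Fork one child per exit code and reap them all from a SIGCHLD handler.

    Returns the exit codes collected by the handler, in fork order.
    """
    reaped = {}

    def on_sigchld(signum, frame):
        # Reap every child that has already finished, without blocking.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return                   # no children left at all
            if pid == 0:
                return                   # children exist, none has exited yet
            reaped[pid] = os.WEXITSTATUS(status)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pids = []
        for code in exit_codes:
            pid = os.fork()
            if pid == 0:
                os._exit(code)           # child: terminate immediately
            pids.append(pid)
        # The parent is free to do other work here; this sketch just idles.
        deadline = time.monotonic() + timeout
        while len(reaped) < len(pids) and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever disposition was in place before.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return [reaped.get(pid) for pid in pids]

if __name__ == "__main__":
    print(run_children_with_sigchld_reaper([0, 3, 5]))
```

For method 3, the equivalent call in Python would be `signal.signal(signal.SIGCHLD, signal.SIG_IGN)`, after which the parent never collects exit codes at all.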

Process vs. thread

Let's start with an analogy. Multithreading is like a flat road network with intersections: it is cheap to build, but there are many traffic lights and frequent traffic jams. Multiprocessing is like an overpass: it costs more to build, and going up and down the ramps burns more fuel, but there are no traffic jams. This is an abstract picture, but it should make more sense after reading the rest of the article.

Process and thread are two related concepts. Generally speaking, a process can be defined as an instance of a running program. In Win32, a process does not execute anything itself; it only owns the address space used by the application. For a process to get any work done, it must have at least one thread, which is responsible for executing the code in the process's address space. In fact, a process can contain several threads, all of which can execute code in the process's address space at the same time. To make this possible, each thread has its own set of CPU registers and its own stack. Every process has at least one thread executing code in its address space. If no thread is executing code in the address space, the process has no reason to exist, and the system automatically destroys the process and its address space.

The principle of multithreading

When a process is created, its first thread, called the primary (main) thread, is generated automatically by the system. The main thread can then create additional threads, and those threads can create more threads in turn. When a multithreaded program runs, the threads appear to be running at the same time. On a single CPU this is not actually the case. To run all of the threads, the operating system allocates some CPU time to each one. On a single-CPU system, the operating system hands out time slices (quanta) to threads in round-robin fashion. Each thread gives up control when its time slice is used up, and the system then gives the next time slice to the next thread. Because each time slice is short enough, it gives the impression that the threads are running simultaneously. The only purpose of creating additional threads is to make as much use of CPU time as possible.
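A minimal sketch of a main thread creating additional threads, using Python's standard `threading` module. The function and thread names are invented for the example.

```python
import threading

def run_workers(count):
    """Start `count` threads from the calling thread and collect their names."""
    names = []
    lock = threading.Lock()              # protects the shared list

    def worker():
        with lock:
            names.append(threading.current_thread().name)

    threads = [threading.Thread(target=worker, name=f"worker-{i}")
               for i in range(count)]
    for t in threads:
        t.start()                        # the OS now schedules each thread
    for t in threads:
        t.join()                         # wait until every thread has finished
    return sorted(names)

if __name__ == "__main__":
    print("main thread:", threading.main_thread().name)
    print("workers:", run_workers(3))
```

The order in which the workers actually run is up to the scheduler, which is why the names are sorted before being returned.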

The problems of multithreading

Multithreaded programming gives programmers a lot of flexibility and makes some otherwise complicated problems easier to solve. However, we should not artificially chop a program into fragments and run each fragment in its own thread. That is not the right way to develop applications. Threads are useful, but using them can create new problems while solving old ones.

For example, suppose you are developing a word processor and want the printing function to run as a separate thread. This sounds like a good idea, because the user can go back to editing the document while it prints. But then the data in the document may be modified while it is being printed, and the printed result is no longer what was expected. It may be better not to put printing in a separate thread at all. If you must use multithreading, consider one of the following approaches. The first is to lock the document being printed and let the user edit other documents, so that the document cannot be modified until printing finishes. The second may be more effective: copy the document to a temporary file, print the contents of the temporary file, and let the user keep modifying the original. When the temporary file has been printed, delete it.

As this analysis shows, multithreading helps solve problems but also introduces new ones. It is therefore necessary to work out when additional threads are needed and when they are not. Generally speaking, multithreading is most often used to perform background calculation or logic while the user keeps working in the foreground.
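The second approach, printing from a temporary copy, can be sketched like this. It is only an illustration under simple assumptions: the "document" is a list of text lines without embedded newlines, "printing" means appending lines to an output list, and `start_print_job` is a hypothetical name.

```python
import os
import tempfile
import threading

def start_print_job(document, output):
    """Snapshot `document` to a temporary file and print from it in a thread.

    The snapshot is written before this function returns, so later edits to
    `document` cannot affect what gets printed.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as tmp:
        tmp.write("\n".join(document))   # copy taken in the caller's thread

    def job():
        try:
            with open(path) as f:
                output.extend(f.read().split("\n"))   # stand-in for printing
        finally:
            os.remove(path)              # delete the temporary file afterwards

    thread = threading.Thread(target=job)
    thread.start()
    return thread

if __name__ == "__main__":
    doc, printed = ["line one", "line two"], []
    job = start_print_job(doc, printed)
    doc[0] = "EDITED"                    # the user keeps editing the original
    job.join()
    print(printed)
```

The key design point is that the copy is made synchronously, before the background thread starts, so there is no window in which an edit can slip into the print job.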

Classification of threads

In MFC, threads fall into two categories: worker threads and user interface threads. If a thread only performs background calculation and does not need to interact with the user, a worker thread is appropriate. If a thread is created to handle the user interface, a user interface thread should be used. The main difference between the two is that the MFC framework adds a message loop to a user interface thread, so that the thread can process the messages in its own message queue. So if you need to do some simple calculation in the background (such as recalculating a spreadsheet), consider a worker thread first. When a background thread has to handle more complex tasks, specifically when its execution flow changes with the actual situation, a user interface thread should be used so that it can respond to different messages.
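The difference can be illustrated with a rough analogy in Python. This is not MFC code and does not use any MFC API; it only contrasts a one-shot worker with a thread that runs a message loop over its own queue. All names here are hypothetical.

```python
import queue
import threading

QUIT = object()                          # sentinel that ends the message loop

def worker_sum(values, results):
    """Worker-style thread body: compute once, store the result, finish."""
    results.append(sum(values))

def message_loop(inbox, handled):
    """Message-loop-style thread body: react to messages until told to quit."""
    while True:
        message = inbox.get()            # blocks until a message arrives
        if message is QUIT:
            break
        handled.append(f"handled {message}")

def run_demo(values, messages):
    results, handled, inbox = [], [], queue.Queue()
    worker = threading.Thread(target=worker_sum, args=(values, results))
    looper = threading.Thread(target=message_loop, args=(inbox, handled))
    worker.start()
    looper.start()
    for message in messages:
        inbox.put(message)
    inbox.put(QUIT)
    worker.join()
    looper.join()
    return results, handled

if __name__ == "__main__":
    print(run_demo([1, 2, 3], ["paint", "keydown"]))
```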

Thread priority

When the system has to run multiple processes or threads at the same time, it is sometimes necessary to specify thread priorities. The priority of a thread generally means its base priority, that is, the combination of the thread's priority relative to its process and the priority of the process that contains it. The operating system schedules all active threads by priority. Every thread in the system is assigned a priority in the range 0 to 31. At run time, the system gives CPU time to the first thread with priority 31, and when that thread's time slice ends, it gives CPU time to the next thread with priority 31. When there are no threads left with priority 31, the system starts giving CPU time to threads with priority 30, and so on. Besides the programmer changing a thread's priority in code, the system sometimes changes thread priorities automatically while the program runs, in order to keep the system highly responsive to the end user. For example, when the user presses a key, the system temporarily raises the priority of the thread that handles the WM_KEYDOWN message by 2 to 3. The thread then runs for a complete time slice, and when the time slice ends, the system lowers the thread's priority by 1.
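The selection rule described above (always serve the highest non-empty priority level, and rotate among threads at that level) can be sketched as a toy model. This is a simplified simulation, not the real Windows scheduler: it ignores priority boosts and blocking, and the function name and thread names are invented.

```python
from collections import deque

def schedule(threads, max_slices):
    """Return the order in which threads receive time slices.

    threads is a list of (name, priority, slices_needed) with priority 0..31.
    """
    ready = {priority: deque() for priority in range(32)}
    for name, priority, needed in threads:
        ready[priority].append([name, needed])

    order = []
    for _ in range(max_slices):
        for priority in range(31, -1, -1):       # highest priority first
            if ready[priority]:
                entry = ready[priority].popleft()
                order.append(entry[0])
                entry[1] -= 1
                if entry[1] > 0:
                    # Not finished: go behind peers of the same priority.
                    ready[priority].append(entry)
                break
        else:
            break                                # nothing left to run
    return order

if __name__ == "__main__":
    print(schedule([("a", 31, 2), ("b", 31, 1), ("c", 30, 1)], max_slices=10))
```

In this model a priority-30 thread runs only once every priority-31 thread has finished, which shows why lower-priority threads can be starved.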

Synchronization of threads

Another very important issue in multithreaded programming is thread synchronization. Thread synchronization means making sure that threads do not corrupt each other's data when they communicate. The problem arises from the time-slice scheduling of Win32 described earlier. Although only one thread occupies the CPU at any given moment (on a single CPU), there is no way to know when or where a thread will be interrupted, so ensuring that threads do not corrupt each other's data is particularly important. In MFC, four synchronization objects can be used to let multiple threads run together safely: the critical section object (CCriticalSection), the mutex object (CMutex), the semaphore object (CSemaphore), and the event object (CEvent). Of these, the critical section object is the easiest to use. Its drawback is that it can only synchronize threads within the same process. There is also a more basic approach, called the linearization method in this article: during programming, arrange for all writes to a given piece of data to happen in a single thread. Since code within one thread always executes in sequence, the data cannot be written by two threads at the same time.
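The role a critical section plays can be illustrated with a lock from Python's standard `threading` module. This is an analogy rather than MFC code; the function name is hypothetical. Each read-modify-write of the shared counter happens while holding the lock, so no update can be lost even if a thread is interrupted mid-loop.

```python
import threading

def count_with_lock(thread_count, increments):
    """Increment one shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:                   # only one thread at a time gets in
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(count_with_lock(4, 10000))
```

Like a critical section, this lock only coordinates threads inside one process; coordinating separate processes needs a different mechanism.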

Summary:

Compared with a process, a thread is a concept closer to the unit of execution. It can share data with other threads in the same process, but it has its own stack and its own independent sequence of execution. Both threads and processes can increase a program's concurrency and improve its efficiency and response time. Each has its own advantages and disadvantages: threads are cheap to run but make resource management and protection harder, while processes are the opposite. The fundamental difference is that each process has its own address space, while threads share one. In terms of speed, threads are faster to create, faster to communicate with each other, and faster to switch between, because they live in the same address space. In terms of resource utilization, threads make better use of resources, again because they share an address space. In terms of synchronization, threads need a synchronization mechanism whenever they use common variables or memory, precisely because they share an address space. Processes are different: a child process is a copy of its parent and gets its own copy of the parent's data space, heap, and stack.
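The address-space difference can be demonstrated with a short sketch, assuming a Unix-like system where `os.fork` is available. The function name is made up. A thread's change to shared data is visible to the parent, while a forked child changes only its own copy.

```python
import os
import threading

def compare_thread_and_fork():
    """Return the shared value after a thread bump and after a child bump."""
    data = {"value": 0}

    def bump():
        data["value"] += 1

    # A thread shares the address space: its update is visible here.
    thread = threading.Thread(target=bump)
    thread.start()
    thread.join()
    after_thread = data["value"]

    # A forked child works on its own copy: its update is not visible here.
    pid = os.fork()
    if pid == 0:
        bump()
        os._exit(0)
    os.waitpid(pid, 0)                   # reap the child, so no zombie remains
    after_fork = data["value"]

    return after_thread, after_fork

if __name__ == "__main__":
    print(compare_thread_and_fork())
```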