Operating system — deep understanding of process and thread

Time:2021-2-26

1. Definition of process and thread

Process: a program in execution. Once a program has been loaded into memory and is ready to execute, it is a process. A process is the basic unit of system resource allocation and scheduling.

Thread: an entity within a process and the basic unit of CPU scheduling and dispatching. It is a unit smaller than a process that can run independently, and it is sometimes called a lightweight process.

2. The difference between process and thread

  1. One process can contain multiple threads and always contains at least one. A thread can exist in only one process, so a thread always depends on its process.

  2. Threads in the same process are not independent of one another: they share the resources of the process. Processes are largely independent and do not interfere with each other.

  3. A thread is lightweight: creating and destroying one takes much less time and far fewer resources than creating and destroying a process.

  4. In the operating system, the process is the independent unit of resource allocation and scheduling, and it owns its resources. A thread generally owns no resources of its own (apart from essentials such as its stack and registers), but it can access the resources of the process it belongs to. The thread is the basic unit of CPU dispatching and scheduling.
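The sharing described in point 2 can be observed directly. The following is a minimal sketch, assuming a POSIX system where os.fork is available; the names counter, bump, run_in_thread and run_in_child_process are made up for illustration.

```python
import os
import threading

counter = {"value": 0}  # lives in this process's address space


def bump():
    counter["value"] += 1


def run_in_thread():
    """A thread in the same process mutates the shared dict directly."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def run_in_child_process():
    """A forked child works on its own copy; its change never reaches the parent."""
    before = counter["value"]
    pid = os.fork()
    if pid == 0:            # child process
        bump()              # modifies the child's private copy only
        os._exit(0)
    os.waitpid(pid, 0)      # parent waits for the child to finish
    return before, counter["value"]
```

The thread's increment is visible to the caller, while the child's increment is lost when the child exits, which is why separate processes need the IPC mechanisms discussed later.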

3. Communication mode between processes

3.1 Overview

As mentioned above, processes are independent of one another. Sometimes, however, several processes must cooperate to complete a set of tasks, which requires transferring data or sharing resources between them. This is what interprocess communication (IPC) is for.

3.2 Purpose

  1. Data transfer: one process needs to send data to another.
  2. Shared data: several processes operate on the same data. When one process modifies it, the others should see the change immediately.
  3. Event notification: a process needs to tell another process, or a group of processes, that some event has occurred (for example, notifying the parent when a process terminates).
  4. Process control: some processes need complete control over the execution of another process (a debugger, for example). The controlling process wants to intercept all traps and exceptions of the other process and learn of its state changes promptly.

3.3 Methods: seven in total

1. Pipe (anonymous pipe)


Characteristics

  • Half duplex: data flows in one direction only
  • Usable only between related processes (for example, parent and child)
  • It behaves like a file but belongs to no ordinary file system; it exists only in memory
  • Reads and writes follow queue order (FIFO)

Disadvantages (they follow from the characteristics!)

  • Half duplex
  • Restricted to related processes
  • Held in memory, so the buffer size is limited
  • An unformatted byte stream: the reader and the writer must agree on the data format in advance, such as how many bytes make up one message (or command, or record)
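Because a pipe carries an unformatted byte stream, the two ends need a framing convention. Here is a minimal sketch, assuming a POSIX system (os.fork) and an assumed 4-byte length prefix; HEADER, send_msg, recv_msg and parent_child_demo are illustrative names, not part of any standard protocol.

```python
import os
import struct

HEADER = struct.Struct("!I")  # assumed convention: 4-byte big-endian length prefix


def send_msg(fd, payload: bytes):
    os.write(fd, HEADER.pack(len(payload)) + payload)


def read_exact(fd, n):
    """Read exactly n bytes; a single os.read may return fewer."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(fd):
    (length,) = HEADER.unpack(read_exact(fd, HEADER.size))
    return read_exact(fd, length)


def parent_child_demo(messages):
    r, w = os.pipe()          # one read end, one write end: half duplex
    pid = os.fork()           # the child inherits both descriptors
    if pid == 0:              # child: the writer
        os.close(r)
        for m in messages:
            send_msg(w, m)
        os.close(w)
        os._exit(0)
    os.close(w)               # parent: the reader
    received = [recv_msg(r) for _ in messages]
    os.close(r)
    os.waitpid(pid, 0)
    return received
```

The sketch keeps messages small; it does not handle partial writes of payloads larger than the pipe buffer.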

2. Named pipe (FIFO)

A named pipe is a pipe with a name added, or more exactly, with a path name associated with it. That is the main difference, and it lets processes communicate even when they are unrelated. Everything else is the same. To emphasize: the name of a named pipe exists in the file system, while its contents are held in memory.
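A minimal sketch of a named pipe, assuming a POSIX system where os.mkfifo is available. For brevity a thread stands in for the unrelated writer process; any process that knows the path could open it the same way. The path demo.fifo and the function name are hypothetical.

```python
import os
import tempfile
import threading


def fifo_round_trip(message: bytes) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")   # hypothetical path name
        os.mkfifo(path)                         # the name now exists in the file system

        def writer():
            # open() blocks until a reader has opened the other end
            with open(path, "wb") as f:
                f.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "rb") as f:             # blocks until the writer opens
            data = f.read()                     # returns once the writer closes
        t.join()
        return data
```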

3. Signal

The signal is one of the oldest methods of interprocess communication in UNIX systems. The operating system uses a signal to notify a process that some predefined event (one of a fixed set of events) has occurred in the system. It is also a primitive mechanism for communication and synchronization between user processes.

[Figure: life cycle of a signal]

Common signals in Linux:

(1) SIGHUP: when the user logs out of a terminal, all processes started from that terminal receive this signal. The default action is to terminate the process.

(2) SIGINT: interrupt signal. Pressing Ctrl+C while the program is running generates it.

(3) SIGQUIT: quit signal. Pressing Ctrl+\ while the program is running generates it.

(4) SIGBUS and SIGSEGV: the process accessed an invalid address.

(5) SIGFPE: a fatal arithmetic error occurred, such as division by zero or overflow.

(6) SIGKILL: forcibly terminates the process. Running kill -9 in the shell sends this signal.

(7) SIGTERM: asks the process to terminate. Running kill with the PID of the process in the shell sends this signal.

(8) SIGALRM: timer signal.

(9) SIGCLD (also spelled SIGCHLD): child-exit signal. If the parent neither ignores this signal nor handles it, the child becomes a zombie process after it exits.

A process can specify its own action for a signal to override the default action, and the new action may be to ignore the signal. A process can also temporarily block a signal. A process can therefore choose how a given signal is dealt with:

  • Ignore the signal: a process can ignore most signals generated by the system, but SIGKILL and SIGSTOP cannot be ignored, caught, or blocked; the kernel always handles them.
  • Block the signal: a process can choose to block some signals. A blocked signal is recorded as pending and handled later, once it is unblocked.
  • Handle the signal in the process: the process registers the address of a signal handler with the system. When the signal arrives, the registered handler processes it.
  • Default handling by the kernel: the kernel's default handler performs the default action for the signal. For example, the default action when a process receives SIGFPE (floating-point exception) is to dump core and exit. In most cases signals are handled by the kernel.
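The options above can be tried from Python's signal module. This is a minimal sketch, assuming a POSIX system and that it runs in the main thread with no other threads alive; received, on_usr1, wait_for, demo and can_ignore are illustrative names.

```python
import os
import signal
import time

received = []


def on_usr1(signum, frame):
    received.append(signum)


def wait_for(n, timeout=2.0):
    """Poll until n signals have been handled or the timeout expires."""
    deadline = time.monotonic() + timeout
    while len(received) < n and time.monotonic() < deadline:
        time.sleep(0.01)
    return len(received) >= n


def demo():
    received.clear()
    old = signal.signal(signal.SIGUSR1, on_usr1)      # register a handler
    try:
        os.kill(os.getpid(), signal.SIGUSR1)          # send the signal to ourselves
        caught = wait_for(1)

        # Block the signal: it is recorded as pending instead of being delivered
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
        os.kill(os.getpid(), signal.SIGUSR1)
        pending = signal.SIGUSR1 in signal.sigpending()
        held_back = len(received) == 1
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
        delivered_later = wait_for(2)                 # handled after unblocking
        return caught, pending, held_back, delivered_later
    finally:
        signal.signal(signal.SIGUSR1, old)


def can_ignore(sig):
    """Try to set the action to 'ignore'; the kernel refuses for SIGKILL and SIGSTOP."""
    try:
        previous = signal.signal(sig, signal.SIG_IGN)
    except OSError:
        return False
    if previous is not None:
        signal.signal(sig, previous)                  # restore the earlier action
    return True
```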

4. Message queue

A message queue is a linked list of messages stored in the kernel. Each message queue is identified by an identifier (the queue ID).

Characteristics
  • A message queue allows messages to be retrieved selectively. Messages need not be read in first-in, first-out order; they can also be read by message type. In this respect it has an advantage over a FIFO.
  • A message queue overcomes the shortcomings of signals, which carry little information, and of pipes, which carry only an unformatted byte stream and have a limited buffer size.
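Reading by type can be pictured with a toy model. The sketch below is an in-memory illustration only, not the kernel interface; it assumes the convention used by System V msgrcv, where type 0 means "the oldest message of any type" and a positive type means "the oldest message of that type". The class and method names are hypothetical.

```python
from collections import deque


class TypedMessageQueue:
    """Toy in-memory model of a typed message queue (not the kernel API)."""

    def __init__(self):
        self._messages = deque()      # (mtype, payload) pairs in arrival order

    def send(self, mtype: int, payload: bytes):
        if mtype <= 0:
            raise ValueError("message type must be positive")
        self._messages.append((mtype, payload))

    def receive(self, mtype: int = 0):
        """mtype == 0: oldest message of any type; mtype > 0: oldest of that type."""
        for i, (t, payload) in enumerate(self._messages):
            if mtype == 0 or t == mtype:
                del self._messages[i]
                return t, payload
        return None                   # nothing matching is queued
```

A receiver interested only in type 2 can skip past older type 1 messages, which a plain pipe or FIFO cannot do.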

5. Shared memory


Characteristics

  • Processes read and write the same memory directly
  • The memory is mapped into each process's address space, so no copying is needed
  • A synchronization mechanism is required to keep access to the shared memory safe
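A minimal sketch of shared memory between a parent and a child, assuming a POSIX system where an anonymous mmap is shared across os.fork. The function name and the 8-byte counter layout are illustrative choices.

```python
import mmap
import os
import struct


def shared_counter_demo(increments: int) -> int:
    # Anonymous mapping; on Unix it is shared with children created by fork()
    buf = mmap.mmap(-1, 8)
    buf[:8] = struct.pack("q", 0)

    pid = os.fork()
    if pid == 0:                                  # child writes into the mapping
        for _ in range(increments):
            (value,) = struct.unpack("q", buf[:8])
            buf[:8] = struct.pack("q", value + 1)
        os._exit(0)

    os.waitpid(pid, 0)                            # crude synchronization: wait for the writer
    (result,) = struct.unpack("q", buf[:8])       # the parent sees the child's writes
    buf.close()
    return result
```

Here the parent simply waits for the child to exit before reading. If both sides wrote concurrently, a semaphore or lock would be needed, as the last bullet says.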

6. Semaphore

A semaphore is a counter that controls access to shared data by multiple processes. Its purpose is synchronization between processes. It corresponds to the P and V primitive operations covered in operating system courses (as in the producer-consumer problem).

The differences between semaphores and mutexes are as follows.

(1) A mutex is used for mutual exclusion between threads, while a semaphore is used for synchronization between threads. This is the fundamental difference between the two: the difference between mutual exclusion and synchronization.

Mutual exclusion: only one visitor may access a resource at any given time; access is exclusive. A mutex cannot, however, control the order in which visitors access the resource, so access is unordered.

Synchronization: on top of mutual exclusion (in most cases), some other mechanism makes visitors access the resource in a defined order. A semaphore can provide this.

(2) A mutex has only two values, 0 and 1, whereas the value of a semaphore can be any non-negative integer.

A mutex can provide mutually exclusive access to only one resource; it cannot handle mutual exclusion over multiple resources. A semaphore can provide both mutual exclusion and synchronization among threads over multiple resources of the same kind. When the semaphore is binary (its value is only 0 or 1), it can also provide mutually exclusive access to a single resource.
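Both uses can be sketched with Python's threading module, assuming threads rather than processes for brevity. max_concurrency and produce_then_consume are illustrative names; the first measures how many threads get past a guard at once, the second uses a semaphore that starts at 0 to enforce an order.

```python
import threading
import time


def max_concurrency(guard, workers=8, hold=0.02):
    """Run `workers` threads through `guard`; return the peak number inside at once."""
    inside = 0
    peak = 0
    meter = threading.Lock()          # protects the two counters above

    def work():
        nonlocal inside, peak
        with guard:
            with meter:
                inside += 1
                peak = max(peak, inside)
            time.sleep(hold)          # pretend to use the resource
            with meter:
                inside -= 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


def produce_then_consume():
    """Synchronization: the consumer may not proceed until the producer signals."""
    ready = threading.Semaphore(0)    # starts at 0, so acquire() blocks
    log = []

    def consumer():
        ready.acquire()               # P operation: wait
        log.append("consume")

    def producer():
        log.append("produce")
        ready.release()               # V operation: signal

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    c.join()
    p.join()
    return log
```

With threading.Lock() as the guard the peak is 1; with threading.Semaphore(3) up to three threads are inside at the same time.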

7. Socket

4. Communication mode between threads

  1. Mutex: only the thread that holds the mutex may access the shared resource. Because there is only one mutex object, the shared resource can never be accessed by several threads at the same time. The synchronized keyword and the various locks in Java are mechanisms of this kind.
  2. Semaphore: it allows several threads to access multiple resources of the same kind at the same time, while capping the number of threads that may access the resource concurrently.
  3. Event (wait / notify): threads stay synchronized through notifications, and the mechanism also makes it easy to compare priorities among multiple threads.
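The wait / notify style in item 3 can be sketched with threading.Condition, which pairs a mutex with wait and notify operations. Mailbox and demo are hypothetical names used only for this illustration.

```python
import threading
from collections import deque


class Mailbox:
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()   # a mutex plus wait/notify

    def put(self, item):
        with self._cond:                     # take the mutex
            self._items.append(item)
            self._cond.notify()              # wake one waiting consumer

    def get(self):
        with self._cond:
            while not self._items:           # re-check the state after every wakeup
                self._cond.wait()            # releases the mutex while sleeping
            return self._items.popleft()


def demo(n=5):
    box = Mailbox()
    out = []

    def consume():
        for _ in range(n):
            out.append(box.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    for i in range(n):
        box.put(i)
    consumer.join()
    return out
```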

5. The underlying implementation of the process

6. The underlying implementation of thread

Mainstream operating systems provide the thread implementation. Pay attention to this sentence: who implements threads? The operating system, and more precisely the part of the operating system that runs in kernel mode.

6.1 There are three main ways to implement threads in an operating system

  • User-level threads (not mainstream)
  • Kernel-level threads (mainstream)
  • User-level threads combined with kernel-level threads, a hybrid implementation (not mainstream)

6.2 Kernel-level threads

A kernel thread is supported directly by the operating system kernel. The kernel performs the thread switches, schedules the threads through its scheduler, and is responsible for mapping each thread's work onto the processors. Each kernel thread can be regarded as a part of the kernel.

User programs generally do not use kernel threads directly. Instead they use a higher-level interface to kernel threads, the lightweight process (LWP), which is what we usually call a thread. Each lightweight process is backed by one kernel thread, so lightweight processes can exist only where kernel threads are supported. This 1:1 relationship between lightweight processes and kernel threads is called the one-to-one thread model, as shown in the figure below.

[Figure: one-to-one thread model, each lightweight process mapped to one kernel thread]

Because each one is backed by a kernel thread, every lightweight process is an independent unit of scheduling. Even if one lightweight process blocks in a system call, the rest of the process keeps working. Lightweight processes do have limitations, mainly the following two:

  • Creating and destroying threads requires system calls, and system calls are relatively expensive because they switch back and forth between user mode and kernel mode.
  • Each lightweight process must be backed by a kernel thread, so it consumes some kernel resources (such as the kernel thread's stack space). The number of lightweight processes a system can support is therefore limited.

6.3 User-level threads

The user-level approach places the entire thread implementation in user space. The kernel knows nothing about the threads; all it sees is a single-threaded process.

Few operating systems today use the pure user-level thread model as their underlying thread implementation.

Note: on an operating system that implements only user-level threads, the basic unit of CPU scheduling is effectively the process (from the kernel's point of view every process is single-threaded, so scheduling that one thread is the same as scheduling the process).

The advantage of user threads is that they need no kernel support. The disadvantage is the same fact: without kernel support, every thread operation has to be handled by the user program. Thread creation, switching, and scheduling all become the program's problem, so programs built on user threads are more complex. Apart from multithreaded programs on operating systems that do not support multithreading, and a few programs with special requirements, fewer and fewer programs use user threads.

6.4 A comparison of the two

User-level and kernel-level threads can be compared from three angles: scheduling, overhead, and performance.

  • Scheduling: the kernel is unaware of user-level threads, so developers must implement the scheduling themselves. With kernel-level threads the developer can stay hands-off and leave scheduling to the kernel.
  • Overhead: as noted when discussing the advantages of user-level threads, creating a thread in user space costs far less than creating one in the kernel.
  • Performance: a user-level thread switch happens entirely in user space, which is at least an order of magnitude faster than trapping into the kernel. There is no trap into the kernel, no kernel context switch, and no need to flush the memory caches, which makes thread scheduling very fast.
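To make the scheduling point concrete, here is a minimal sketch of scheduling done entirely in user space, with Python generators standing in for user-level threads. It is an illustration of the idea, not how any real thread library is implemented; task and run_round_robin are hypothetical names.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """A user-space scheduler: the kernel sees only one ordinary thread."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # "switch" to the task: an ordinary function resume
        except StopIteration:
            continue               # the task finished; drop it
        ready.append(current)      # otherwise put it at the back of the ready queue
```

The switch is just a function resume, so no system call is involved. The flip side is that a task which never yields starves all the others, since nothing can preempt it.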

Early operating systems did not support threads, so threads were implemented in user space. Today threads are supported by the kernel, and most systems implement multithreading by mapping lightweight processes onto kernel threads. The common Windows and Linux systems both use the one-to-one thread model.

7. Context switching between processes / threads

Recommended reading: Context switch

8. Scheduling algorithm of operating system process

  • First come, first served (FCFS) scheduling algorithm: pick the process that entered the ready queue first, allocate resources to it, and run it immediately until it finishes or blocks on an event. Then schedule again.
  • Shortest job first (SJF) scheduling algorithm: pick the process in the ready queue with the shortest estimated running time, allocate resources to it, and run it immediately until it finishes or blocks on an event and gives up the CPU. Then schedule again.
  • Round-robin (time slice) scheduling algorithm: round-robin scheduling, also known as RR, is one of the oldest, simplest, fairest, and most widely used algorithms. Each process is assigned a period of time called its time slice, which is how long the process is allowed to run.
  • Multilevel feedback queue scheduling algorithm: the algorithms above each have limitations. For example, shortest job first looks after short processes and neglects long ones. A multilevel feedback queue lets high-priority jobs get a prompt response and also lets short jobs (processes) finish quickly, so it is generally regarded as one of the better process scheduling algorithms. UNIX uses this algorithm.
  • Priority scheduling: assign a priority to each process and run the process with the highest priority first, and so on. Processes with the same priority run in FCFS order. The priority can be based on memory requirements, time requirements, or any other resource requirement.
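The first three algorithms can be compared with a small simulation. This is a minimal sketch under simplifying assumptions: every process arrives at time 0, burst times are known in advance, and context switches cost nothing. The function names and the sample burst times are illustrative.

```python
from collections import deque


def fcfs(bursts):
    """Waiting time per process when run in arrival (list) order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)
        clock += burst
    return waits


def sjf(bursts):
    """Non-preemptive shortest job first; ties keep list order."""
    order = sorted(range(len(bursts)), key=lambda i: bursts[i])
    waits, clock = [0] * len(bursts), 0
    for i in order:
        waits[i] = clock
        clock += bursts[i]
    return waits


def round_robin(bursts, quantum):
    """Each process runs for at most `quantum` units, then goes to the back."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    ready = deque(i for i, b in enumerate(bursts) if b > 0)
    clock = 0
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            ready.append(i)
        else:
            finish[i] = clock
    # waiting time = completion time - burst time (all arrive at t = 0)
    return [finish[i] - bursts[i] for i in range(len(bursts))]
```

For burst times 24, 3 and 3, FCFS makes the two short processes wait behind the long one, SJF runs them first, and round robin with a quantum of 4 lands in between.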

9. Comparison of coroutine and thread

A coroutine is a user-mode lightweight thread, also called a "micro-thread". The scheduling of coroutines is controlled entirely by the user program. Here is how coroutines compare with threads:

  1. Coroutines execute very efficiently. Switching between coroutines is not a thread switch; it is controlled by the program itself, so there is no thread-switching overhead.
  2. Coroutines do not need the locking machinery of multithreading. Coroutines in one thread run one at a time and switch only at points the program chooses, so shared resources can be guarded by checking state rather than by taking a lock. Threads and processes are scheduled preemptively by the operating system, whereas coroutines yield cooperatively.
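A minimal sketch of the two points above using Python's asyncio, one of several possible coroutine implementations. The names worker, main and run are illustrative. Both coroutines append to the same list without a lock, because control changes hands only at the explicit await.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")   # no lock: nothing else runs until we await
        await asyncio.sleep(0)      # explicit switch point: yield to the event loop


async def main():
    log = []
    await asyncio.gather(worker("A", 2, log), worker("B", 2, log))
    return log


def run():
    return asyncio.run(main())
```

The whole program runs in a single operating system thread, so switching between the two workers involves no thread switch.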

This work is licensed under a CC license. Reprints must credit the author and link to this article.