Why are they so fast: from multiprocessing to multithreading to I/O multiplexing


Computers have always embodied their instrumental value: improving human efficiency. Whether at the hardware level, the operating-system layer, or the application layer, speed (both computing speed and response speed) has been the computer's unremitting pursuit.

Fast systems are all alike; every slow system is slow in its own way.

In this article, let's talk about what makes computer systems "fast", and use Nginx and Redis to illustrate these common "fast" techniques in detail.

Time-sharing operating systems

In the early days, computer resources were extremely scarce (in a sense, they always will be), but the fruits of technology should not be enjoyed exclusively; they should serve the public. The concept of time-sharing was therefore introduced in the 1960s: a computer rotates its resources among multiple users in time slices, which dramatically reduces the cost of computing (economy of scale) and enables individuals and organizations to use computers without actually owning them. An operating system that serves users this way is a time-sharing operating system.

Note: the time-sharing model was a major technological innovation in computer history. UNIX and UNIX-like operating systems are time-sharing operating systems.

Multiprocessing and multithreading

Faced with requests from many user tasks, the operating system needs a model to schedule them efficiently. A user task maps onto the computer as a program, and the running instance of a program is a process; the process model is that scheduling model. The process is the basic unit of resource allocation and scheduling in a time-sharing operating system, and everything else builds on this foundation.


Process diagram from user task to operating system

The process model solves the problem of multiple task requests, and the IPC (inter-process communication) model solves the problem of multi-task cooperation, but computer resources are still limited (always keep this in mind). Each process has its own independent virtual address space, file descriptors, and signal handlers; even a single process consumes considerable memory and processor resources while running, and switching between processes is expensive. Something lighter, able to share and reuse these resources, was needed, and so the thread model came into being.

Threads further optimize the operating system's scheduling by increasing the granularity of what can be shared within a process. Threads in a process have independent call stacks, registers, and thread-local storage, yet share all of the process's resources. The multithreading model makes better use of CPU resources: when the current thread blocks, another thread can take over the CPU, improving the system's response speed. Adding the processor-affinity feature, which pins tasks to specified CPU cores (one thread per core), can also save the overhead of thread switching.


Process and thread model diagram

If a task corresponds to a process, then a part of a task (a subtask) corresponds to a thread. If multiprocessing solves the computation problem (exploiting multi-core CPUs) to improve computing performance, then multithreading within a process solves the blocking problem to improve response speed.

I/O multiplexing

In real scenarios, multiple processes/threads improve the computer's throughput up to a point, but on a limited number of cores the number of concurrent processes/threads hits a bottleneck and cannot grow without bound: with too few processes/threads, concurrency is low; with too many, frequent context switching brings huge time overhead.

For example, consider a network application that assigns one process/thread to each connection. This architecture is simple to implement, but when handling tens of thousands of simultaneous connections the server cannot scale: system resources gradually run out as the connection count grows (this was the problem with the Apache server). There is also a huge asymmetry in resource utilization: a rather lightweight connection (represented by a file descriptor and a small amount of memory) is mapped to an entire thread or process (a very heavyweight operating-system object). So the one-process/thread-per-request pattern is easy to implement but wastes computer resources. This is the famous C10K problem (supporting 10,000 concurrent connections on a single machine).

Computer resources are still limited (always remember this), so how can the computer execute faster on limited resources? A good programmer asks the computer every day: "can it be faster?"

So good programmers soon asked whether one process/thread could handle multiple connections. The remedy for the C10K problem is I/O multiplexing (select, then poll), which issues non-blocking I/O and lets the kernel watch many descriptors at once. With the select system call, the polling once performed by the request thread is now performed by the kernel. On the surface this adds a layer of system-call overhead, but because it multiplexes many I/O channels in a single call, it improves efficiency; this is the real key to high concurrency. If the kernel's polling is replaced with event notification (epoll), the kernel tells us exactly which descriptors have which I/O events, eliminating the polling altogether and further improving concurrency.

The above briefly traces the historical evolution and basic principles behind making computer applications execute faster. Next, let's look at the industry's best practices built on these ideas.

Why is Nginx so fast

Nginx is a recognized high-performance web and reverse-proxy server, famous for high performance and high concurrency. In official test results it supports 50,000 concurrent connections (20,000–40,000 in real-world scenarios).

The factors behind Nginx's high concurrency can be summarized as follows:

  1. I/O multiplexing: from select to poll

  2. Event notification (asynchronous): from poll to epoll

  3. Memory-mapped files: from reading files to mapping them into memory

  4. Processor affinity: each worker process is bound to its own CPU core

  5. Single-threaded worker processes: avoids thread-switching overhead

  6. Thread pools (1.7.11+): I/O operations that may block are offloaded to a thread pool
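These factors map directly onto `nginx.conf` directives. A minimal sketch, with illustrative values rather than tuned recommendations:

```nginx
# one worker process per CPU core, each pinned to its own core
worker_processes  4;
worker_cpu_affinity 0001 0010 0100 1000;

thread_pool default threads=32 max_queue=65536;

events {
    use epoll;                 # event notification instead of select/poll
    worker_connections 10240;  # connections handled per worker
}

http {
    server {
        listen 80;
        location / {
            root /var/www/html;
            aio threads=default;  # offload blocking file I/O to the thread pool
            sendfile on;
        }
    }
}
```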

Why is Redis so fast

Redis is a widely used, high-concurrency in-memory key-value database. Its high concurrency mainly comes from the following points:

  1. I/O multiplexing: from select to poll

  2. Event notification (asynchronous): from poll to epoll

  3. In-memory operation: reads and writes are extremely fast

  4. Single-threaded mode: avoids thread-switching overhead

  5. Selective multithreading: much like Nginx's thread pool, Redis 4.0 introduced background threads specifically to handle operations on large key-value pairs that could easily block (such as asynchronous deletion).


The Nginx and Redis cases show that the "routine" for designing such applications to be fast is similar:

  • Step 1: use the processor-affinity feature: determine the number of worker processes from the number of CPU cores, then bind each process to its own core.

  • Step 2: use the single-threaded process model: each process's one thread stays on its CPU core, avoiding the overhead of thread switching, while processes on different cores execute in parallel.

  • Step 3: use I/O multiplexing so that one thread handles N connections; this is the most critical step for high concurrency.

  • Step 4: use event notification (epoll) to turn blocking polling into non-blocking, event-driven I/O, avoiding scans over all file descriptors.

  • Step 5: enable a thread pool for operations that run too long and would otherwise block the event loop, further improving response speed.

  • Step 6: tune for specific scenarios, such as Nginx's memory-mapped files.

So, beyond all this, can it be even faster? In the near future, servers will need to handle millions of concurrent connections (the C10M problem). As CMOS technology approaches its physical limits, Moore's law is coming to an end and CPUs are developing toward more cores. With multicore, the responsibility for identifying parallelism and deciding how to exploit it shifts to programmers and language runtimes. Hence programming languages designed for concurrency, such as Erlang and Go, which re-examine the problem from a new model's perspective, solve it, and make computers faster still.

The author's official WeChat account is yablog.

