Why is CPU utilization low when the load is high? Take a look at this article!

Time:2019-11-17

Summary of the cause

In short: too many processes are waiting for disk I/O to complete, so the queue of waiting processes grows long, while few processes are actually running on the CPU. The result is a high load together with low CPU utilization.

The detailed analysis follows.
Before explaining why the load is high, this article introduces the concepts of load, multitasking operating systems, and process scheduling.

What is load

Load is a statistic: over a period of time, it is the number of processes that the CPU is running plus the number waiting for the CPU, in other words the length of the queue in front of the CPU. The smaller the number, the better. As a rule of thumb, a load above the number of CPU cores * 0.7 is abnormal.

Load can be divided into two parts: CPU load and I/O load.

For example, suppose there is a program for large-scale scientific calculation. The program rarely reads from or writes to the disk, yet it takes a long time to finish. Because it spends its time on calculation and logical decisions, its speed depends mainly on how fast the CPU can compute. A program that loads the CPU in this way is called a “compute-intensive program”.

Another kind of program searches for files in a large amount of data stored on disk. The speed of such a search does not depend on the CPU but on the read speed of the disk, that is, on input/output (I/O). The faster the disk, the shorter the search. A program that loads I/O in this way is called an “I/O-intensive program”.
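The rule of thumb above can be written down in a few lines. This is a minimal sketch, not a definitive health check: the helper names load_per_core and is_overloaded are made up for illustration, and the 0.7 factor is simply the threshold quoted in this article.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def is_overloaded(load_avg: float, cores: int, threshold: float = 0.7) -> bool:
    """True when the load exceeds cores * threshold (the article's rule of thumb)."""
    return load_per_core(load_avg, cores) > threshold


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    try:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        averages = os.getloadavg()
    except (OSError, AttributeError):
        print("load average is not available on this platform")
    else:
        for label, value in zip(("1 min", "5 min", "15 min"), averages):
            flag = "HIGH" if is_overloaded(value, cores) else "ok"
            print(f"{label}: load={value:.2f} per-core={load_per_core(value, cores):.2f} {flag}")
```

With 4 cores, for instance, a load of 3.0 is flagged (3.0 / 4 = 0.75), while a load of 2.0 is not.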

What is a multitasking operating system

Linux can work on several different tasks at once. While multiple tasks run concurrently, they have to share limited hardware resources such as the CPU and the disk. The system switches between the tasks at very short intervals so that each one makes progress; this is multitasking.

When only a few tasks are running, no task has to wait for such a switch. As tasks are added, waiting begins. For example, suppose task A is computing on the CPU. If tasks B and C also want to compute, they must wait until the CPU is free. In other words, a task that is ready to run can only run when its turn comes. This waiting shows up as a delay in program execution.

The "load average" figures in the uptime output:
# uptime
 11:16:38 up  2:06, 4 users, load average: 0.00, 0.02, 0.05

From left to right, load average shows the average number of waiting tasks over the past 1, 5 and 15 minutes. A high load average means that many tasks are waiting to run, so each task waits longer before it gets to run; this is what a high load looks like.
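To work with these three numbers in a script, the uptime line can be parsed. The following is a small sketch; parse_load_average is a hypothetical helper, and the regular expression assumes the English output format shown above, with a dot as the decimal separator.

```python
import re

# Matches "load average: 0.00, 0.02, 0.05" (some systems print "load averages:").
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(uptime_line: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from one line of uptime output."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


if __name__ == "__main__":
    sample = " 11:16:38 up  2:06, 4 users, load average: 0.00, 0.02, 0.05"
    one, five, fifteen = parse_load_average(sample)
    print(f"1 min: {one}, 5 min: {five}, 15 min: {fifteen}")
```

Comparing the 1 minute value with the 15 minute value gives a rough idea of whether the load is rising or falling.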

Process scheduling

What is process scheduling

Some people also call process scheduling CPU context switching. When the CPU switches to another process, it must save the state of the current process and restore the state of the next one: the currently running task moves to the ready (or suspended, or interrupted) state, and another ready task is selected to become the current task. Process scheduling therefore includes saving the running environment of the current task and restoring the running environment of the task that is about to run.
In the Linux kernel, each process has a management table called the “process descriptor”. The process descriptors are ordered by descending priority so that the processes (tasks) run in a sensible order. Maintaining this order is the job of the process scheduler.
The scheduler also classifies and manages the states of processes, such as:

  • The state of waiting for CPU resources to be allocated.
  • The state of waiting for the completion of disk input and output.

Let’s look at how the process states differ:

The following is an example of a process state transition:

Three processes, A, B and C, are running at the same time. Each process is in the runnable state as soon as it is created, that is, it is ready to run but not necessarily running on the CPU yet. Because the Linux kernel does not distinguish between running and waiting to run, both are called the running state.

  • Process A: running
  • Process B: running
  • Process C: running

The three running processes immediately become candidates for scheduling. Assume that the scheduler gives the CPU to process A.

  • Process A: running (on the CPU)
  • Process B: running (waiting for the CPU)
  • Process C: running (waiting for the CPU)

Process A has been given the CPU, so it starts processing, while processes B and C wait for A to leave the CPU. Suppose that, after some calculation, process A needs to read data from disk. Once A has issued the read request, it can do no work until the requested data arrives. This state is known as “blocked waiting for an I/O operation to finish”. Until the I/O completes, process A keeps waiting: it goes into uninterruptible sleep and does not use the CPU. The scheduler then compares the calculated priorities of processes B and C and gives the CPU to the one with the higher priority. Assume that process B takes precedence over process C.

  • Process A: uninterruptible
  • Process B: running
  • Process C: running

As soon as process B starts running, it needs keyboard input from the user, so B also blocks while it waits for that input. Processes A and B are now both waiting, and process C runs. Although A and B are both waiting, waiting for disk I/O and waiting for keyboard input are different states. Waiting for keyboard input is an open-ended wait that may last indefinitely, while a disk read is a wait that must complete within a short time. The state of each process is now as follows:

  • Process A: uninterruptible
  • Process B: interruptible (waiting for keyboard input)
  • Process C: running

Now assume that, while process C is running, the data requested by process A arrives in the buffer from the disk. The disk then sends an interrupt to the kernel. The kernel learns that the read has completed and puts process A back into the runnable state.

  • Process A: running
  • Process B: interruptible (waiting for keyboard input)
  • Process C: running

After that, process C will eventually leave the CPU, for example because it has used up its CPU time limit, because the task has finished, or because it has to wait for I/O. Once one of these conditions is met, the scheduler switches the CPU from process C to process A.
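On a live Linux system, the states from this walkthrough can be observed in /proc/<pid>/stat, where running, interruptible sleep and uninterruptible sleep appear as the letters R, S and D. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names parse_state and describe_state are made up for illustration.

```python
import os

# Only the three states discussed in this article; the kernel defines more.
STATE_NAMES = {
    "R": "running (on a CPU or waiting in the run queue)",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
}


def parse_state(stat_text: str) -> str:
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so the state is the first field after the LAST ')'.
    _, separator, rest = stat_text.rpartition(")")
    fields = rest.split()
    if not separator or not fields:
        raise ValueError("not a /proc/<pid>/stat line")
    return fields[0]


def describe_state(code: str) -> str:
    """Translate a state letter into the wording used in this article."""
    return STATE_NAMES.get(code, f"other ({code})")


if __name__ == "__main__":
    try:
        with open(f"/proc/{os.getpid()}/stat") as handle:
            code = parse_state(handle.read())
    except OSError:
        print("/proc is not available on this system")
    else:
        print(code, "=", describe_state(code))
```

A script that inspects itself normally reports R, because it is on the CPU at the moment it reads its own state.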

Meaning of load

The load represents “the average number of waiting processes”. In the state transitions above, every state other than running on the CPU is a waiting state. Are processes in all of those states counted in the load?

In fact, only processes in the running state and the uninterruptible state are counted in the load. That is, processes in the following two situations show up in the load value:

  • Processes that are ready to use the CPU but must wait for other processes to release it
  • Processes that want to continue but must wait for disk I/O to finish

The following intuitive scenario shows why only the running and uninterruptible states are counted in the load.

During CPU-intensive processing such as video encoding, the system responds very slowly when you try to start more work of the same kind. The system also responds very slowly while a large amount of data is being read from disk. On the other hand, no matter how many processes are waiting for keyboard input, the system response does not slow down.
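The counting rule can be checked against the earlier walkthrough of processes A, B and C. The sketch below replays the four snapshots from that example and counts only the running and uninterruptible processes; the constant and function names are invented for this illustration, and the real kernel averages such counts over time rather than reporting a single snapshot.

```python
RUNNING = "running"                  # on the CPU or waiting for it
UNINTERRUPTIBLE = "uninterruptible"  # blocked on disk I/O
INTERRUPTIBLE = "interruptible"      # blocked on an open-ended event such as keyboard input

# Only these two states contribute to the load.
COUNTED_IN_LOAD = {RUNNING, UNINTERRUPTIBLE}

# The four snapshots from the walkthrough, in order.
TIMELINE = [
    ("all three processes are created",
     {"A": RUNNING, "B": RUNNING, "C": RUNNING}),
    ("A blocks on a disk read",
     {"A": UNINTERRUPTIBLE, "B": RUNNING, "C": RUNNING}),
    ("B waits for keyboard input",
     {"A": UNINTERRUPTIBLE, "B": INTERRUPTIBLE, "C": RUNNING}),
    ("the disk read for A completes",
     {"A": RUNNING, "B": INTERRUPTIBLE, "C": RUNNING}),
]


def load_contribution(states: dict) -> int:
    """Count the processes that would be included in the load for one snapshot."""
    return sum(1 for state in states.values() if state in COUNTED_IN_LOAD)


if __name__ == "__main__":
    for description, states in TIMELINE:
        print(f"{load_contribution(states)}  {description}")
```

Process A still counts while it sleeps on the disk read, but process B drops out of the count as soon as it starts waiting for the keyboard.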

What scenario would cause low CPU and high load?

The analysis above makes the meaning of load clear. In one sentence, load is the number of processes that need to run but must wait for the processes ahead of them in the queue to finish. Specifically, there are two kinds:

  • Processes waiting to be given the CPU
  • Processes waiting for disk I/O to complete

Low CPU utilization with a high load means that too many processes are waiting for disk I/O to complete, which makes the queue too long. The load therefore looks heavy, while the CPU is in fact working on other tasks or sitting idle. The specific scenarios are as follows.

Scenario 1: too many disk read and write requests lead to a large amount of I/O waiting

The CPU works far faster than the disk. When a process running on the CPU needs to read a file on disk, it asks the kernel for the file and the kernel fetches it from the disk. In the meantime the CPU switches to another process or sits idle, and the requesting task goes into uninterruptible sleep. When there are too many read and write requests, too many processes end up in uninterruptible sleep, which produces a high load with low CPU utilization.
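One way to confirm this scenario is to compare how much CPU time is spent busy with how much is spent idle while waiting for I/O. The sketch below takes two samples of the aggregate cpu line in /proc/stat and computes both shares. It assumes a Linux kernel that reports at least the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if len(parts) < 1 + len(FIELDS) or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # zip() stops after the eight named fields; guest counters are ignored
    # because guest time is already included in user time.
    return dict(zip(FIELDS, (int(value) for value in parts[1:])))


def cpu_shares(before: dict, after: dict) -> dict:
    """Return busy and iowait percentages for the interval between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between the two samples")
    busy = total - deltas["idle"] - deltas["iowait"]
    return {
        "busy_pct": 100.0 * busy / total,
        "iowait_pct": 100.0 * deltas["iowait"] / total,
    }


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    try:
        first = parse_cpu_line(read_cpu_line())
        time.sleep(1)
        second = parse_cpu_line(read_cpu_line())
        shares = cpu_shares(first, second)
    except (OSError, ValueError) as error:
        print(f"could not sample /proc/stat: {error}")
    else:
        print(f"busy {shares['busy_pct']:.1f}%  iowait {shares['iowait_pct']:.1f}%")
```

A small busy share combined with a large iowait share, at a time when the load average is high, matches the pattern described in this scenario.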

Scenario 2: MySQL statements that use no index, or that deadlock

MySQL stores its data on disk, and to answer a query it must first load the relevant data from disk into memory. When the tables are very large and a statement cannot use an index, it scans a huge number of rows and blocks on I/O. Statements caught in a deadlock can block in a similar way. Either case leaves too many processes in uninterruptible sleep and drives the load up.
To find the cause, run SHOW FULL PROCESSLIST in MySQL, check what each thread is waiting on, and pick out the offending statements for optimization.

Scenario 3: external storage fails. A common case is a mounted NFS share whose NFS server has failed

For example, when the system mounts external storage such as an NFS share, there are often many read and write requests for the files stored on it. If the NFS server fails, those requests never get a response, so the processes stay in the uninterruptible state and the load rises.
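As a first check for this scenario, it helps to know which NFS shares are mounted at all. The following is a minimal sketch, assuming Linux's /proc/mounts; nfs_mounts is a hypothetical helper. It only lists the mounts and does not test whether the server still responds.

```python
def nfs_mounts(mounts_text: str) -> list:
    """Return (remote source, mount point) for every NFS entry in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: source, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1]))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            mounts = nfs_mounts(handle.read())
    except OSError:
        print("/proc/mounts is not available on this system")
    else:
        if not mounts:
            print("no NFS mounts found")
        for source, mount_point in mounts:
            print(f"{source} mounted on {mount_point}")
```

If processes are stuck in uninterruptible sleep and one of the listed mount points belongs to a server that is down, that mount is a likely cause of the high load.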

Conclusion: that covers the common cases. If you have run into other scenarios, please leave a comment to add them.

Author: Ximen feibing, a post-90s IT man, has been working in Beijing. He loves sports, adventure and travel.
