How to view system load on Linux


The load of the operating system reflects how heavily applications are using its resources, which makes it a good starting point for finding the bottlenecks to optimize.

The system load average is the average number of processes in the running or uninterruptible state.\
Running: the process is either using the CPU or is ready and waiting for the CPU scheduler.\
Uninterruptible: the process is blocked, waiting for I/O.

On Linux, the uptime command is usually used to check the load (the w and top commands also show it).

1、 uptime command

$ uptime\
16:33:56 up 69 days,  5:10,  1 user,  load average: 0.14, 0.24, 0.29

The above information is analyzed as follows:

  • 16:33:56: current time
  • up 69 days, 5:10: the system has been running for 69 days, 5 hours and 10 minutes
  • 1 user: one user is currently logged in to the system
  • load average: 0.14, 0.24, 0.29: the average load of the system over the past 1 minute, 5 minutes and 15 minutes
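
The same fields can be pulled out of the line programmatically. The following is a minimal Python sketch rather than part of the original commands: the function name parse_load_averages is hypothetical, and it assumes the line follows the uptime format shown above.

```python
import re


def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from an uptime-style line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average field found")
    return tuple(float(value) for value in match.groups())


# Sample line copied from the uptime output above.
sample = "16:33:56 up 69 days,  5:10,  1 user,  load average: 0.14, 0.24, 0.29"
one_min, five_min, fifteen_min = parse_load_averages(sample)
print(one_min, five_min, fifteen_min)
```

On a live system, Python's standard os.getloadavg() returns the same three values without any parsing.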

Average load analysis

View the number of logical CPU cores:

$ grep 'model name' /proc/cpuinfo | wc -l

In this example the command reports one logical CPU core. Taking a single core as an example, assume that the CPU can process at most 100 processes per minute:

  • Load = 0: no process needs the CPU
  • Load = 0.5: the CPU processes 50 processes
  • Load = 1: the CPU processes 100 processes. The CPU is fully used, but the system still runs smoothly
  • Load = 1.5: the CPU processes 100 processes while another 50 are queued, waiting for the CPU. The CPU is now overloaded

For the system to run smoothly, the load per core should not exceed 1.0, so that no process has to wait and every process is handled immediately.\
1.0 is therefore the key threshold: above it, the system is no longer in its best state. A value around 0.7 is generally considered ideal.\
The healthy load value also scales with the number of CPU cores: with 2 cores the threshold is 2.0, and so on.\
The 15-minute load average is generally used to evaluate the load of the system.
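
The rule of thumb above can be sketched as a small Python check. This is only an illustration: the function names and the labels "ideal", "busy" and "overloaded" are hypothetical, while the 0.7 and 1.0 per-core thresholds come from the text.

```python
def load_per_core(load_average, cpu_cores):
    """Normalize a load average by the number of logical CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return load_average / cpu_cores


def classify_load(load_average, cpu_cores):
    """Label a load average using the 0.7 / 1.0 per-core thresholds from the text."""
    per_core = load_per_core(load_average, cpu_cores)
    if per_core <= 0.7:
        return "ideal"
    if per_core <= 1.0:
        return "busy"        # CPU close to or at full use, nothing queued yet
    return "overloaded"      # more runnable work than the cores can handle


print(classify_load(0.29, 1))  # 15-minute value from the uptime example
print(classify_load(1.5, 1))
print(classify_load(1.5, 2))
```

On a live machine the core count could be taken from os.cpu_count() and the load from os.getloadavg()[2] (the 15-minute value).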

2、 w command

$ w\
 17:47:40 up 69 days,  6:24,  1 user,  load average: 0.46, 0.26, 0.25\
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT\
lvinkim  pts/0      15:55    0.00s  0.02s  0.00s w

Line 1: same as uptime.  \
Line 2 onward: the list of currently logged-in users.

3、 top command

$ top\
top - 17:51:23 up 69 days,  6:28,  1 user,  load average: 0.31, 0.30, 0.26\
Tasks:  99 total,   1 running,  98 sleeping,   0 stopped,   0 zombie\
Cpu(s):  2.3%us,  0.2%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st\
Mem:   1922244k total,  1737480k used,   184764k free,   208576k buffers\
Swap:        0k total,        0k used,        0k free,   466732k cached\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                \
    1 root      20   0 19232 1004  708 S  0.0  0.1   0:01.17 init                                                                    \
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd                                                                \

Line 1: same as uptime.

Line 2: process count information.

  • Tasks: 99 total: there are 99 processes in total
  • 1 running: one process is using the CPU
  • 98 sleeping: 98 processes are sleeping
  • 0 stopped: no processes are stopped
  • 0 zombie: there are no zombie processes

Line 3: CPU usage

  • us (user): share of CPU time used by user processes that have not been niced
  • sy (system): share of CPU time used by the kernel and kernel processes
  • ni (nice): share of CPU time used by user processes whose priority has been changed with nice
  • id (idle): share of time the CPU is idle. If the system is slow while this value is very high, high CPU load is not the cause
  • wa (iowait): share of time the CPU spends waiting for I/O operations. This indicator helps troubleshoot disk I/O problems and is usually read together with id
  • hi (hardware IRQ): share of CPU time spent handling hardware interrupts
  • si (software interrupts): share of CPU time spent handling software interrupts
  • st (steal): share of CPU time taken from this virtual machine by the hypervisor to run other tasks
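
To work with these fields in a script, the Cpu(s) line can be split into named values. The sketch below is a hypothetical helper (parse_top_cpu_line is not part of top) and assumes the line looks like the sample output above; other top versions may format it differently.

```python
import re


def parse_top_cpu_line(line):
    """Return a dict such as {'us': 2.3, 'sy': 0.2, ...} from top's Cpu(s) line."""
    # Each field is a number, an optional '%', and a two-letter name.
    fields = re.findall(r"([\d.]+)%?\s*([a-z]{2})\b", line)
    return {name: float(value) for value, name in fields}


# Sample line copied from the top output above.
sample = "Cpu(s):  2.3%us,  0.2%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"
cpu = parse_top_cpu_line(sample)
print(cpu["us"], cpu["wa"], cpu["id"])
```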

Some situations that deserve attention:

  • High us and low wa: the system is slow because processes are using a lot of CPU. This usually comes with a low id value, meaning the CPU has very little idle time.
  • Low wa and high id: a CPU bottleneck can be ruled out.
  • High wa: the CPU spends much of its time waiting for I/O, so check swap usage first. Swap space is located on disk and is far slower than memory. Once memory runs out and the system starts to use swap, performance suffers badly, which is why disabling swap is generally recommended on servers with high performance requirements. If memory is sufficient but wa is still high, check which process is consuming a lot of I/O resources.
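
These rules can be written down as a rough first-pass check. The Python sketch below is an assumption-laden illustration: the function name, the returned labels and the numeric cut-offs (70% for us and id, 20% for wa) are all hypothetical, because the text only speaks of "high" and "low" values.

```python
def diagnose_cpu(us, wa, idle, high_us=70.0, high_wa=20.0, high_idle=70.0):
    """Apply the three rules of thumb; every threshold is an assumed value."""
    if wa >= high_wa:
        # CPU mostly waiting for I/O: check swap usage, then I/O-heavy processes.
        return "io-wait"
    if us >= high_us:
        # Processes are using a lot of CPU; idle is usually low here.
        return "cpu-bound"
    if idle >= high_idle:
        # Low wa and high id: a CPU bottleneck can be ruled out.
        return "cpu-not-bottleneck"
    return "inconclusive"


# Values from the sample top output: 2.3% us, 0.0% wa, 97.4% id.
print(diagnose_cpu(us=2.3, wa=0.0, idle=97.4))
```

The cut-offs are parameters so they can be tuned per workload; treat the result as a hint for where to look next, not as a verdict.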

Other load patterns have to be judged case by case in practice.

4、 iostat command

The iostat command shows the I/O usage of the system's disks and partitions.

$ iostat \
Linux 2.6.32-573.22.1.el6.x86_64 (sgs02)   01/20/2017     _x86_64_   (1 CPU)\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle\
           2.29    0.00    0.25    0.04    0.00   97.41\
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn\
vda               1.15         3.48        21.88   21016084  131997520

Some noteworthy I/O indicators:

  • Device: disk name
  • tps: number of I/O transfer requests per second
  • Blk_read/s: number of blocks read per second. According to the sysstat documentation, a block here is a 512-byte sector on modern kernels; the file system block size is a separate value that can be checked with tune2fs
  • Blk_wrtn/s: number of blocks written per second
  • Blk_read: total number of blocks read
  • Blk_wrtn: total number of blocks written
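
Block counts are easier to read once converted to a size. The following Python sketch parses one device row and converts blocks to KiB; the helper names are hypothetical, and the 512-byte default is an assumption based on the block size the sysstat documentation describes for modern kernels, so it is left as a parameter.

```python
def parse_iostat_device(row):
    """Split one device row of the iostat output shown above into named fields."""
    name, tps, read_s, wrtn_s, blk_read, blk_wrtn = row.split()
    return {
        "device": name,
        "tps": float(tps),
        "blk_read_s": float(read_s),
        "blk_wrtn_s": float(wrtn_s),
        "blk_read": int(blk_read),
        "blk_wrtn": int(blk_wrtn),
    }


def blocks_to_kib(blocks, block_size=512):
    """Convert a block count to KiB; block_size in bytes is an assumed default."""
    return blocks * block_size / 1024


# Device row copied from the iostat output above.
vda = parse_iostat_device("vda               1.15         3.48        21.88   21016084  131997520")
print(f"{vda['device']}: read {blocks_to_kib(vda['blk_read_s']):.2f} KiB/s, "
      f"write {blocks_to_kib(vda['blk_wrtn_s']):.2f} KiB/s")
```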

5、 iotop command

The iotop command is similar to top, but it shows the I/O activity of each process, which makes it very useful for locating processes with heavy I/O.

# iotop\
Total DISK READ: 0.00 B/s | Total DISK WRITE: 774.52 K/s\
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                \
  272 be/3 root        0.00 B/s    0.00 B/s  0.00 %  4.86 % [jbd2/vda1-8]\
 9072 be/4 mysql       0.00 B/s  268.71 K/s  0.00 %  0.00 % mysqld\
 5058 be/4 lvinkim     0.00 B/s    3.95 K/s  0.00 %  0.00 % php-fpm: pool www\
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init

The output shows how intensively each task is reading and writing.
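
When the list is long, it helps to rank tasks by write rate. The Python sketch below parses rows in the layout shown above; parse_iotop_row and top_writers are hypothetical helpers, and the 1024-based unit table is an assumption about how iotop scales its K/s and M/s values.

```python
# Assumed 1024-based multipliers for the rate units printed by iotop.
UNIT_BYTES = {"B/s": 1, "K/s": 1024, "M/s": 1024 ** 2, "G/s": 1024 ** 3}


def parse_iotop_row(row):
    """Parse one task row: TID PRIO USER READ unit WRITE unit SWAPIN % IO % COMMAND."""
    parts = row.split(None, 11)  # the command may contain spaces, so cap the split
    if len(parts) != 12:
        raise ValueError("unexpected iotop row: " + row)
    tid, _prio, user, r_val, r_unit, w_val, w_unit = parts[:7]
    return {
        "tid": int(tid),
        "user": user,
        "read_bps": float(r_val) * UNIT_BYTES[r_unit],
        "write_bps": float(w_val) * UNIT_BYTES[w_unit],
        "command": parts[11].strip(),
    }


def top_writers(rows, limit=3):
    """Return the parsed rows with the highest write rate first."""
    parsed = [parse_iotop_row(row) for row in rows]
    return sorted(parsed, key=lambda task: task["write_bps"], reverse=True)[:limit]


# Task rows copied from the iotop output above.
rows = [
    "  272 be/3 root        0.00 B/s    0.00 B/s  0.00 %  4.86 % [jbd2/vda1-8]",
    " 9072 be/4 mysql       0.00 B/s  268.71 K/s  0.00 %  0.00 % mysqld",
    " 5058 be/4 lvinkim     0.00 B/s    3.95 K/s  0.00 %  0.00 % php-fpm: pool www",
    "    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init",
]
for task in top_writers(rows, limit=2):
    print(task["command"], round(task["write_bps"]))
```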

6、 sysstat tool

Often, once a past period of high load has been detected or reported, the historical monitoring data has to be played back. This is what the sar command is for. sar is part of the sysstat toolkit, which records the system's CPU load, I/O status and memory usage so that historical data can be replayed later.

sysstat is configured in /etc/sysconfig/sysstat, and its historical logs are stored in /var/log/sa.\
Statistics are recorded every 10 minutes, and the statistics file is rotated at 23:59 every day. The frequency of both operations is configured in /etc/cron.d/sysstat.
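
To replay a particular day, sar can be pointed at that day's data file with its -f option. The Python sketch below only builds the command line. It assumes the default sysstat naming scheme, in which the file for a given day is named sa followed by the two-digit day of the month inside /var/log/sa; the helper name sar_history_command is hypothetical.

```python
from datetime import date

SA_DIR = "/var/log/sa"  # log directory named in the text


def sar_history_command(day, report_flag="-u"):
    """Build the argument list that replays one day's data with sar -f.

    report_flag selects the report: -u for CPU, -r for memory, -b for I/O.
    """
    # Assumed default naming: "sa" + two-digit day of the month.
    data_file = f"{SA_DIR}/sa{day.day:02d}"
    return ["sar", report_flag, "-f", data_file]


print(" ".join(sar_history_command(date(2017, 1, 20))))        # CPU report
print(" ".join(sar_history_command(date(2017, 1, 20), "-r")))  # memory report
```

The list can be passed to subprocess.run on a machine where sysstat is installed and that day's file still exists.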

7、 sar command

Use the sar command to view the CPU usage of the current day:

$ sar\
Linux 2.6.32-431.23.3.el6.x86_64 (szs01)   01/20/2017     _x86_64_   (1 CPU)\
10:50:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle\
11:00:01 AM     all      0.45      0.00      0.22      0.40      0.00     98.93\
Average:        all      0.45      0.00      0.22      0.40      0.00     98.93

Use sar -r to view the memory usage of the current day:

$ sar -r\
Linux 2.6.32-431.23.3.el6.x86_64 (szs01)   01/20/2017     _x86_64_   (1 CPU)\
10:50:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit\
11:00:01 AM     41292    459180     91.75     44072    164620    822392    164.32\
Average:        41292    459180     91.75     44072    164620    822392    164.32

Use sar -b to view the I/O statistics of the current day:

$ sar -b\
Linux 2.6.32-431.23.3.el6.x86_64 (szs01)   01/20/2017     _x86_64_   (1 CPU)\
10:50:01 AM       tps      rtps      wtps   bread/s   bwrtn/s\
11:00:01 AM      3.31      2.14      1.17     37.18     16.84\
Average:         3.31      2.14      1.17     37.18     16.84

For more sar usage, see man sar.

This work is released under a CC license; reprints must credit the author and link to this article.
