In this series, we will discuss Linux performance and how to measure it correctly. Linux performance is a very broad topic, so we will focus on the four main resources that usually determine system performance: CPU, memory, disk storage, and network.
Now, when we talk about resource-related performance, many scenarios can be "solved" simply by increasing the CPU quota. But that may not be what you want to do; you want to understand the real causes and find real solutions.
At the very least, you need to know where the utilization of a particular resource comes from, because that is what lets you change the application workload accordingly.
Queuing and concurrency
Now, another important thing to understand about resources is that each of them has some natural degree of concurrency we can use to execute the workload.
For example, your CPU has multiple cores that can execute tasks more or less in parallel; if there are more runnable tasks than cores, the extra tasks queue. A spinning disk can serve only one I/O request at a time.
A RAID volume or SSD can do more at once. But all of these resources have some natural concurrency up to which they perform well, after which queuing begins.
What’s wrong with queuing? Queuing increases execution time (latency), which is how your end user experiences performance.
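To make this concrete, here is a toy sketch (not from the original article) of the classic single-server M/M/1 queueing formula, which shows how average latency explodes as utilization approaches 100%:

```python
# M/M/1 queueing model: average time in system = service_time / (1 - utilization).
# A toy illustration of why queuing hurts latency; real systems are messier.

def expected_latency(service_time_ms: float, utilization: float) -> float:
    """Average request latency for a single-server queue at a given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)

# A 10 ms request barely queues at 50% utilization, but waits dramatically at 99%.
for u in (0.5, 0.8, 0.9, 0.99):
    print(f"utilization {u:.0%}: latency {expected_latency(10.0, u):.0f} ms")
```

Note how latency is still tolerable at 80% utilization but grows without bound as the resource saturates; this is exactly the "performs well, then queues" behavior described above.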
If you want to learn more about this approach, which works not only on Linux but on other systems as well, I suggest you check out the USE Method: Utilization, Saturation, and Errors.
Let's start with some typical mistakes that are common on Linux. One of the most common misconceptions concerns the load average. In many cases, I see people read the load average, maybe monitor it a little, and then treat it as a magic number: "if the load average is greater than 5, that's bad."
I think that's a problem. For example, this server's load average is quite high, ranging from 40 to 60, but knowing that the server has 80 CPU cores, I can conclude that it is not much.
What can you tell me about the load of this server?
Although you can use the load average to gain insight into your system, many people don't really understand it and instead treat it as a magic number.
What problems do we see with the load average?
One is that it mixes I/O and CPU usage. You may have a lot of I/O work or a lot of CPU work; the load average can be very high in either case, so you don't really know which one you're looking at.
It's not really normalized. As I mentioned, you can create a VM with only one CPU core, or run a production system with 100 CPU cores, which is two orders of magnitude of difference.
That means we can't use the "magic numbers" that worked a decade or more ago, when all servers had one, two, or four CPU cores. In those cases, a load average of 20 was clearly abnormal for the server.
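To illustrate the normalization idea, here is a minimal sketch: divide the load average by the number of CPU cores. The helper names are my own, not from the article; on Linux, `os.getloadavg()` returns the standard 1-, 5-, and 15-minute averages.

```python
import os

def normalized_load(loadavg: float, cpu_count: int) -> float:
    """Load average per CPU core; values well below 1.0 mean plenty of headroom."""
    return loadavg / cpu_count

def current_normalized_load() -> float:
    # os.getloadavg() returns the 1, 5, and 15 minute load averages (Unix only).
    one_min, _, _ = os.getloadavg()
    return normalized_load(one_min, os.cpu_count())

# A load average of 40 on an 80-core server is only 0.5 per core.
print(normalized_load(40.0, 80))
```

With this per-core view, 0.5 on an 80-core box and 0.5 on a single-core VM mean roughly the same thing, which is exactly what the raw load average fails to give you.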
It's not normalized, and it has a lot of computational artifacts, especially on Linux. If you are interested in these artifacts, have a look at Brendan Gregg's blog. He has done a lot of great work, and Linux load averages are one of the topics he wrote about recently.
At the very least, I want to separate the I/O load from the CPU load. This is essentially the same load average, but split into its components and normalized. In this case I can see that the normalized CPU load is actually very low, with no tasks waiting for CPU, while the I/O component fluctuates constantly.
A better way to look at CPU (and other resource) pressure is PSI, Pressure Stall Information. It is a new facility, part of recent kernels, and may not be available on all Linux distributions yet, so I think it will take a few years to become broadly useful in practice.
It lets us understand, from a latency point of view, what is stalling a given task: waiting in the run queue for a CPU, waiting for disk I/O, or stalled because of memory pressure.
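PSI is exposed under /proc/pressure/ as lines like `some avg10=0.12 avg60=0.08 avg300=0.01 total=123456`. As an illustrative sketch (assuming that standard line format), here is a minimal parser:

```python
def parse_psi_line(line: str) -> dict:
    """Parse one /proc/pressure/{cpu,io,memory} line into a dict of floats."""
    kind, *fields = line.split()          # kind is "some" or "full"
    values = dict(f.split("=") for f in fields)
    return {"kind": kind, **{k: float(v) for k, v in values.items()}}

# On Linux you could feed it real data:
# with open("/proc/pressure/cpu") as f:
#     for line in f:
#         print(parse_psi_line(line))
sample = "some avg10=0.12 avg60=0.08 avg300=0.01 total=123456"
print(parse_psi_line(sample))
```

The `avg10`/`avg60`/`avg300` values are the percentage of time tasks were stalled over the last 10, 60, and 300 seconds, which is precisely the latency-centric view described above.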
It gives us good information about performance as the end user actually experiences it, rather than the "magic numbers" people usually chase. For more information, read this blog post.
Another great thing you can use for monitoring now is eBPF. You can consume this information in a variety of ways; for example, there is a Cloudflare eBPF exporter that can feed it into Prometheus for monitoring.
One such tool is runqlat, a command-line tool from the BCC tools collection, which shows run queue latency: how long tasks wait before they are scheduled on a CPU.
If the application is scheduled almost immediately, it means CPUs are available, and whatever your CPU utilization graph shows, tasks are not waiting long. If the wait time is long, it means your program went unscheduled for a long time.
The good thing about this approach is that it catches complicated situations that CPU utilization alone will not show. For example, the Linux CPU scheduler is extremely complex, especially in NUMA environments.
So whatever logic the scheduler executes, if scheduling takes a long time, the user program is affected. You can't see that in a CPU utilization graph, but you will see it in the run queue latency data.
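As a rough per-process complement to the BCC tools, /proc/&lt;pid&gt;/schedstat exposes three fields: time spent on the CPU (ns), time spent waiting on the run queue (ns), and the number of timeslices run. A small illustrative parser, assuming that standard field layout:

```python
def parse_schedstat(text: str) -> dict:
    """Parse /proc/<pid>/schedstat: on-CPU time, run queue wait, timeslices."""
    on_cpu_ns, wait_ns, slices = (int(x) for x in text.split())
    return {
        "on_cpu_ns": on_cpu_ns,          # time actually executing
        "runqueue_wait_ns": wait_ns,     # time runnable but waiting for a CPU
        "timeslices": slices,
    }

# On Linux (with scheduler stats enabled) you could read a live process:
# with open("/proc/self/schedstat") as f:
#     print(parse_schedstat(f.read()))
print(parse_schedstat("123456789 5000000 42"))
```

A growing `runqueue_wait_ns` relative to `on_cpu_ns` is the per-process version of the scheduling delay that utilization graphs hide.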
Now, here is another mistake. This is an example from Prometheus, which shows us the CPU utilization of our nodes, broken down by state.
Which of these corresponds to the CPU in use?
I've seen a lot of chart suggestions on the Internet that say, "Oh, let's just query for all non-idle states; that's the CPU in use."
That is not quite right, because of at least two important subtleties:
1. iowait is idle

From the CPU's point of view, the iowait state is idle.

It was simply convenient to split idle time into two states: idle when there is no outstanding disk I/O, and idle while disk I/O is in flight.
But from the CPU's point of view, if your system shows 99% of CPU time in iowait, that is not a CPU bottleneck; it is an I/O bottleneck.
2. Steal is CPU time unavailable to your virtual machine
In VMs and some cloud environments, steal corresponds to CPU time that is not actually available to your applications, no matter what you do.

It goes to your neighbors' virtual machines: because CPUs are shared in virtualized environments, another VM running on the same CPU core may be stealing CPU cycles from you.
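Both pitfalls can be handled when computing CPU utilization from /proc/stat, whose "cpu" line lists user, nice, system, idle, iowait, irq, softirq, and steal times (plus guest fields on newer kernels). Here is a sketch that counts iowait as idle and reports steal separately:

```python
def cpu_breakdown(stat_line: str) -> dict:
    """Compute busy/idle/steal fractions from a /proc/stat 'cpu' line.

    iowait is counted as idle (the CPU was free to run other work),
    and steal is reported separately (time taken by neighbor VMs).
    """
    fields = [int(x) for x in stat_line.split()[1:]]
    user, nice, system, idle, iowait, irq, softirq, steal = fields[:8]
    total = sum(fields[:8])
    busy = user + nice + system + irq + softirq
    return {
        "busy": busy / total,
        "idle": (idle + iowait) / total,   # iowait is idle from the CPU's view
        "steal": steal / total,
    }

# A shortened sample line (real /proc/stat lines also include guest fields).
print(cpu_breakdown("cpu 100 0 50 800 40 5 5 0"))
```

Summing "all non-idle states" would have counted the iowait ticks as CPU usage, overstating how busy this CPU really is.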
More CPU fun
There is even more fun to be had with CPU performance. For example, modern CPUs change their frequency with load, sometimes by as much as a factor of five.
In addition, the peak core frequency the system can deliver depends on the load, particularly the load you run on a given core, and on how many cores are in use.
Not to mention that virtual cores (hyper-threads) are not the same as real physical cores.
From a utilization point of view this matters less, because if your CPU is close to saturation, you will see it either way.
But it is important for capacity planning: if you assume your application scales linearly with its workload and you see CPU utilization at only 5%, you might guess the application can grow at least 20x.
Given the complicated ways modern CPUs scale, that may well not be the case.
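As a toy illustration of why linear extrapolation is optimistic: a nearly idle machine runs each unit of work "faster" than a loaded one, thanks to turbo frequencies and hyper-threading. The discount factor below is a made-up illustrative number, not a measurement; in practice you would benchmark throughput at several load levels instead.

```python
def naive_headroom(utilization: float) -> float:
    """Linear extrapolation: 5% busy suggests 20x growth headroom."""
    return 1.0 / utilization

def discounted_headroom(utilization: float, scaling_efficiency: float = 0.6) -> float:
    """Same estimate discounted for turbo/SMT effects.

    scaling_efficiency is a hypothetical factor chosen for illustration;
    real values come from measuring throughput at increasing load.
    """
    return scaling_efficiency / utilization

print(naive_headroom(0.05))        # the naive ~20x estimate
print(discounted_headroom(0.05))   # a more cautious figure
```

The point is not the specific numbers but that the true headroom is always below the naive estimate once frequency scaling and shared hyper-threads kick in.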
PS: This article is a translation; see the original text.