Using performance monitor to monitor common indicators of sqlserver under Windows

Time:2021-5-6

As mentioned in the above article, win’s performance monitor is a necessary tool for monitoring database performance. Next, I will introduce some common monitoring indicators to you. In fact, they are nothing more than the operation indicators of disk, CPU, memory and other hardware, as well as the database’s own locks, user connections, and other parameters that need attention according to its own business decisions.

1.SQL Server Buffer: Buffer Cache Hit Ratio

This is an important parameter to check if there is not enough memory. The counter buffer cache hit ratio in SQL Server buffer is used to indicate how often SQL server obtains data from the cache rather than from the disk. SQL server will cache some queried data in memory for further query. When a query a comes in, the database will compile the SQL to see what data is needed, and then execute the plan. First, go to the memory to see if there is any data needed for the query. If the same SQL has been executed just now or the data of the table has been cached in memory, but no data is found in memory, That may be due to the memory squeeze caused by insufficient memory, write the cache data back to the hard disk or release it to provide other requests to the database for use. Generally speaking, for OLTP systems, this value should be at least 90%, and the ideal value is 99%. If the value is less than 90%, it is suggested that you should add more memory.

2.Memory: Pages/sec

This is also an important parameter to monitor whether the memory is insufficient. This counter records the number of pages exchanged between memory and disk per second. Frequent exchange of pages will consume more IO, which will affect the performance of the server. For example, a supermarket has a shelf full of new products a, B and C. when you go to the supermarket and want to buy a, you can get a directly from the shelf. It’s very convenient. When customers walk around the supermarket, they tell you how I didn’t find the old product D. I just want to buy this d, Then the staff will go to the warehouse to take out commodity D and put it on the shelf for customers to buy next time. But what can we do when the shelves are full? We can only take a which has not been asked for for a long time and put it in the warehouse, and then put D in the spare place. But next time another customer comes and wants to buy a, the staff will have to take a out again and replace D on the shelves. In fact, the memory is the shelf, and the hard disk is the warehouse. Because the shelves are too small, the goods on the shelves can only be changed frequently to provide normal operation. In order to reduce the cost of repeated back and forth transportation, we can only change a larger shelf to meet the demand.

If you only run SQL Server on the server, the ideal range of this indicator should be 0-20. Occasionally, if it exceeds 20, it has little effect. If it frequently exceeds 20, it means that your server may need to add memory.

Of course, this indicator should be combined with the buffer cache hit ratio of the previous indicator. If the buffer hit of the previous indicator is always 99% or higher, and your page swapping is always more than 20 during this period, it means that not only is the memory insufficient, but other programs are occupying the system memory.

3.Memory: Available Bytes

Another counter that monitors memory is this one. This value should be greater than 5m at least, because SQL server always needs to maintain 5-10m free memory for allocation. When this value is less than 5m, SQL server may have performance bottlenecks due to lack of memory.

4.Physical Disk: % Disk Time

This counter records the busy degree of the disk (the busy degree of the whole disk array or physical disk). Theoretically, this value should be lower than 55%. If it continues to be higher than 55%, there may be IO bottlenecks on this server.

If it happens only occasionally, don’t worry, but you can find out what the database is doing, what statements are executed, and how to optimize it.

5.Physical Disk: Avg. Disk Queue Length

This is an important indicator to check the disk IO. Theoretically, the value of each physical disk should not exceed 2. Of course, this value needs to be calculated. For example, four physical disks are used to make RAID10. At this time, the average value of disk queue in a monitoring cycle is 10, and the queue value of each disk is 10 / 4 = 2.5. Then it can be said that there is an I / o bottleneck in this disk array. This is the same as the previous disktime index. If it appears occasionally, don’t worry. If it appears for a long time, you have to consider solving the problem of disk IO performance.

6.Processor: % Processor Time

This is an indicator to monitor the CPU (similar to disk time). This is a key parameter to observe CPU utilization. If the value of processor time counter continues to exceed 80%, it indicates that the CPU has a bottleneck problem. If it only appears occasionally, it means that there may be a CPU consuming query at this time point. You can try to catch SQL and optimize it the next time this time point comes. If the CPU keeps soaring after a certain point in time, the common situations are: 1. Sudden high concurrency 2. Index reorganization 3. Suddenly a frequently used index with a large amount of data fails 4. Deadlock 5. Many others. Find the problem first, and then deal with it.

7.System: Processor Queue Length

This indicator is similar to disk queue length, which is also calculated as a single CPU. A single CPU cannot exceed 2. For example, if you are a 2U machine, this value should not exceed 4. If you continuously exceed 4 in a monitoring cycle, there may be a CPU bottleneck.

Basically, there are so many commonly used counters that can cooperate with you to detect the performance of SQL server. If you are interested, you can download them by yourself.