Linux operation and maintenance — 1. Disk related knowledge


Physical structure of disk

(1) Platter: the body of a hard disk consists of multiple platters stacked together.

When a hard disk leaves the factory, the manufacturer has already performed the low-level format (physical format), which divides each blank platter into concentric tracks of different radii and divides each track into sectors. Each sector can store 128 × 2^N bytes of information, and the default sector size is 512 bytes (N = 2). In general, users do not need to perform low-level formatting.


(2) Head: each platter has a head on each of its two sides.


(3) Spindle: all platters are driven by the spindle motor.


(4) Controller board: a complex component that also holds the ROM (internal firmware), the cache, and so on.


How a disk completes a single IO operation

(1) Seek

When the controller sends an IO command to the disk, the actuator arm moves the head out of the landing zone (the data-free area on the inner circle) and positions it above the track that holds the first data block of the operation. This process is called seek, and the time it takes is called seek time;

(2) Rotation delay

Once the head is on the right track, it cannot read data immediately. It has to wait until the platter rotates so that the sector holding the first data block passes directly under the read / write head. The time spent waiting for the platter to rotate to that sector is called rotational latency (rotation delay);

(3) Data transmission

Next, as the platter rotates, the head continuously reads / writes the corresponding data blocks until all the data for this IO has been transferred. This process is called data transfer, and the time it takes is called transfer time. After these three steps, a single IO operation is complete.

According to the process of a single disk IO operation, you can find:

Single IO time = seek time + rotation delay + transfer time

The formula for IOPS (IO operations per second) is then:

IOPS = 1000 ms / single IO time
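
The two formulas above can be sketched as small Python helpers. The function names and the sample values are illustrative assumptions, not part of the original text; all times are in milliseconds.

```python
def single_io_time_ms(seek_ms: float, rotation_ms: float, transfer_ms: float) -> float:
    """Time for one IO: seek time + rotation delay + transfer time (all in ms)."""
    return seek_ms + rotation_ms + transfer_ms


def iops(io_time_ms: float) -> float:
    """IO operations per second for a given single-IO time in ms."""
    if io_time_ms <= 0:
        raise ValueError("single IO time must be positive")
    return 1000.0 / io_time_ms


# Made-up values purely for illustration: 4 ms seek, 3 ms rotation, 1 ms transfer.
print(iops(single_io_time_ms(4.0, 3.0, 1.0)))  # → 125.0
```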

Disk IOPS calculation

What is the seek time, rotation delay and data transfer time of different disks?

1. Seek time

The data to be read or written may be on any track of the disk, from the inner circle (the shortest seek time) to the outer circle (the longest seek time), so only the average seek time is used in the calculation.

This parameter is listed in the specifications when purchasing a disk. Current SATA / SAS disks have different average seek times depending on their rotational speed, but they are usually less than 10 ms:


[Table: average seek time]

2. Rotation delay

As with seeking, once the head is positioned on the track, the sector to be read or written may happen to be right under it, in which case data can be read or written immediately with no extra delay. In the worst case, the head has to wait for the platter to make a full revolution. The average rotation delay, half a revolution, is therefore used here. For a 15000 rpm disk, it is (60 s / 15000) * (1 / 2) = 2 ms.
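
The same half-revolution rule can be applied to other spindle speeds. The following is a minimal sketch; the function name and the list of speeds (common spindle speeds, chosen for illustration) are assumptions rather than something the text specifies.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotation delay in ms: the time for half a revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000 / rpm  # 60 s = 60000 ms per minute
    return ms_per_revolution / 2


# Common spindle speeds, used here only as examples.
for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

For 15000 rpm this gives the 2 ms used in the rest of the calculation; slower disks wait proportionally longer.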

3. Transfer time

(1) Disk transfer rate
There are two types of disk transfer rates: internal transfer rate and external transfer rate.

Internal transfer rate refers to the data transfer rate between the head and the hard disk cache. In short, it is the speed at which the head reads data from the platter and stores it in the cache.

The ideal internal transfer rate involves no seek and no rotation delay: the head keeps reading data on the same track and passes it to the cache. Obviously, this is impossible, because the storage space of a single track is limited;

The actual internal transfer rate includes seek and rotation delay. At present, the sustained internal transfer rate of consumer disks is generally between 30 MB/s and 45 MB/s (server disks should be higher).

External transfer rate refers to the data transfer rate between the hard disk cache and the system bus, that is, the rate at which the computer reads data from the cache to the corresponding hard disk controller through the hard disk interface.

In the hard disk specifications, the manufacturer usually gives a maximum transfer rate. For example, the 6 Gbit/s of SATA 3.0, which converts to 6 * 1024 / 8 = 768 MB/s, usually refers to the maximum transfer rate of the hard disk interface (about 600 MB/s of payload once the 8b/10b line encoding is taken into account). Of course, this value cannot be reached in actual use.

Since IOPS is being calculated here, the actual internal transfer rate is chosen conservatively, taking 40 MB/s as an example.

(2) Size of a single IO operation
With the transfer rate known, you also need the IO chunk size of a single IO operation to calculate its transfer time. What is the size of a single disk IO? The answer is: it depends.

Both the operating system and the disk controller use caching to improve IO performance, and the purpose is the same at either level: to make data reads and writes more efficient. As a result, the size of each individual IO operation varies, depending mainly on how the system judges read / write efficiency. Take the data page size of a SQL Server database as an example: 8 KB.

(3) Transfer time
Transfer time = IO chunk size / internal transfer rate = 8 KB / 40 MB/s ≈ 0.2 ms

It can be seen that:
(3.1) if the IO chunk size is large, the transfer time is longer and the single IO time is longer, resulting in lower IOPS;
(3.2) the main read / write cost of a mechanical disk is the addressing time, that is, seek time + rotation delay: the swing of the disk arm and the rotation of the platter;
(3.3) for a rough IOPS estimate, the transfer time can be ignored: IOPS ≈ 1000 ms / (seek time + rotation delay).
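
To see how the chunk size affects transfer time and IOPS, here is a small sketch. It assumes the 40 MB/s internal transfer rate chosen above and the 3 ms seek time and 2 ms rotation delay that the 15000 rpm example in this article uses; the function name and the list of chunk sizes are illustrative.

```python
def transfer_time_ms(io_size_kb: float, rate_mb_per_s: float = 40.0) -> float:
    """Transfer time in ms for one IO of io_size_kb (1 MB = 1024 KB)."""
    return io_size_kb / (rate_mb_per_s * 1024) * 1000


# Seek time + rotation delay, assumed values for a 15000 rpm disk.
ADDRESSING_MS = 3.0 + 2.0

for size_kb in (8, 64, 512, 1024):
    t = transfer_time_ms(size_kb)
    estimated_iops = 1000 / (ADDRESSING_MS + t)
    print(f"{size_kb:>5} KB: transfer {t:6.2f} ms, about {estimated_iops:6.1f} IOPS")
```

With 8 KB chunks the transfer time is a small fraction of the addressing time, which is why it can be ignored in a rough estimate; with chunks of 1 MB it dominates.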

4. IOPS calculation example
Take a 15000 rpm disk as an example:

(1) Single IO time
Single IO time = seek time + rotation delay + transfer time = 3 ms + 2 ms + 0.2 ms = 5.2 ms

(2) IOPS
IOPS = 1000 ms / single IO time = 1000 ms / 5.2 ms ≈ 192
This is the random access IOPS of a single disk.

In the extreme case where all disk access is sequential, seek time + rotation delay can be ignored, and the formula becomes:
IOPS = 1000 ms / transfer time = 1000 ms / 0.2 ms = 5000

Obviously, this extreme case is too ideal. The space on each track is limited, so seek time + rotation delay can be reduced but never completely avoided.
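
The space between these two extremes can be sketched with a simple model in which only a fraction of the IOs pay the seek time + rotation delay. This model and the name `effective_iops` are assumptions made for illustration; the default values are the ones used in the example above.

```python
def effective_iops(random_fraction: float,
                   seek_ms: float = 3.0,
                   rotation_ms: float = 2.0,
                   transfer_ms: float = 0.2) -> float:
    """IOPS when only `random_fraction` of the IOs pay seek + rotation delay."""
    if not 0.0 <= random_fraction <= 1.0:
        raise ValueError("random_fraction must be between 0 and 1")
    avg_io_ms = random_fraction * (seek_ms + rotation_ms) + transfer_ms
    return 1000.0 / avg_io_ms


for fraction in (1.0, 0.5, 0.1, 0.0):
    print(f"{fraction:4.0%} random: about {effective_iops(fraction):7.1f} IOPS")
```

A fraction of 1.0 corresponds to the fully random case (about 192 IOPS) and 0.0 to the idealized sequential case (5000 IOPS); real workloads fall somewhere in between.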

Disk reads and writes in databases

1. Random access and sequential (continuous) access

(1) Random access

Random access means that the sector address given by the current IO is far from the one given by the previous IO, so the head has to travel a long way between the two operations before it can start reading / writing data again.

(2) Sequential access

Conversely, if the sector address given by the next IO is the same as, or close to, the sector address where the previous IO ended, the head can start the next IO quickly. A series of such IO operations is called sequential (continuous) access.
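
The distinction can be illustrated with a small classifier over a list of IO requests. This is only a sketch: the request format (start sector, sector count) and the `max_gap` threshold for "close" are assumptions, since the text does not define how close is close enough.

```python
from typing import List, Tuple


def classify_accesses(requests: List[Tuple[int, int]], max_gap: int = 0) -> List[str]:
    """Label each IO as 'sequential' or 'random'.

    requests: (start_sector, sector_count) pairs in the order they are issued.
    max_gap:  how many sectors away from the end of the previous IO still
              counts as "close" (an assumed threshold).
    """
    labels = []
    prev_end = None
    for start, count in requests:
        if prev_end is not None and abs(start - prev_end) <= max_gap:
            labels.append("sequential")
        else:
            # The first IO has no predecessor, so it is treated as random.
            labels.append("random")
        prev_end = start + count
    return labels


print(classify_accesses([(0, 8), (8, 8), (5000, 8), (5008, 8)]))
# → ['random', 'sequential', 'random', 'sequential']
```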

(3) Take SQL Server database as an example

Data files: objects on SQL Server uniform extents are allocated space in units of one extent (8 * 8 KB). Data placement is quite random: data is written to whichever data page has free space. Unless each table is given its own sufficiently large, pre-allocated file through a filegroup, data contiguity cannot be guaranteed, so access is usually random.
In addition, even a table with a clustered index is only logically contiguous, not physically.

Log files: because of VLFs (virtual log files), log files are in theory read and written sequentially. However, if the log file is set to grow automatically with a small increment, there will be many very small VLFs, so access is not strictly sequential.

2. Sequential IO and concurrent IO

(1) Sequential IO mode (queue mode)

The disk controller may issue a series of IO commands to the disk group at once. If the disk group can execute only one IO command at a time, this is called sequential IO;

(2) Concurrent IO mode (burst mode)

When a disk group can execute multiple IO commands at the same time, this is called concurrent IO. Concurrent IO can occur only on a disk group composed of multiple disks; a single disk can process only one IO command at a time.
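
As a rough illustration of the difference, the sketch below estimates how long a batch of IO commands takes in each mode. It assumes every command takes the same time and that the commands spread evenly across the disks, which real workloads rarely do; the function name and the numbers are hypothetical.

```python
import math


def batch_time_ms(n_commands: int, io_time_ms: float, concurrency: int = 1) -> float:
    """Time to finish n_commands when `concurrency` commands run at once."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    rounds = math.ceil(n_commands / concurrency)  # batches executed one after another
    return rounds * io_time_ms


# 8 commands of 5.2 ms each (the single IO time from the example above).
print(batch_time_ms(8, 5.2, concurrency=1))  # sequential IO: one command at a time
print(batch_time_ms(8, 5.2, concurrency=4))  # concurrent IO on a 4-disk group
```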

(3) Take SQL Server database as an example

Sometimes the IOPS (Disk Transfers/sec) of the disk is not very high, yet IO waits occur in the database. Why? Usually because too many IO requests have piled up in the disk request queue.

The request queue and busy level of the disk can be viewed through the following performance counters:
LogicalDisk/Avg. Disk Queue Length (average number of requests in the queue)
LogicalDisk/Current Disk Queue Length
LogicalDisk/% Disk Time (disk utilization)

In this case, you can:
(1) Simplify business logic and reduce the number of IO requests;
(2) Migrate multiple user databases under the same instance to different instances;
(3) Separate the log and data files of the same database onto different storage units;
(4) Use an HA strategy to separate read operations from write operations.

3. IOPs and throughput

(1) IOPS

IOPS is the number of read / write (I/O) operations per second. As noted when calculating the transfer time, the larger the IO chunk size, the lower the IOPS. If data is read and written in 100 MB chunks, the IOPS will be very low.

(2) Throughput

Throughput is the number of bytes that can be read and written per second. Again assume data is read and written in 100 MB chunks: although the IOPS is very low, the throughput is not small, since n * 100 MB of data is read and written every second.
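
The relationship between the two metrics is throughput = IOPS * IO chunk size. A minimal sketch follows; the function name and the two sample workloads are made up for illustration.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Throughput in MB/s for a given IOPS and IO chunk size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


# Many small IOs: high IOPS, low throughput.
print(throughput_mb_per_s(192, 8))           # → 1.5
# A few 100 MB IOs: low IOPS, high throughput.
print(throughput_mb_per_s(5, 100 * 1024))    # → 500.0
```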

(3) Take SQL Server database as an example

For OLTP (online transaction processing) systems, small pieces of data are read and written frequently, mostly with random access, so IOPS is used to measure read / write performance;
For data warehouses and log files, large pieces of data are often read and written, mostly with sequential access, so throughput is used to measure read / write performance.

The current IOPs of the disk can be viewed through the following performance counters:
LogicalDisk/Disk Transfers/sec
LogicalDisk/Disk Reads/sec
LogicalDisk/Disk Writes/sec

The current throughput of the disk is viewed through the following performance counters:
LogicalDisk/Disk Bytes/sec
LogicalDisk/Disk Read Bytes/sec
LogicalDisk/Disk Write Bytes/sec
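
Once values for these counters have been collected, they can be combined to judge whether a workload looks IOPS-bound or throughput-bound. The sketch below does not read the counters itself; it takes already-sampled values as plain numbers, and the function name and sample figures are hypothetical.

```python
def summarize_disk_counters(reads_per_s: float, writes_per_s: float,
                            read_bytes_per_s: float, write_bytes_per_s: float) -> dict:
    """Combine read/write counter values into IOPS, throughput and average IO size."""
    transfers_per_s = reads_per_s + writes_per_s        # Disk Transfers/sec
    bytes_per_s = read_bytes_per_s + write_bytes_per_s  # Disk Bytes/sec
    avg_io_kb = bytes_per_s / transfers_per_s / 1024 if transfers_per_s else 0.0
    return {
        "iops": transfers_per_s,
        "throughput_mb_per_s": bytes_per_s / (1024 * 1024),
        "avg_io_size_kb": avg_io_kb,
    }


# Hypothetical sample: 150 reads/s and 50 writes/s, every IO being 8 KB.
print(summarize_disk_counters(150, 50, 150 * 8192, 50 * 8192))
```

A small average IO size with a high transfer count suggests an OLTP-style workload, while a large average IO size with few transfers suggests a data warehouse or log-style workload.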