Reposted from the official account @twt community; author: Chen Chihui
3 I/O monitoring
3.1 IO response time evaluation
What kind of IO response time is reasonable? The following is a summary of some empirical rules:
- For disk arrays built on mechanical hard disks without storage-level synchronous mirroring, an empirical rule applies for evaluating random IO response time
- A separate empirical rule applies for evaluating random IO response time when synchronous mirroring is configured
- If SSD storage is used
- For sequential IO, throughput deserves more attention than IO service time.
3.2 quickly locating busy disks through nmon
Open the diskbusy page of the nmon report and check the WAvg value. If WAvg is 90% or above, there may be disk hotspots, and the related disks need focused monitoring.
Note: Avg is the average over the whole monitoring process (including periods when the disk is fully idle), while WAvg is the average over the part of the monitoring period when the disk is busy. Because the nmon data acquisition period is often much longer than the service peak, WAvg is generally more meaningful than Avg.
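To see why WAvg is more telling than Avg when most of the capture is idle, here is a small sketch. It assumes WAvg is a weighted average computed as sum(x^2) / sum(x); that formula is an assumption and may differ from what a given nmon report tool does. The function name and the sample values are made up for illustration.

```python
def avg_and_wavg(busy_samples):
    """Return (avg, wavg) for a list of disk-busy percentages."""
    total = sum(busy_samples)
    avg = total / len(busy_samples)
    # Assumed weighted average: each sample is weighted by its own value,
    # so idle intervals (0 %) contribute nothing to the result.
    wavg = sum(x * x for x in busy_samples) / total if total else 0.0
    return avg, wavg


# Hypothetical capture: the disk is idle except for a short peak.
samples = [0, 0, 0, 0, 90, 95, 100, 0]
avg, wavg = avg_and_wavg(samples)
print(f"avg={avg:.1f}% wavg={wavg:.1f}%")
print("possible hotspot" if wavg >= 90 else "no hotspot indicated")
```

With these samples the plain average stays well below 90% while the weighted value lands above it, which is the situation the note describes.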
3.3 monitoring busy disks through the sar / iostat commands
Busy disks can be monitored with sar -d or iostat -D as follows, where the response time is in milliseconds. Generally, an average read response time above 15 ms or an average write response time above 2.5 ms deserves attention.
If the queue time and sqfull values remain non-zero for a long time, check whether the queue depth (queue_depth) is set too small.
Note: to make script analysis easier, it is generally recommended to use the -D option together with the -l (lowercase L) and -T options, so that the output for each hdisk is displayed on a single line.
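The thresholds above can be applied mechanically once the per-disk averages have been extracted. The following is a minimal sketch, assuming the averages are already available as numbers in milliseconds and that either threshold being exceeded is enough to flag a disk; the function name, data layout and sample values are hypothetical, and no iostat parsing is attempted.

```python
READ_MS_LIMIT = 15.0   # average read response time threshold from the text
WRITE_MS_LIMIT = 2.5   # average write response time threshold from the text


def disks_needing_attention(stats):
    """stats maps a disk name to (avg_read_ms, avg_write_ms)."""
    flagged = []
    for disk, (read_ms, write_ms) in sorted(stats.items()):
        if read_ms > READ_MS_LIMIT or write_ms > WRITE_MS_LIMIT:
            flagged.append(disk)
    return flagged


# Made-up values for three disks.
sample = {"hdisk0": (4.2, 0.8), "hdisk1": (18.7, 1.1), "hdisk2": (6.0, 3.4)}
print(disks_needing_attention(sample))
```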
3.4 monitoring the fiber card with the fcstat command
fcstat can be used to check the supported speed and the running speed of the fiber card, for example:
`# fcstat fcs0|grep -i speed
Port Speed (supported): 8 GBIT
Port Speed (running): 8 GBIT`
If the running rate is lower than the actual supported rate, it is necessary to check whether the link state between the switch and the host is normal.
If the following two indicators keep growing (note that the values must be non-zero; focus on the growth rate), adjust the fiber card's max_xfer_size and num_cmd_elems accordingly:
Alternatively, use fcstat -D to judge: the value of num_cmd_elems should be greater than or equal to <high water mark of active commands> + <high water mark of pending commands>. For example, in the following example num_cmd_elems can be set to 180 + 91 = 271.
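The sizing rule is simple arithmetic, sketched here with the two figures from the example. The function names are hypothetical, and the two high water marks are assumed to have been read from the fcstat output by hand.

```python
def min_num_cmd_elems(active_high_water, pending_high_water):
    """Smallest num_cmd_elems that covers both observed high water marks."""
    return active_high_water + pending_high_water


def is_large_enough(current_num_cmd_elems, active_high_water, pending_high_water):
    """True if the current setting already satisfies the rule."""
    return current_num_cmd_elems >= min_num_cmd_elems(
        active_high_water, pending_high_water
    )


print(min_num_cmd_elems(180, 91))       # 271
print(is_large_enough(200, 180, 91))    # False: 200 < 271
```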
3.5 using filemon to monitor IO reads and writes
You can use filemon to monitor lf (logical file system), lv (logical volume), pv (physical volume) and vm (virtual memory management) information, as follows:
# filemon -T 1000000 -u -O lf,lv,pv,detailed -o fmon.out
# sleep 5
# trcstop
The generated filemon report is written to fmon.out.
Note: if "xxx events lost" appears in the report, a trace buffer overflow has occurred. Either increase the trace buffer appropriately (with -T) or shorten the monitoring period (the interval between filemon and trcstop).
3.6 reading the filemon report
You can get the busiest files, logical volumes and physical volumes from the filemon report, as follows:
You can also get the read/write status and response time of individual files, logical volumes and physical volumes from the detailed section of the filemon report.
The percentage of seeks indicates the IO pattern. If the percentage of seeks is close to 100%, the IO is random. Conversely, if it is close to 0, the IO is sequential.
4 network monitoring
4.1 monitoring network rate
The entstat -d entX command can be used to monitor the network rate and the sending and receiving of packets, as in the following scenario:
# entstat -d ent0|grep -i speed
Media Speed Selected: Autonegotiate
Media Speed Running: 100 Mbps, Full Duplex
External-Network-Switch (ENS) Port Speed: 100 Mbps, Full Duplex
The running speed shown is 100 Mbps, which corresponds to about 12.5 MB/s (100 / 8). If the measured throughput in an actual test reaches 12.5 MB/s, the network may be a performance bottleneck.
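The conversion between the link speed reported by entstat and the throughput seen in a test can be sketched as follows. The function names and the measured figure are hypothetical, and protocol overhead is ignored, so the real usable ceiling is somewhat lower than the computed one.

```python
def link_capacity_mb_per_s(speed_mbps):
    """Convert a link speed in megabits per second to megabytes per second."""
    return speed_mbps / 8


def utilization(measured_mb_per_s, speed_mbps):
    """Fraction of the theoretical link capacity that a test is using."""
    return measured_mb_per_s / link_capacity_mb_per_s(speed_mbps)


print(link_capacity_mb_per_s(100))      # 12.5
print(f"{utilization(11.0, 100):.0%}")  # share of a 100 Mbps link used at 11 MB/s
```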
4.2 monitoring network response time
The ping command is mainly used to check network connectivity. Its results also show network quality, such as the packet loss rate. The time value in a ping reply can be used to judge the direct network transmission delay between two hosts. Between servers on the same LAN (most of which are connected over optical fiber with 10 Gigabit cards), the time value should be less than 1 ms.
A script is provided to evaluate the network latency between two hosts as follows:
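As an illustration only, and not the author's script, the following sketch evaluates latency from ping output that has already been captured as text. It assumes each reply line carries a `time=<value> ms` field; the sample output, the address and the function names are hypothetical.

```python
import re

# Matches the per-reply "time=<value> ms" field printed by ping.
TIME_RE = re.compile(r"time[=<]\s*([\d.]+)\s*ms")


def ping_latencies_ms(ping_output):
    """Extract the round-trip time of every reply line, in milliseconds."""
    return [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]


def summarize(latencies, limit_ms=1.0):
    """Return min/avg/max and whether every reply stayed under limit_ms."""
    if not latencies:
        raise ValueError("no ping replies found")
    return {
        "min": min(latencies),
        "avg": sum(latencies) / len(latencies),
        "max": max(latencies),
        "within_limit": max(latencies) < limit_ms,
    }


# Hypothetical captured output; the address is a placeholder.
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=0 ttl=255 time=0.21 ms
64 bytes from 192.0.2.10: icmp_seq=1 ttl=255 time=0.35 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=255 time=0.18 ms
"""

print(summarize(ping_latencies_ms(SAMPLE)))
```

The 1 ms default for `limit_ms` mirrors the LAN guideline above; a different limit can be passed for links where higher latency is expected.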
4.3 monitoring network card status
The entstat -d command can also monitor the traffic distribution of an EtherChannel network card (such as the distribution of received and sent packets and of receive and send bandwidth), as well as the 802.3ad link aggregation status. For example, the following shows the network card status after a successful 802.3ad aggregation:
4.4 monitoring network connection status
netstat is the most commonly used tool for statistical observation of network operation. It has many parameters; the main ones are -in, -an, and so on. With the -in option, pay attention to the Ierrs and Oerrs columns. Ierrs is the number of packets that failed to be received and Oerrs is the number of packets that failed to be sent. If Ierrs/Ipkts exceeds 1% or Oerrs/Opkts exceeds 1%, it may be necessary to check whether the network is unstable.
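The 1% check can be written down directly. This is a minimal sketch that assumes the four counters have already been read from the netstat -in output for one interface; the function names and counter values are made up.

```python
def error_ratios(ipkts, ierrs, opkts, oerrs):
    """Return (input error ratio, output error ratio) for one interface."""
    in_ratio = ierrs / ipkts if ipkts else 0.0
    out_ratio = oerrs / opkts if opkts else 0.0
    return in_ratio, out_ratio


def interface_suspect(ipkts, ierrs, opkts, oerrs, limit=0.01):
    """True if either error ratio exceeds the limit (1% by default)."""
    in_ratio, out_ratio = error_ratios(ipkts, ierrs, opkts, oerrs)
    return in_ratio > limit or out_ratio > limit


print(interface_suspect(1_000_000, 20_000, 900_000, 0))  # True: 2% input errors
print(interface_suspect(1_000_000, 5, 900_000, 3))       # False
```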
With the -an option, pay attention to Recv-Q, Send-Q and state. Recv-Q is the queuing condition of the network card receive queue, and Send-Q is the queuing condition of the network card send queue. State indicates the state of the network connection, which is generally LISTEN or ESTABLISHED. A connection that stays in LAST_ACK or FIN_WAIT for a long time indicates that the related TCP connection is in poor condition; if the application uses that connection, it deserves attention.
4.5 checking the packet retransmission rate in the network
netstat -s provides TCP-related statistics, including retransmission statistics. TCP retransmission triggers the congestion avoidance algorithm, so the network bandwidth cannot be used effectively and performance drops significantly. Retransmit timeouts are especially harmful: by default, such a timeout usually takes about 1.5 seconds, which has a more serious impact on performance.
Refer to the following netstat statistical output. Generally, if the retransmission rate exceeds one in ten thousand (0.01%), the causes of packet loss need to be analyzed comprehensively from the local host, the peer, and the network side (including switches, firewalls, etc.), and usually need to be confirmed through packet capture (iptrace and tcpdump are commonly used packet capture tools on AIX).
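The retransmission rate is the number of retransmitted data packets divided by the number of data packets sent. The sketch below assumes both counters have been read from the TCP section of netstat -s by hand; the function names and the counter values are hypothetical.

```python
RETRANSMIT_LIMIT = 1 / 10_000  # "one in ten thousand" from the text


def retransmission_rate(data_packets_sent, data_packets_retransmitted):
    """Fraction of sent data packets that had to be retransmitted."""
    if data_packets_sent == 0:
        return 0.0
    return data_packets_retransmitted / data_packets_sent


def needs_investigation(data_packets_sent, data_packets_retransmitted):
    """True if the retransmission rate exceeds the one-in-ten-thousand rule."""
    rate = retransmission_rate(data_packets_sent, data_packets_retransmitted)
    return rate > RETRANSMIT_LIMIT


# Made-up counters: 500 retransmissions out of 2,000,000 data packets.
print(f"{retransmission_rate(2_000_000, 500):.4%}")
print(needs_investigation(2_000_000, 500))
```

Because the counters are cumulative, comparing two snapshots taken before and after a test run gives a rate for that run rather than for the whole uptime.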
4.6 monitoring network read and write through netpmon
Initiate FTP transmission from aixdemo2 host to aixdemo1 host
Start netpmon on aixdemo1 to observe:
From the output of netpmon, we can get a ranking of TCP calls by process and a detailed breakdown.
5 automatic performance data collection
1. topasout -a <*.topas>
2. nmon_analyzer <*.topas.csv>
6 perfpmr data collection
To download the perfpmr installation package:
Select the appropriate perfpmr package according to the operating system version.
To install the perfpmr package:
1. Log in as root and upload the perfpmr installation package in binary mode.
2. Create the decompression directory:
# mkdir /tmp/perf71
# cd /tmp/perf71
3. Decompress the perfpmr installation package in /tmp/perf71:
# zcat perf71.tar.Z | tar -xvf -
4. Install:
# ./Install
5. Create the data collection directory:
# mkdir /tmp/perfdata
# cd /tmp/perfdata
6. Run the data collection command. It takes 5-10 minutes to run, and the performance test must stay in a stable running state for the whole time:
# perfpmr.sh 60
7. Retrieve the data. The data in /tmp/perfdata can be packaged and retrieved; packaging directly with perfpmr.sh is recommended (best compression ratio). From the directory above the performance data, run the following command:
# perfpmr.sh -o perfdata -z perfdata_<TPS_VALUE>_<GOOD_OR_BAD>.pax.gz