AIX performance management and monitoring suggestions (2)

Time:2021-4-21

Reposted from the official account @twt community; author: Chen Chihui

3 I/O monitoring

3.1 IO response time evaluation

What kind of IO response time is reasonable? The following is a summary of some empirical rules:

  • Disk array with mechanical hard disks and no storage-level synchronous mirroring: a rule of thumb for random IO response time (given as a figure in the original).

  • Synchronous mirroring configured: a separate rule of thumb for random IO response time (given as a figure in the original).

  • SSD storage: a separate rule of thumb (given as a figure in the original).

  • For sequential IO, throughput deserves more attention than IO service time.

3.2 quickly locating busy disks through nmon

Open the DISKBUSY sheet of the nmon report and look at the WAvg value. If WAvg is above 90%, there may be disk hotspots, and the disks concerned should be monitored closely.

Note: Avg is the average over the whole monitoring period (including the time when the disk is completely idle), while WAvg is the average over the part of the monitoring period when the disk is busy. Because the nmon collection period is often much longer than the peak period of the workload, WAvg is generally more meaningful than Avg.
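The difference between the two averages can be illustrated with a small Python sketch. It follows the description in the note above (an average over busy intervals only); the exact formula used by the nmon analyzer may differ, and the function name and sample values are made up for illustration.

```python
def avg_and_busy_avg(busy_samples):
    """Return (avg, busy_avg) for a list of disk-busy percentages.

    avg covers every sample, idle intervals included.
    busy_avg covers only the samples where the disk did some work.
    """
    if not busy_samples:
        return 0.0, 0.0
    avg = sum(busy_samples) / len(busy_samples)
    busy = [s for s in busy_samples if s > 0]
    busy_avg = sum(busy) / len(busy) if busy else 0.0
    return avg, busy_avg


# Hypothetical samples: idle most of the time, saturated during a short peak.
samples = [0, 0, 0, 0, 0, 0, 0, 0, 95, 97]
avg, busy_avg = avg_and_busy_avg(samples)
print(f"Avg={avg:.1f}%  busy-period average={busy_avg:.1f}%")
```

With these samples the overall average looks harmless while the busy-period average is well above the 90% mark, which is why the second figure is the one to watch.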

3.3 monitoring busy disks with the sar / iostat commands

Busy disks can be monitored with sar -d or iostat -D; the response times they report are in milliseconds. As a rule of thumb, an average read response time above 15 ms or an average write response time above 2.5 ms deserves attention.

If the queue time and sqfull values stay non-zero for a long time, check whether the queue depth (queue_depth) is set too low.

Note: to make the output easier to parse in scripts, it is generally recommended to add the -l (lowercase L) and -T options to -D, so that the output for each hdisk is displayed on a single line.
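Once the per-disk values have been extracted from the iostat output, the thresholds above can be applied mechanically. The following Python sketch assumes the values are already parsed into a dictionary; the field names (read_avg_ms, write_avg_ms, sqfull) and the sample numbers are hypothetical, and a single non-zero sqfull sample is treated as a warning for simplicity, whereas the text above is about values that stay non-zero over time.

```python
READ_MS_LIMIT = 15.0   # average read response time threshold from the text
WRITE_MS_LIMIT = 2.5   # average write response time threshold from the text


def disks_needing_attention(stats):
    """stats maps an hdisk name to a dict with read_avg_ms, write_avg_ms, sqfull.

    Returns a dict of hdisk name -> list of reasons for the flagged disks.
    """
    flagged = {}
    for disk, s in stats.items():
        reasons = []
        if s["read_avg_ms"] > READ_MS_LIMIT:
            reasons.append("slow reads")
        if s["write_avg_ms"] > WRITE_MS_LIMIT:
            reasons.append("slow writes")
        if s["sqfull"] > 0:
            reasons.append("queue full events")
        if reasons:
            flagged[disk] = reasons
    return flagged


# Hypothetical values, as they might look after parsing iostat output.
sample = {
    "hdisk0": {"read_avg_ms": 4.2, "write_avg_ms": 0.8, "sqfull": 0},
    "hdisk3": {"read_avg_ms": 22.5, "write_avg_ms": 1.1, "sqfull": 0},
    "hdisk7": {"read_avg_ms": 6.0, "write_avg_ms": 3.4, "sqfull": 12},
}
for disk, reasons in disks_needing_attention(sample).items():
    print(disk, ", ".join(reasons))
```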

3.4 monitoring the Fibre Channel adapter with the fcstat command

fcstat shows the supported speed and the running speed of the Fibre Channel adapter, for example:

`# fcstat fcs0|grep -i speed

Port Speed (supported): 8 GBIT

Port Speed (running): 8 GBIT`

If the running speed is lower than the supported speed, check whether the link between the switch and the host is healthy.

If the two resource-shortage counters in the fcstat output (typically No DMA Resource Count and No Command Resource Count) keep growing, adjust the adapter's max_xfer_size and num_cmd_elems attributes accordingly. The counters must be non-zero, and what matters is how fast they grow.

Alternatively, judge from fcstat -d: the value of num_cmd_elems should be greater than or equal to <high water mark of active commands> + <high water mark of pending commands>. For example, with high water marks of 180 and 91, num_cmd_elems can be set to 180 + 91 = 271.
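The sizing rule is simple arithmetic; a minimal Python sketch, using the 180 and 91 high water marks from the example above, might look like this. The function names and the "current" value of 200 are hypothetical.

```python
def min_num_cmd_elems(active_high_water, pending_high_water):
    """Smallest num_cmd_elems that covers both high water marks."""
    return active_high_water + pending_high_water


def needs_increase(current_num_cmd_elems, active_high_water, pending_high_water):
    """True if the configured value is below the observed demand."""
    return current_num_cmd_elems < min_num_cmd_elems(
        active_high_water, pending_high_water
    )


# High water marks from the example in the text; 200 is an assumed setting.
print(min_num_cmd_elems(180, 91))
print(needs_increase(200, 180, 91))
```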

3.5 using filemon to monitor IO reads and writes

filemon can monitor activity at the lf (logical file system), lv (logical volume), pv (physical volume), and vm (virtual memory) levels, as follows:

# filemon -T 1000000 -u -O lf,lv,pv,detailed -o fmon.out

# sleep 5

# trcstop

The generated filemon report is written to fmon.out.

Note: if "xxx events lost" appears in the report, the trace buffer has overflowed. Either increase the trace buffer (with -T) or shorten the monitoring period (the interval between filemon and trcstop).

3.6 reading the filemon report

The filemon report shows the busiest files, logical volumes, and physical volumes.

The detailed section of the report also gives the read/write activity and response times of individual files, logical volumes, and physical volumes.

The seeks percentage indicates the IO pattern. If it is close to 100%, the IO is random; conversely, if it is close to 0, the IO is sequential.
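As a rough illustration of that reading, the Python sketch below labels a volume from its seeks percentage. The 20% and 80% cut-offs are arbitrary assumptions chosen for the example; the text only says "close to 0" and "close to 100%".

```python
def io_pattern(seeks_pct, sequential_below=20.0, random_above=80.0):
    """Label an IO pattern from the seeks percentage in a filemon report.

    The two cut-offs are illustrative assumptions, not filemon defaults.
    """
    if not 0.0 <= seeks_pct <= 100.0:
        raise ValueError("seeks percentage must be between 0 and 100")
    if seeks_pct >= random_above:
        return "random"
    if seeks_pct <= sequential_below:
        return "sequential"
    return "mixed"


for pct in (2.5, 55.0, 98.0):
    print(pct, io_pattern(pct))
```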

4 network monitoring

4.1 monitoring network rate

The entstat -d entX command can be used to monitor the network speed and the sending and receiving of packets, for example:

# entstat -d ent0|grep -i speed

Media Speed Selected: Autonegotiate

Media Speed Running: 100 Mbps, Full Duplex

External-Network-Switch (ENS) Port Speed: 100 Mbps, Full Duplex

The running speed shown is 100 Mbps, which corresponds to about 12.5 MB/s (100 / 8). If the measured throughput reaches 12.5 MB/s in actual tests, the network may be a performance bottleneck.
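The conversion from link speed to usable throughput, and a utilisation check against it, can be sketched in Python as follows. The function names and the measured value of 11.0 MB/s are hypothetical, and protocol overhead is ignored.

```python
def link_capacity_mb_per_s(speed_mbps):
    """Convert a link speed in megabits per second to megabytes per second."""
    return speed_mbps / 8.0


def link_utilisation(measured_mb_per_s, speed_mbps):
    """Fraction of the theoretical link capacity that is in use."""
    return measured_mb_per_s / link_capacity_mb_per_s(speed_mbps)


capacity = link_capacity_mb_per_s(100)   # the 100 Mbps link from the example
usage = link_utilisation(11.0, 100)      # 11.0 MB/s is an assumed measurement
print(f"capacity={capacity} MB/s, utilisation={usage:.0%}")
```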

4.2 monitoring network response time

The ping command is mainly used to check network connectivity. Its output also shows network quality, packet loss rate, and so on. The time value in the ping replies can be used to judge the network transmission delay between two hosts. Between servers on the same LAN (most of them connected through 10 Gigabit fibre adapters), the time value should be below 1 ms.

The network latency between two hosts can also be evaluated with a script.
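The author's script is not available here, so the following is only a minimal Python sketch of one way to evaluate latency: it parses the min/avg/max summary line that ping prints at the end of a run and compares the average with the 1 ms guideline. The sample summary line is an assumed format for illustration, and capturing the ping output itself is left out.

```python
import re

# Matches "min/avg/max = a/b/c" and also variants with a fourth field name.
SUMMARY_RE = re.compile(
    r"min/avg/max(?:/\w+)?\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)"
)


def parse_ping_summary(text):
    """Return min/avg/max round-trip times in ms, or None if not found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    low, avg, high = (float(g) for g in match.groups())
    return {"min_ms": low, "avg_ms": avg, "max_ms": high}


def lan_latency_ok(summary, limit_ms=1.0):
    """Apply the rule of thumb: average LAN round trip below 1 ms."""
    return summary["avg_ms"] < limit_ms


# Assumed example of a ping summary line.
sample = "round-trip min/avg/max = 0/0/1 ms"
summary = parse_ping_summary(sample)
print(summary, lan_latency_ok(summary))
```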

4.3 monitoring network card status

The entstat -d command can also show how traffic is distributed across an EtherChannel adapter (for example, the distribution of received and sent packets and of receive and send bandwidth), as well as the IEEE 802.3ad link aggregation status.

4.4 monitoring network connection status

netstat is the most commonly used tool for observing network statistics. It has many options; the main ones are -in, -an, and so on. With the -in option, pay attention to the Ierrs and Oerrs columns: Ierrs is the number of packets that failed to be received and Oerrs is the number of packets that failed to be sent. If Ierrs/Ipkts or Oerrs/Opkts exceeds 1%, check whether the network is unstable.

With the -an option, pay attention to Recv-Q, Send-Q, and state. Recv-Q shows the queuing on the receive queue and Send-Q the queuing on the send queue. State is the state of the network connection, normally LISTEN or ESTABLISHED. A connection that stays in LAST_ACK or FIN_WAIT for a long time indicates that the TCP connection is in poor condition; if the application uses that connection, it deserves attention.
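The 1% rule for the -in counters can be checked with a few lines of Python. This sketch assumes the Ipkts, Ierrs, Opkts, and Oerrs values have already been read from the netstat output; the interface names and numbers are made up.

```python
def error_ratio(errors, packets):
    """Errors per packet; 0.0 when no packets were counted."""
    return errors / packets if packets else 0.0


def unstable_interfaces(counters, limit=0.01):
    """counters maps interface name -> (ipkts, ierrs, opkts, oerrs).

    Returns the interfaces whose input or output error ratio exceeds limit.
    """
    flagged = []
    for name, (ipkts, ierrs, opkts, oerrs) in counters.items():
        if error_ratio(ierrs, ipkts) > limit or error_ratio(oerrs, opkts) > limit:
            flagged.append(name)
    return flagged


# Hypothetical counters for two interfaces.
sample = {
    "en0": (1_000_000, 120, 900_000, 0),
    "en1": (50_000, 900, 48_000, 3),
}
print(unstable_interfaces(sample))
```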

4.5 checking the packet retransmission rate in the network

netstat -s provides TCP statistics, including retransmission counters. TCP retransmissions trigger the congestion avoidance algorithm, so the network bandwidth cannot be used effectively and performance drops significantly. Retransmit timeouts are especially costly: by default such a timeout usually takes about 1.5 seconds, which hurts performance even more.

As a rule of thumb, if the retransmission rate in the netstat statistics exceeds 1 in 10,000, the cause of the packet loss needs to be analyzed across the local host, the peer, and the network (including switches, firewalls, and so on). This usually has to be confirmed with a packet capture; iptrace and tcpdump are the commonly used capture tools on AIX.
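A minimal Python sketch of the check, assuming the rate is taken as retransmitted data packets divided by data packets sent (the article does not spell out the formula) and using made-up counter values:

```python
RATE_LIMIT = 1 / 10_000  # the "one in ten thousand" guideline from the text


def retransmission_rate(data_packets_sent, data_packets_retransmitted):
    """Retransmitted data packets as a fraction of data packets sent."""
    if data_packets_sent == 0:
        return 0.0
    return data_packets_retransmitted / data_packets_sent


# Hypothetical counters, as they might be read from the TCP statistics.
rate = retransmission_rate(2_500_000, 400)
print(f"rate={rate:.6f}, above limit: {rate > RATE_LIMIT}")
```

Because the netstat -s counters are cumulative, comparing two snapshots taken some time apart gives a better picture of the current rate than a single reading.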

4.6 monitoring network reads and writes through netpmon

Start an FTP transfer from host aixdemo2 to host aixdemo1, then start netpmon on aixdemo1 to observe it.

The netpmon output ranks the TCP calls made by each process and breaks each of them down in detail.

5 automatic performance data collection

1. topasout -a <*.topas>

2. nmon_analyzer <*.topas.csv>

6 perfpmr data collection

To download the perfpmr installation package:

Select the perfpmr package that matches the operating system version:

ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/

To install the perfpmr package:

  1. Log in as root and upload the perfpmr installation package in binary (bin) mode.
  2. Create a directory for unpacking:

# mkdir /tmp/perf71

# cd /tmp/perf71

  3. Unpack the perfpmr installation package in /tmp/perf71:

# zcat perf71.tar.Z | tar -xvf -

  4. Run the installer:

# ./Install

Data collection:

  1. Create a data collection directory:

# mkdir /tmp/perfdata

# cd /tmp/perfdata

  2. Run the data collection command. It takes 5-10 minutes to complete, and the performance test must stay in a steady running state for the whole time:

# perfpmr.sh 60

  3. Retrieve the data. The contents of /tmp/perfdata can be packaged and collected. Packaging directly with perfpmr.sh is recommended (best compression ratio); from the parent directory of the performance data, run:

# perfpmr.sh -o perfdata -z perfdata_<TPS_VALUE>_<GOOD_OR_BAD>.pax.gz
