How to resolve abnormal data?


A customer's Oracle database had not been changed in any way, yet the business stalled frequently every weekend morning. Initial diagnosis suggested that a surge in business volume had pushed Tuxedo parameters to their thresholds. After the Tuxedo parameters were tuned on May 10, the interruptions stopped.

A week later, the stalls returned. This time the symptoms were captured as they happened and diagnosed in depth. The database AWR performance report, together with the slow NBU backup, showed that the storage read rate had dropped to 50-80 MB/s, only a quarter of its original value, a significant loss of performance. The fault was finally traced to an abnormal link, which degraded the performance of the whole storage system and caused the stalls and delays seen in the foreground. The analysis went as follows:

Database and operating system resource usage show that the database was busy throughout the failure period. According to the database wait events and the performance analysis report, a backup was running during the peak period and consuming a large share of I/O resources. Disk utilization on the operating system was also very high, and the CPU showed some I/O wait.

These symptoms largely establish that, during the failure period, I/O response at the operating system level could not keep up with the database's demands. This caused serious I/O waits in the database and indirectly prolonged the NBU backup.

To confirm this, check the database I/O performance indicators for the failure period. Under normal circumstances, the database needs I/O response times within 10 ms, and the observed values far exceeded that. Even with operating system disk utilization close to 100%, total I/O throughput was only 39-80 MB/s, which is clearly abnormal.
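The 10 ms check described above can be sketched in a few lines of Python. This is only an illustration: the function names, the input layout (total wait time and I/O count per file, as one might copy out of a performance report), and the sample numbers are all assumptions, not values from this incident.

```python
from typing import Dict, List, Tuple

# Threshold taken from the text: I/O response should stay within 10 ms.
LATENCY_THRESHOLD_MS = 10.0


def average_latency_ms(total_wait_ms: float, io_count: int) -> float:
    """Return the mean wait per I/O in milliseconds (0.0 if there was no I/O)."""
    if io_count <= 0:
        return 0.0
    return total_wait_ms / io_count


def find_slow_files(
    stats: Dict[str, Tuple[float, int]],
    threshold_ms: float = LATENCY_THRESHOLD_MS,
) -> List[Tuple[str, float]]:
    """Return (name, avg_latency_ms) for entries above the threshold, slowest first."""
    slow = []
    for name, (total_wait_ms, io_count) in stats.items():
        avg = average_latency_ms(total_wait_ms, io_count)
        if avg > threshold_ms:
            slow.append((name, avg))
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample: (total read wait in ms, number of reads) per data file.
    sample = {
        "data01.dbf": (450_000.0, 10_000),
        "data02.dbf": (60_000.0, 10_000),
    }
    for name, avg in find_slow_files(sample):
        print(f"{name}: {avg:.1f} ms average read latency")
```

Averages hide spikes, so a file that passes this check can still have had short periods of very slow I/O; the sketch is a first filter, not a verdict.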

Further tests with dd and disk file copies also showed a disk read rate of only 50-80 MB/s, a quarter of the original. At this point it can be concluded that either the link from the host to the storage or the storage itself is abnormal.
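A sequential read test in the spirit of the dd check can be approximated in Python as below. This is a rough sketch under stated assumptions: the function name and the 1 MiB block size are illustrative choices, and the read goes through the operating system page cache, so a file that was read or written recently may report a rate far above what the storage can actually deliver. For a meaningful number, point it at a large file that has not been touched recently.

```python
import os
import tempfile
import time


def measure_read_throughput(path: str, block_size: int = 1024 * 1024) -> float:
    """Read the file sequentially and return throughput in MB/s (1 MB = 10**6 bytes)."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering only; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total_bytes / elapsed / 1_000_000


if __name__ == "__main__":
    # Demonstration on a small temporary file; it was just written, so the
    # result mostly reflects cache speed rather than disk speed.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        demo_path = tmp.name
    try:
        print(f"{measure_read_throughput(demo_path):.1f} MB/s")
    finally:
        os.remove(demo_path)
```

Running the same measurement on a healthy host with comparable storage gives a baseline to compare against, which is how a drop to a quarter of the original rate becomes visible.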