A record of MySQL problems encountered in production.
Business scenario and problem description
The service calls an external interface, with a daily request volume of about 9 million.
The work is split into a request part and a receipt part. The request part calls the external interface, and the receipt part is the interface that receives the receipt messages sent back by the upstream.
A row is inserted into the database before the request is sent.
If the interface returns that the call failed, the row's status is updated to failed.
If the request is sent successfully, the service waits for the upstream to return a receipt message and then updates the row's status.
In production, MySQL caused an MQ consumer backlog twice within half a year.
This article records the two incidents, which had different causes, together with the cause analysis.
MySQL deadlock problem
Check TPS on the MQ aggregation platform
The MQ backlog kept growing, while the TPS stayed at only about 30 and would not go up.
At that rate, consumption cannot keep up and messages keep accumulating. What keeps consumption this slow? Further troubleshooting is needed.
View production server logs
The production server log shows a deadlock error:
error response from MySQLConnection [node=24, id=277499, threadId=2735941, state=borrowed, closed=false, autocommit=true, host=10.1.10.74, port=3306, database=sep_4, localPort=27744, isClose:false, toBeClose:false, MySQLVersion:5.7.25], err: Deadlock found when trying to get lock; try restarting transaction, code: 1213
The specific SQL is as follows:
update stage set status = 'success',reply_time = '2021-03-07 10:40:11' where code = '000123' and create_time > '2021-03-03 00:00:00';
That is, the deadlock occurs when the service executes this statement.
The number of occurrences and the time spent are hard to see on the production server, so we asked the DBA to pull the slow SQL statements and their execution times.
The slowest SQL took 7780 ms.
Looking closely, the same ID appears twice in the slow SQL list: one statement is executing and the other is in lock wait state.
A large number of these slow SQL statements are in lock wait state.
What causes the deadlock
The storage engine used by MySQL here is InnoDB. Let's first understand what a deadlock is:
Deadlock: a situation in which two or more processes, while executing, wait for each other because they compete for resources. Without outside intervention, none of them can make progress. The system is then said to be in a deadlock state, and the processes that wait for each other indefinitely are called deadlocked processes.
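The definition above can be made concrete with a wait-for graph: each transaction points to the transaction that holds the lock it is waiting for, and a deadlock is a cycle in that graph. The following is a minimal sketch for illustration only. The transaction names are hypothetical, and this is not how InnoDB's own detector is implemented.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a wait cycle, or None.

    waits_for maps a transaction to the one transaction whose lock it is
    waiting for (hypothetical names; each waiter blocks on one holder).
    """
    for start in waits_for:
        path = []
        seen = set()
        current = start
        # Follow the chain of "is waiting for" edges from this transaction.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            # We returned to a transaction already on the path: a cycle.
            return path[path.index(current):]
    return None


# T1 waits for T2 and T2 waits for T1: neither can proceed.
print(find_deadlock({"T1": "T2", "T2": "T1"}))
# T2 only waits for T1, which is still running: a lock wait, not a deadlock.
print(find_deadlock({"T2": "T1"}))
```

The second call shows the difference between a plain lock wait, which ends when the holder finishes, and a true deadlock, which needs one transaction to be rolled back.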
From the troubleshooting above, the problem behaves as follows:
When an SQL statement updates a row, that row is locked. The row lock is released once the statement's transaction commits, and any other statement that needs the same row stays in lock wait state until then.
In this program, the database is written frequently before sending, after sending, and on receipt, so the same row may be operated on at the same time.
Therefore, lock waits occur during execution.
Sharded databases and tables queried without the partition key
The first alert was that CPU usage on the stage_prod database had soared to 85%.
Are the database connections exhausted?
Checking the database connections shows that the connection limit had not been reached.
Find the slow SQL and how long it takes
update stage set status = 'success',reply_time = '2021-03-07 10:40:11' where create_time > '2021-03-03 00:00:00';
This SQL has no condition on the code field, which is the partition (sharding) key. It is the statement executed when a receipt is received.
Check production server logs
The program checks the code value: if it is not empty, the SQL includes the code condition. Since the condition was missing, we need to find out why.
The code value is written to Redis when the request is sent and read back from Redis afterwards. Yet, strangely, the code condition was still not being added.
We first suspected a Redis problem and contacted the team that maintains the Redis platform. It turned out that the problem was indeed caused by a Redis failure.
Why does CPU soar without the partition key
First of all, the company uses HotDB to shard databases and tables, because about 9 million rows are written per day and a single table would hold hundreds of millions of rows.
Indexes alone cannot meet the requirements at that size.
HotDB splits the databases and tables into 64 hash shards by code value. In other words, there are 64 databases distributed across 16 instances on 8 servers.
This keeps the data evenly spread across shards and, in theory, avoids excessive concentration on any one shard.
For SQL without the partition key code, every DML operation is sent to all underlying databases for execution, which amounts to scanning every database.
This can push CPU straight to 99% or even bring the server down. It is a dangerous operation.
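The routing behavior described above can be sketched as follows. This is a simplified illustration, not HotDB's implementation: the article does not say which hash function HotDB uses, so crc32 is an assumption, and `target_shards` is a hypothetical name.

```python
import zlib

SHARD_COUNT = 64  # the article's 64-way hash split


def target_shards(code):
    """Return the shard indexes a statement must be sent to.

    crc32 is a stand-in hash chosen for this sketch; the hash function
    HotDB actually uses is not stated in the article.
    """
    if code:
        # Partition key present: exactly one shard is touched.
        return [zlib.crc32(code.encode("utf-8")) % SHARD_COUNT]
    # No partition key: the statement fans out to every shard.
    return list(range(SHARD_COUNT))


print(len(target_shards("000123")))  # 1
print(len(target_shards(None)))      # 64
```

With the key, the cost of one statement stays on one database. Without it, every receipt multiplies into 64 statements, which is consistent with the CPU alert seen above.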
Solutions
Emergency treatment: first stop several services to reduce database operations
As the backlog grows, data processing gets slower, so the first step is to reduce the rate of database operations. The fastest way is to stop some services.
Reduce database operations and avoid database deadlock
Deadlocks usually arise when transactions are not committed promptly, because the program does not control the commit of its DML operations well.
Reduce repeated operations on the same row. In batch operations, reduce the number of DML statements in each batch so that each batch commits quickly, avoid long transactions, and avoid submitting the same DML repeatedly.
So how can operations be reduced?
The insert before sending and the update on send failure are merged into one SQL statement, which avoids multiple operations on the same row.
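One way to read the merge described above is to call the interface first and then write the row once, already carrying its outcome. The sketch below assumes that reading. It uses an in-memory SQLite database purely as a stand-in for MySQL, and `record_request`, the `send` callable, and the `sent` status name are all hypothetical.

```python
import sqlite3


def record_request(conn, code, send):
    """Call the interface first, then write the row once with its outcome.

    `send` is a hypothetical callable returning True when the call was
    accepted; 'sent' is a hypothetical status name for that case.
    """
    status = "sent" if send(code) else "failed"
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO stage (code, status) VALUES (?, ?)",
            (code, status),
        )
    return status


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("CREATE TABLE stage (code TEXT PRIMARY KEY, status TEXT)")
record_request(conn, "000123", lambda code: False)
print(conn.execute("SELECT code, status FROM stage").fetchall())
```

A failed call now costs one write instead of an insert followed by an update. Whether this ordering is acceptable depends on whether a receipt can arrive before the row has been written.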
Reduce long transactions and the size of each batch
In practice, executing 20 SQL statements per batch was faster than executing 200 SQL statements at once.
So keep each batch small and avoid large batches.
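A minimal sketch of splitting work into small batches is shown below. The batch size of 20 comes from the observation above, while `run_in_batches` and the `execute_batch` callable are hypothetical names standing in for whatever runs and commits one batch.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(statements, execute_batch, size=20):
    """Run statements in small batches, one commit per batch.

    `execute_batch` is a hypothetical callable that executes a list of
    statements in one transaction and commits it.
    """
    batches = 0
    for batch in chunked(statements, size):
        execute_batch(batch)
        batches += 1
    return batches


statements = [f"stmt-{i}" for i in range(200)]
print(run_in_batches(statements, lambda batch: None))  # 10 batches of 20
```

Each batch holds its row locks only until its own commit, so other statements wait for at most 20 statements' worth of work rather than 200.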
Every SQL statement must include the partition key
The principle is that a single record must not drag down the speed of the whole database.
The partition key is mandatory. If it is missing, an error is thrown.
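The rule above can be enforced where the statement is built, so that a missing code fails fast instead of reaching the database. The following is a sketch under assumptions: the function and argument names are hypothetical, and the `%s` placeholders assume a Python MySQL driver that uses that parameter style. Table and column names follow the article's SQL.

```python
def build_receipt_update(code, status, reply_time, created_after):
    """Build the receipt UPDATE, refusing to run without the partition key.

    Returns (sql, params) for a parameterized query. The function name and
    argument names are hypothetical; table and columns follow the article.
    """
    if not code:
        # Fail fast instead of letting the statement fan out to all shards.
        raise ValueError("partition key 'code' is required")
    sql = (
        "UPDATE stage SET status = %s, reply_time = %s "
        "WHERE code = %s AND create_time > %s"
    )
    return sql, (status, reply_time, code, created_after)


sql, params = build_receipt_update(
    "000123", "success", "2021-03-07 10:40:11", "2021-03-03 00:00:00"
)
print(sql)
print(params)
```

With this guard, a lookup that returns no code produces an error for one receipt rather than a statement that runs on every shard.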
Bound the time range with an explicit interval
Use code as the partition key and create_time as the partition column. Then, as long as code is present, the create_time condition can be written as an interval with both a lower and an upper bound, which improves execution efficiency.
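As a sketch of what a bounded interval could look like, the function below derives both bounds from the receipt time. The four-day lookback is an assumption inferred from the article's example (a receipt on 2021-03-07 filtering from 2021-03-03); the actual rule is not stated. The sketch also assumes a row is always created before its receipt arrives.

```python
from datetime import datetime, timedelta


def create_time_window(reply_time, lookback_days=4):
    """Return (lower, upper) bounds for create_time.

    lookback_days is an assumed value; the article's example looks back
    from 2021-03-07 to 2021-03-03 but does not state the rule.
    """
    day_start = reply_time.replace(hour=0, minute=0, second=0, microsecond=0)
    lower = day_start - timedelta(days=lookback_days)
    return lower, reply_time


# Open on the left, closed on the right: lower < create_time <= upper.
WINDOW_SQL = (
    "UPDATE stage SET status = %s, reply_time = %s "
    "WHERE code = %s AND create_time > %s AND create_time <= %s"
)

lower, upper = create_time_window(datetime(2021, 3, 7, 10, 40, 11))
print(lower, upper)
```

Giving the condition an upper bound as well as a lower bound limits how much of the create_time range the database has to consider for each update.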
Better solution: execute SQL sequentially
In this scheme, the SQL statements to be executed are all sent to one MQ and executed by its consumer, which guarantees sequential execution and avoids deadlock.
However, whether this is suitable depends on the business scenario.
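The sketch below shows a variant of this idea that keeps ordering per code rather than globally: every operation for a given code is routed to the same queue, so a single consumer per queue never runs two statements on the same row at once. This swaps the single MQ described above for several keyed queues. The queue count, the crc32 hash, and the function names are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def queue_for(code, queue_count=8):
    """Pick a queue for a record so all its operations share one queue.

    queue_count and the crc32 hash are assumptions made for this sketch.
    """
    return zlib.crc32(code.encode("utf-8")) % queue_count


def dispatch(operations, queue_count=8):
    """Group (code, sql) pairs into per-queue lists, preserving order."""
    queues = defaultdict(list)
    for code, sql in operations:
        queues[queue_for(code, queue_count)].append((code, sql))
    return queues


ops = [("000123", "insert"), ("000456", "insert"), ("000123", "update")]
queues = dispatch(ops)
# Both operations on 000123 sit in the same queue, in their original order.
print(queues[queue_for("000123")])
```

A single global queue is simpler and gives the same guarantee, at the cost of throughput. Keyed queues keep the per-row ordering while letting unrelated codes proceed in parallel.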
For MySQL deadlocks, avoid frequent operations on the same row and avoid long transactions as far as possible.
For sharded databases and tables, always include the partition key.
A monitoring mechanism is indispensable.