Technology sharing | Using binlog events to find problems

Time: 2020-10-16

Author: Gao Peng (nicknamed “Eight Monsters”)
At the end of this article you will find his column “Deep Understanding of MySQL Master-Slave Principles: 32 Lectures”, a deep and thorough treatment of MySQL master-slave replication and GTID.

With the understanding of events built up in the previous articles, we can now put those events to work. After learning the common events, I wrote a tool in C to analyze them, which I called “infobin”, meaning “extract information from the binary log”. As far as I know, although the tool still has bugs in a few corner cases, some friends are using it. I am not promoting my tool here, but sharing the idea behind it: I built it around problems I actually encountered at work. Its main functions are as follows:

  • Find long-uncommitted transactions in the binary log. A transaction that stays uncommitted for a long time can cause more lock contention.
  • Find large transactions in the binary log. The commit of a large transaction may block the commits of other transactions.
  • Count the DML events generated by each table in the binary log, to learn which table has the largest volume of modifications.
  • Measure the generation speed of events in the binary log, to learn which time periods generate more events.

Here is the address of this tool for your reference:
https://github.com/gaopengcar…

Help information for this tool is as follows:

# ./infobin
USAGE ERROR!
[Author]: gaopeng [QQ]:22389860 [blog]:http://blog.itpub.net/7728585/
--USAGE:./infobin binlogfile pieces bigtrxsize bigtrxtime [-t] [-force]
[binlogfile]:binlog file!
[piece]:how many piece will split,is a Highly balanced histogram,
find which time generate biggest binlog.(must:piece<2000000)
[bigtrxsize](bytes):larger than this size trx will view.(must:trx>256(bytes))
[bigtrxtime](sec):larger than this sec trx will view.(must:>0(sec))
[[-t]]:if [-t] no detail is print out,the result will small
[[-force]]:force analyze if unkown error check!!

Next, let’s take a look at how these functions are implemented.

1、 Analyze long uncommitted transactions

As I have mentioned many times before, a manually committed transaction has the following characteristics (taking ‘insert’ statements as an example):

  1. The timestamps of the GTID_LOG_EVENT and the XID_EVENT are the time when the ‘commit’ command was initiated.
  2. The timestamp of the QUERY_EVENT is the time when the first ‘insert’ command was initiated.
  3. The timestamp of each TABLE_MAP_EVENT/WRITE_ROWS_EVENT is the time when the corresponding ‘insert’ command was initiated.

In fact, we can subtract (2) from (1) to get the time between the first DML command and the ‘commit’ command. A user-supplied parameter then defines how long a transaction may stay uncommitted before we call it a long-uncommitted transaction; in my tool this parameter is bigtrxtime. Let’s illustrate with an example. Suppose we issue the following statements:

[figure: the example statements and when each was issued]

Let’s take a look at the sequence and timing of events as follows:

[figure: the sequence and timestamps of the resulting events]

If we take the timestamp of the last XID_EVENT and subtract the timestamp of the QUERY_EVENT, we get the time from the start of the first statement to the ‘commit’. Note that the ‘begin’ command is not actually recorded as an event; it only marks the transaction so that it does not automatically enter the commit process.

For what the ‘begin’ command actually does, you can refer to my Jianshu article:
https://www.jianshu.com/p/6de…
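
To make this concrete, here is a minimal sketch of the same idea in Python (infobin itself is written in C; the helper names read_event_headers and find_long_trx are mine, not the tool’s). It reads only the fixed 19-byte common header of each event, assumes the type codes of MySQL 5.6+ with row-based logging, and for simplicity treats every QUERY_EVENT as the ‘BEGIN’ of a transaction, which a real tool would verify from the statement text:

import struct

# Event type codes from MySQL's log_event.h (5.6+, row-based logging).
QUERY_EVENT, XID_EVENT = 2, 16

def read_event_headers(path):
    # Yield (timestamp, type_code, event_size, end_log_pos) for every event,
    # reading only the 19-byte common header and skipping each event body.
    with open(path, "rb") as f:
        assert f.read(4) == b"\xfebin"        # binlog magic number
        while True:
            header = f.read(19)
            if len(header) < 19:
                break
            ts, code, _sid, size, pos, _flags = struct.unpack("<IBIIIH", header)
            f.seek(size - 19, 1)              # event_size includes the header
            yield ts, code, size, pos

def find_long_trx(path, bigtrxtime):
    start_ts = None
    for ts, code, size, pos in read_event_headers(path):
        if code == QUERY_EVENT and start_ts is None:
            start_ts = ts                     # the QUERY_EVENT carrying 'BEGIN'
        elif code == XID_EVENT and start_ts is not None:
            if ts - start_ts >= bigtrxtime:   # (1) minus (2) from the list above
                print(f"long trx: {ts - start_ts}s, commit at pos {pos}")
            start_ts = None                   # committed, reset for the next trx

find_long_trx("mysql-bin.000005", 15)

Because the event_size field covers the whole event, the scanner can skip event bodies without understanding them; note that when binlog_checksum is enabled, the 4-byte CRC32 is also included in that size.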

2、 Analyze large transactions

This part of the implementation is relatively simple: we only need to scan each transaction and add up the sizes of all events between the GTID_LOG_EVENT and the XID_EVENT to obtain the size generated by each transaction (for compatibility, it is better to sum the events between the QUERY_EVENT and the XID_EVENT). A user-supplied parameter then defines how big a transaction must be before we call it a large transaction; in my tool this parameter is bigtrxsize.
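
A sketch of that scan, under the same assumptions as the previous sketch (find_big_trx is again a name of my own, and the sum runs from the QUERY_EVENT to the XID_EVENT, as suggested above for compatibility):

import struct

QUERY_EVENT, XID_EVENT = 2, 16

def read_event_headers(path):
    # Same 19-byte-header scanner as in the previous sketch.
    with open(path, "rb") as f:
        assert f.read(4) == b"\xfebin"
        while True:
            header = f.read(19)
            if len(header) < 19:
                break
            ts, code, _sid, size, pos, _flags = struct.unpack("<IBIIIH", header)
            f.seek(size - 19, 1)
            yield ts, code, size, pos

def find_big_trx(path, bigtrxsize):
    trx_bytes, begin_pos = 0, None
    for ts, code, size, pos in read_event_headers(path):
        if code == QUERY_EVENT:                   # 'BEGIN' opens the transaction
            trx_bytes, begin_pos = size, pos - size
        elif begin_pos is not None:
            trx_bytes += size                     # accumulate until the commit
            if code == XID_EVENT:
                if trx_bytes >= bigtrxsize:
                    print(f"big trx: {trx_bytes} bytes "
                          f"[begin pos {begin_pos}, end pos {pos}]")
                begin_pos = None

find_big_trx("mysql-bin.000005", 500000)

Keying on the QUERY_EVENT rather than the GTID_LOG_EVENT also keeps the sketch working on binary logs written without GTIDs, which is the compatibility point made above.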
If the binlog_row_image parameter is set to ‘full’, we can roughly estimate the log size generated by modifying each row of a specific table (assuming about 100 bytes of data per row; see the quick check after this list):

  • “Insert” and “delete”: only the before_image or the after_image is logged, so 100 bytes of row data plus roughly 10 bytes of per-row overhead gives about 110 bytes per row. If we define a large transaction as 100M, that corresponds to roughly 100W (1,000,000) modified rows.
  • “Update”: both the before_image and the after_image are logged, so the 110 bytes above must be doubled to about 220 bytes. Therefore, if we define a large transaction as 100M, the modification volume is roughly 50W (500,000) rows.
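
A quick check of this arithmetic (the 100-byte row and roughly 10-byte overhead are the assumptions stated in the list above):

ROW_DATA, OVERHEAD = 100, 10               # assumed bytes of data/overhead per row
per_row = ROW_DATA + OVERHEAD              # insert/delete carry one row image
print(100 * 1024 * 1024 // per_row)        # 953250 rows: about 100W per 100M
print(100 * 1024 * 1024 // (2 * per_row))  # update carries two images: about 50W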

Personally, I think 20M is a more appropriate threshold for defining a large transaction; of course, you can set it according to your own requirements.

3、 Analyze the event generation speed in the binary log

This implementation is very simple. We only need to split the binary log into pieces according to an input parameter and compute the time difference between the last event and the first event of each piece; from that we can roughly tell how long each piece took to generate. The tool takes pieces as the input parameter for this split.
Through this analysis, we can roughly see in which time periods more events are generated, which also reflects how busy the database is.
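
A sketch of this piece view under the same header-scanning assumptions (piece_view is a hypothetical name); note that infobin balances the pieces by bytes (a “highly balanced histogram”), whereas this simplification just splits by event count:

import struct

def read_event_headers(path):
    # Same 19-byte-header scanner as in the earlier sketches.
    with open(path, "rb") as f:
        assert f.read(4) == b"\xfebin"
        while True:
            header = f.read(19)
            if len(header) < 19:
                break
            ts, code, _sid, size, pos, _flags = struct.unpack("<IBIIIH", header)
            f.seek(size - 19, 1)
            yield ts, code, size, pos

def piece_view(path, pieces):
    events = list(read_event_headers(path))
    step = max(len(events) // pieces, 1)
    for i in range(0, len(events), step):
        chunk = events[i:i + step]
        span = chunk[-1][0] - chunk[0][0]          # seconds covered by this piece
        piece_bytes = sum(size for _, _, size, _ in chunk)
        print(f"Time:{chunk[0][0]}-{chunk[-1][0]}({span}(s)) "
              f"piece:{piece_bytes}(bytes)")

piece_view("mysql-bin.000005", 3)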

4、 Analyze how many DML events are generated per table

This function is also very practical. Through this analysis, we can know which table in the database has the largest volume of modifications. The implementation is to scan each TABLE_MAP_EVENT in the binary log together with the DML event that follows it, resolve the table name through the table id, add the size of the DML event to that table’s entry in a linked list, and finally sort and print the results.
But as we said earlier, the table id can change even within a transaction, which I did not consider at the beginning, so the tool has some problems here; most of the time, however, it runs normally.
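
Here is a sketch of the per-table statistics (same assumptions as before; dml_per_table is a hypothetical name). The TABLE_MAP_EVENT body begins with table_id(6), flags(2), then the length-prefixed schema and table names, which is enough to resolve the name. Because the table_id-to-name map is refreshed on every TABLE_MAP_EVENT, a changing table id is handled naturally here; a plain dict stands in for the linked list of the C implementation:

import struct
from collections import defaultdict

TABLE_MAP_EVENT = 19
ROW_EVENTS = {30: "insert", 31: "update", 32: "delete"}   # v2 rows event codes

def dml_per_table(path):
    stats = defaultdict(lambda: defaultdict(int))          # table -> kind -> bytes
    tables = {}                                            # table_id -> "db.table"
    with open(path, "rb") as f:
        assert f.read(4) == b"\xfebin"                     # binlog magic number
        while True:
            header = f.read(19)
            if len(header) < 19:
                break
            _ts, code, _sid, size, _pos, _flags = struct.unpack("<IBIIIH", header)
            body = f.read(size - 19)
            if code == TABLE_MAP_EVENT:
                dblen = body[8]                            # schema name length
                db = body[9:9 + dblen].decode()
                tlen = body[9 + dblen + 1]                 # table name length
                tbl = body[9 + dblen + 2:9 + dblen + 2 + tlen].decode()
                tables[int.from_bytes(body[0:6], "little")] = f"{db}.{tbl}"
            elif code in ROW_EVENTS:
                name = tables.get(int.from_bytes(body[0:6], "little"), "?")
                stats[name][ROW_EVENTS[code]] += size      # charge the DML size
    return stats

for table, kinds in sorted(dml_per_table("mysql-bin.000005").items()):
    print(table, dict(kinds))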

5、 Tool demonstration

Now let me demonstrate the functions described above. I performed the following operations:

mysql> flush binary logs;
Query OK, 0 rows affected (0.51 sec)

mysql> select count(*) from tti;
+----------+
| count(*) |
+----------+
|    98304 |
+----------+
1 row in set (0.06 sec)

mysql> delete from tti;
Query OK, 98304 rows affected (2.47 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tti values(1,'gaopeng');
Query OK, 1 row affected (0.00 sec)

mysql> select sleep(20);
+-----------+
| sleep(20) |
+-----------+
|         0 |
+-----------+
1 row in set (20.03 sec)

mysql> commit;
Query OK, 0 rows affected (0.22 sec)

mysql> insert into tpp values(10);
Query OK, 1 row affected (0.14 sec)

In this example I switched to a new binary log and then performed three transactions:

  • Deleted all 98304 rows of data in the tti table.
  • Inserted one row into the tti table, waited more than 20 seconds, then committed.
  • Inserted one row into the tpp table.

We use the tool to analyze it. The following is the statistical output:

./infobin mysql-bin.000005 3 1000000 15 -t > log.log
more log.log

...
-------------Total now--------------
Trx total[counts]:3
Event total[counts]:125
Max trx event size:8207(bytes) Pos:420[0X1A4]
Avg binlog size(/sec):9265.844(bytes)[9.049(kb)]
Avg binlog size(/min):555950.625(bytes)[542.921(kb)]
--Piece view:
(1)Time:1561442359-1561442383(24(s)) piece:296507(bytes)[289.558(kb)]
(2)Time:1561442383-1561442383(0(s)) piece:296507(bytes)[289.558(kb)]
(3)Time:1561442383-1561442455(72(s)) piece:296507(bytes)[289.558(kb)]
--Large than 500000(bytes) trx:
(1)Trx_size:888703(bytes)[867.874(kb)] trx_begin_p:299[0X12B]
trx_end_p:889002[0XD90AA]
Total large trx count size(kb):#867.874(kb)
--Large than 15(secs) trx:
(1)Trx_sec:31(sec) trx_begin_time:[20190625 14:00:08(CST)]
trx_end_time:[20190625 14:00:39(CST)] trx_begin_pos:889067
trx_end_pos:889267 query_exe_time:0
--EveryTable binlog size(bytes) and times:
Note:size unit is bytes
---(1)CurrentTable:test.tpp::
Insert:binlog size(40(Bytes)) times(1)
Update:binlog size(0(Bytes)) times(0)
Delete:binlog size(0(Bytes)) times(0)
Total:binlog size(40(Bytes)) times(1)
---(2)CurrentTable:test.tti::
Insert:binlog size(48(Bytes)) times(1)
Update:binlog size(0(Bytes)) times(0)
Delete:binlog size(888551(Bytes)) times(109)
Total:binlog size(888599(Bytes)) times(110)
---Total binlog dml event size:888639(Bytes) times(111)

We can see that all of the operations we performed are reflected in the statistics:

  • It contains a large transaction over the 500000-byte threshold, about 868K in size. This comes from deleting all 98304 rows of the tti table.
--Large than 500000(bytes) trx:
(1)Trx_size:888703(bytes)[867.874(kb)] trx_begin_p:299[0X12B]
trx_end_p:889002[0XD90AA]
  • It contains a transaction that stayed uncommitted for a long time: 31 seconds. This comes from my deliberately waiting more than 20 seconds before committing.
--Large than 15(secs) trx:
(1)Trx_sec:31(sec) trx_begin_time:[20190625 14:00:08(CST)]
trx_end_time:[20190625 14:00:39(CST)] trx_begin_pos:889067
trx_end_pos:889267 query_exe_time:0
  • The binary log contains modifications to two tables, tti and tpp. The tti table has both ‘delete’ and ‘insert’ operations, while the tpp table has only an ‘insert’ operation; the log size of each is reported.
--EveryTable binlog size(bytes) and times:
Note:size unit is bytes
---(1)CurrentTable:test.tpp::
Insert:binlog size(40(Bytes)) times(1)
Update:binlog size(0(Bytes)) times(0)
Delete:binlog size(0(Bytes)) times(0)
Total:binlog size(40(Bytes)) times(1)
---(2)CurrentTable:test.tti::
Insert:binlog size(48(Bytes)) times(1)
Update:binlog size(0(Bytes)) times(0)
Delete:binlog size(888551(Bytes)) times(109)
Total:binlog size(888599(Bytes)) times(110)
---Total binlog dml event size:888639(Bytes) times(111)

What I want to tell you is that after learning these events, you can try to parse the binary log in whatever language you like, and perhaps you can write better tools with more functions.
Of course, you could also parse the output of mysqlbinlog and do the counting with shell/Python, but this tool is much faster than that approach.

Finally, Gao Peng’s column “Deep Understanding of MySQL Master-Slave Principles: 32 Lectures” is recommended. Those who want to thoroughly understand the MySQL master-slave principles should not miss it.