Notes on MySQL 45 Lectures in Practice (9-16)

Time: 2022-05-17

9. Unique index, ordinary index and the change buffer

There is essentially no difference in query performance between the two types of indexes; the main consideration is their impact on update performance.
When updating a unique index (for example, inserting), MySQL must locate the position, check for conflicts, and then update; that is one extra conflict check compared with an ordinary index. The check costs only a tiny amount of CPU time, but it matters when the target page of the record to be updated is not in memory:

  • For a unique index, the data page must be read into memory; if no conflict is found, the value is inserted and the statement execution ends;
  • For an ordinary index, the update is simply recorded in the change buffer, and the statement execution ends.

For a unique index, every update operation must first determine whether it violates the uniqueness constraint, which requires reading the data page into memory. Therefore, the change buffer cannot be used for updates to a unique index.

Since unique indexes cannot use the change buffer optimization, from a performance perspective I would suggest preferring ordinary (non-unique) indexes whenever the business can guarantee uniqueness itself. If the business cannot guarantee it, or requires the database to enforce the constraint, there is no choice but to create a unique index.
In addition, in some "archive library" scenarios, consider using ordinary indexes. For example, suppose online data only needs to be retained for half a year, after which historical data is moved to an archive library. At that point the data being archived is already guaranteed to have no unique-key conflicts, so to improve archiving efficiency you can consider changing the unique indexes in the archive table into ordinary indexes.
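A minimal sketch of that index swap, assuming a hypothetical archive_orders table with a unique key uk_order_no on an order_no column:

alter table archive_orders
  drop index uk_order_no,
  add index idx_order_no (order_no);  -- ordinary index: inserts can now use the change buffer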

Change buffer principle

Reading data from disk into memory involves random IO, one of the most expensive operations in a database. Because the change buffer reduces random disk reads, it can significantly improve update performance.
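The change buffer itself is configurable; as a small sketch (these are real InnoDB system variables, shown only to make the mechanism concrete):

SHOW VARIABLES LIKE 'innodb_change_buffering';       -- which operations are buffered (typically 'all')
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size'; -- max percentage of the buffer pool it may use (default 25)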

The process of applying the operations in the change buffer to the original data page to obtain the latest result is called merge; merge is when the data is really updated. Since the main purpose of the change buffer is to cache recorded changes, the more changes recorded against a data page before it is merged, the greater the benefit.

The execution process of merge is as follows:

  1. Read the data page from disk into memory (the old version of the data page);
  2. Find the change buffer records for this data page (there may be several) and apply them in turn to obtain the new version of the data page;
  3. Write the redo log; this redo log covers both the data change and the change buffer change.

At this point the merge process is over. The disk locations corresponding to the data page and to the change buffer in memory have not yet been modified; they are dirty pages, and flushing them back to their own physical locations is a separate process.

Therefore, for write-heavy, read-light workloads, the probability that a page is accessed immediately after being written is small, and the change buffer works best. This pattern is common in billing and logging systems.

Conversely, suppose a business pattern is to query immediately after each write. Even if the conditions for using the change buffer are met, the update is recorded in the change buffer first, but the merge process is then triggered immediately because the data page is accessed right away. The number of random IOs is not reduced, while the cost of maintaining the change buffer is added; for this business pattern, the change buffer is counterproductive.

Q1: If a write uses the change buffer mechanism and the host restarts abnormally, will the change buffer contents be lost?
They won't be lost. When the transaction commits, the change buffer operation is also recorded in the redo log, so the change buffer can be recovered during crash recovery.

Change buffer and redo log

The core mechanism by which both the change buffer and WAL's redo log improve performance is minimizing random reads and writes:

  • The change buffer mainly saves the IO cost of random disk reads: if the page is already in memory the update is applied directly, and a page is read into memory only when it really needs to be read, at which point the change buffer records are applied to produce a correct result;
  • The redo log mainly saves the IO cost of random disk writes: the WAL idea is to write the log first and the data pages later. The log is written sequentially, and multiple changes can be combined into one write.

10. Why does MySQL sometimes choose the wrong index? How to deal with it?

The optimizer selects an index in order to find the optimal execution plan and execute the statement at the lowest cost. In the database, the number of scanned rows is one of the factors affecting execution cost; the optimizer also makes a comprehensive judgment based on factors such as whether a temporary table is used and whether sorting is needed.

The question is: how is the number of scanned rows estimated?
MySQL cannot know exactly how many records satisfy a condition before it actually starts executing the statement; it can only estimate the count from statistical information. This statistical information is the index's "discrimination": the more distinct values on an index, the better its discrimination. The number of distinct values in an index is called its cardinality; the larger the cardinality, the better the discrimination.
MySQL obtains the cardinality through sampling statistics: InnoDB selects N data pages by default, counts the distinct values on those pages to get an average, and multiplies it by the number of pages in the index to get the index's cardinality. Since the table is updated continuously, index statistics are not fixed: when the number of changed rows exceeds 1/M of the total, a new round of index statistics is automatically triggered.
The parameter innodb_stats_persistent selects between two modes:

  • When set to on, statistics are stored persistently; in this case the default N is 20 and M is 10.
  • When set to off, statistics are stored only in memory; in this case the default N is 8 and M is 16.
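A sketch of inspecting these knobs; the names below are real InnoDB system variables, and the sample-page variables correspond to the N above:

SHOW VARIABLES LIKE 'innodb_stats_persistent';               -- ON / OFF
SHOW VARIABLES LIKE 'innodb_stats_persistent_sample_pages';  -- N when persistent (default 20)
SHOW VARIABLES LIKE 'innodb_stats_transient_sample_pages';   -- N when in-memory only (default 8)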

Cases where MySQL picks the wrong index

  • Case 1: the optimizer weighs in the cost of the table lookups (going back to the primary key table) needed when a secondary index is used;
  • Case 2: with order by, an index that avoids sorting may be selected to reduce cost, even if it scans more rows;
  • Case ...

Solutions and troubleshooting methods:
Use show index to check the index cardinality and see whether the statistics are inaccurate; use explain to check which index is used and whether the estimated number of scanned rows looks reasonable.

For problems caused by inaccurate index statistics, analyze table can fix them.
For other optimizer misjudgments, you can use force index on the application side to specify an index, modify the statement to guide the optimizer, or bypass the problem by adding or deleting indexes.
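A sketch of that troubleshooting flow, assuming a hypothetical table t with a secondary index a:

show index from t;                                               -- compare the Cardinality column with reality
explain select * from t where a between 10000 and 20000;         -- check the key and rows estimates
analyze table t;                                                 -- re-sample statistics if the estimate is off
select * from t force index(a) where a between 10000 and 20000;  -- last resort: force the index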

11. How to index a string field?

  1. Create a full-column index directly, which may take up more space;
  2. Create a prefix index (the first few bytes of the string serve as the index) to save space, but it increases the number of rows scanned per query and cannot use the covering-index optimization;
  3. Store the string in reverse order and then create a prefix index, to work around strings whose own prefixes have poor discrimination;
  4. Create an extra hash field and index it, which gives stable query performance at the cost of extra storage and computation; like method 3, it does not support range scans.

In practice, choose among these methods according to the characteristics of the business field (see the sketch below).
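A sketch of methods 2 and 4, assuming a hypothetical SUser(id, email, ...) table:

alter table SUser add index idx_email6 (email(6));   -- method 2: 6-byte prefix index
-- method 4: store crc32(email) in an extra indexed column, and re-check the full
-- value in the where clause to filter out hash collisions
alter table SUser add column email_crc int unsigned not null default 0,
  add index idx_email_crc (email_crc);
select * from SUser
 where email_crc = crc32('xxx@example.com') and email = 'xxx@example.com';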

In addition, consider a prefix index such as email(18): even if the index already contains all the information the query needs, InnoDB still has to go back to the primary key index and check again, because the system cannot be sure that the prefix-index definition has not truncated the complete information. In other words, using a prefix index disables the covering-index optimization.

12. Why does my MySQL “shake”?

When InnoDB processes an update statement, the only disk operation it performs is writing the log. When the content of a data page in memory differs from its counterpart on disk, the memory page is called a "dirty page"; after the memory data is written back to disk, the contents match again and the page is called a "clean page". Both dirty and clean pages live in memory.
A normally fast update only writes memory and the log. The moment MySQL occasionally "shakes" is likely when it is flushing dirty pages:

  • InnoDB's redo log is full. The system then stops all update operations, advances the checkpoint, and frees space so the redo log can continue to be written. InnoDB tries hard to avoid this situation, because the whole system can no longer accept updates; everything is blocked.

  • The system is short of memory. When a new memory page is needed and memory is insufficient, some data pages must be evicted to make room. If a "dirty page" is evicted, it must first be written to disk.
    If a single query has to evict too many dirty pages, its response time becomes noticeably longer;

  • MySQL considers the system "idle" and flushes a few dirty pages whenever it gets the chance;

  • MySQL shuts down normally. It flushes all dirty pages in memory to disk, so that on the next startup data can be read directly from disk and startup is fast.

InnoDB's dirty-page flushing control strategy

Two factors influence the dirty-page flushing speed:
one is the dirty page ratio, the other is the redo log write speed.

The parameter innodb_max_dirty_pages_pct is the upper limit of the dirty page ratio; the default value is 75%.
The full-speed flushing rate is tied to the disk's IOPS via the innodb_io_capacity parameter.
Given the current dirty page ratio (call it M) and the gap between the currently written log sequence number and the checkpoint's (call it N), the system computes F1(M) and F2(N) through a series of calculations and takes the larger value:

R = max(F1(M), F2(N))

The engine then flushes dirty pages at the rate defined by innodb_io_capacity multiplied by R%.

The innodb_flush_neighbors parameter and the "neighbor flushing" mechanism:
Once a query has to flush a dirty page during execution, it may be slower than usual, and one MySQL mechanism can make it slower still: when about to flush a dirty page, if the neighboring data page happens to be dirty too, it is flushed along with it; moreover, this logic keeps spreading, so for each neighbor, any adjacent page that is also dirty gets flushed as well.
When innodb_flush_neighbors is 1, this "neighbor flushing" behavior is on; when it is 0, each page minds its own business and is flushed alone.
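A sketch of the knobs above; these are real InnoDB variables:

SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';  -- dirty-page ratio ceiling (default 75)
SET GLOBAL innodb_io_capacity = 2000;              -- set to the disk's measured IOPS (e.g. via fio)
SET GLOBAL innodb_flush_neighbors = 0;             -- sensible on SSDs, where neighbor flushing buys little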

13. Why doesn't the table file shrink after half of the table's data is deleted?

If you want to shrink a table, merely deleting the unwanted data in it will not change the size of the table file; you have to rebuild the table with the alter table command to actually shrink the file.
Reason: the delete command only marks the record's position, or the data page, as "reusable"; the space is not reclaimed, so the size of the disk file does not change.
Deletions are not the only source of holes; insertions cause them too. If data is inserted in random order, it may trigger index data page splits, and after a page split completes, a hole is left at the end of the page.
In short, after a large number of inserts, deletes, and updates, a table may be full of holes.

How to properly shrink the space:
Use the command alter table A engine=InnoDB to rebuild the table. MySQL automatically completes the operations of transferring the data, swapping the table names, and deleting the old table. However, during this whole DDL process, table A cannot accept updates; in other words, this DDL is not Online.
MySQL 5.6 introduced Online DDL. The process of rebuilding a table becomes:


[Figure: the states of the Online DDL table rebuild]
  1. Create a temporary file and scan all the data pages of table A's primary key index;
  2. Use the records of table A in those data pages to generate a B+ tree and store it in the temporary file;
  3. While the temporary file is being generated, record all operations on table A in a log file (row log); this corresponds to state 2 in the figure;
  4. After the temporary file is generated, apply the operations in the log file to the temporary file to obtain a data file logically identical to table A; this corresponds to state 3 in the figure;
  5. Replace the data file of table A with the temporary file.

The alter statement acquires the MDL write lock when it starts, but before the data is actually copied the write lock degenerates into a read lock.
Why degenerate? To make the DDL Online: an MDL read lock does not block inserts, deletes, or updates.
Then why not release the lock entirely? To protect itself: it prevents other threads from running DDL on this table at the same time.
Therefore, relative to the whole DDL process, the lock is held only briefly; from the business point of view the operation can be considered online.

The table-rebuild statement alter table t engine=InnoDB actually implies:
alter table t engine=innodb,ALGORITHM=inplace;
From the server layer's point of view, the data is never moved into a temporary table, so this is an "in place" operation, which is the origin of the name "inplace" (the server layer has no temporary table, but the engine layer does have a temporary file, so the rebuild still fails when there is not enough disk space).
The counterpart of inplace is copying the table, used as follows:
alter table t engine=innodb,ALGORITHM=copy;
ALGORITHM=copy forces the table to be copied (there are both a temporary table and temporary files).

Supplement: differences among optimize table, analyze table, and alter table, three ways of "rebuilding" a table:

  • Starting from MySQL 5.6, alter table t engine=InnoDB (i.e. recreate) defaults to the Online DDL process shown above;
  • analyze table t does not rebuild the table; it only re-samples the table's index statistics without modifying the data, holding an MDL read lock during the process;
  • optimize table t equals recreate plus analyze.
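A side-by-side sketch of the three commands, on a hypothetical table t:

alter table t engine=InnoDB;  -- recreate: rebuilds data and indexes (Online since 5.6)
analyze table t;              -- re-samples index statistics only; data untouched
optimize table t;             -- equivalent to recreate + analyze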

14. count(*) is so slow, what should I do?

Different MySQL engines implement count(*) differently.

  • The MyISAM engine stores a table's total row count on disk, so executing count(*) directly returns this number, which is very efficient;
  • The InnoDB engine has a harder time: executing count(*) requires reading data out of the engine row by row and counting as it goes.

Note that the discussion here is about count(*) without a filter condition; once a where condition is added, MyISAM tables can no longer return so quickly either.
Why doesn't InnoDB store the number like MyISAM does?

Because, due to multi-version concurrency control (MVCC), even for multiple queries running at the same moment, "how many rows should be returned" from an InnoDB table is not a fixed number. (Each transaction sees its own consistent view established when it started, and updates inside a transaction cause current reads, so different sessions can legitimately count different numbers of rows.)

An ordinary index tree is much smaller than the primary key index tree, and for an operation like count(*) the result is logically the same whichever index tree is traversed. Therefore the MySQL optimizer finds the smallest tree to traverse. Minimizing the amount of scanned data while guaranteeing logical correctness is one of the general rules of database system design.
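A sketch of observing this, on a hypothetical InnoDB table t that has a secondary index:

explain select count(*) from t;
-- the key column typically shows a small secondary index rather than PRIMARY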

The output of the show table status command also includes a TABLE_ROWS field showing the current number of rows in the table; like index statistics, its value is estimated by sampling, so it cannot replace count(*).

To summarize:
count(*) on a MyISAM table is fast, but MyISAM does not support transactions;
the show table status command returns quickly, but its row count is not accurate;
count(*) on an InnoDB table traverses the whole table: the result is accurate, but it can cause performance problems.

Q1: A page frequently needs to display the total number of records in a trading system's operations table. What should I do?
The answer: you can only count it yourself. You need to find a place to store the row count of the operations table on your own.
1. Keep the count in a cache system such as Redis: but the cache and the database cannot stay exactly in sync, so the count can be inaccurate, and it can be lost on a crash.
2. Keep the count in the database itself: put it in a separate counter table C. Thanks to InnoDB's transaction support, this solves the problems of the Redis approach (logically: start a transaction and execute two statements: insert into the data table, and update the counter table to add 1 to the count).
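A minimal sketch of approach 2, assuming hypothetical op_record and counter tables:

begin;
insert into op_record (user_id, op) values (1, 'trade');
update counter set cnt = cnt + 1 where table_name = 'op_record';
commit;
-- placing the update on the hot counter row last keeps its row lock held
-- for as short a time as possible, reducing lock waits between transactions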

Q2: What are the performance differences among count(*), count(primary key id), count(field), and count(1)?
When analyzing the performance differences, keep these principles in mind:

  1. The server layer gets whatever it asks for;
  2. InnoDB hands over only what is necessary;
  3. The current optimizer only optimizes the semantics of count(*) to "fetch the number of rows"; other "obvious" optimizations are not performed.

For count(primary key id), the InnoDB engine traverses the entire table, takes the id value out of every row, and returns it to the server layer. The server layer, having received the id, judges that it cannot be null and accumulates row by row.
For count(1), the InnoDB engine traverses the entire table but takes no values. For every returned row, the server layer puts in the number "1", judges that it cannot be null, and accumulates row by row.
Comparing just these two usages, count(1) is faster than count(primary key id), because returning the id from the engine involves parsing data rows and copying field values.
For count(field):

  1. If the "field" is defined as not null, the field is read from each record, judged non-null, and accumulated row by row;
  2. If the "field" definition allows null, then at execution time the value may be null, so it must be taken out and checked again; only non-null values are accumulated.

This follows the first principle above: InnoDB returns whatever fields the server layer asks for.
But count(*) is an exception: instead of taking out all the fields, it is specially optimized to take no value at all. count(*) is by definition not null, so it simply accumulates row by row.
Conclusion: sorted by efficiency, count(field) < count(primary key id) < count(1) ≈ count(*), so use count(*) whenever possible.

16. How does order by work?

Full-field sorting vs rowid sorting

1. MySQL allocates a sort_buffer to each thread for sorting; its size is controlled by sort_buffer_size.
1> If the amount of data to sort is less than sort_buffer_size, the sort completes in memory.
2> If there is too much data to fit in memory, temporary disk files are used to assist the sort; this is known as an external sort.
3> When using an external sort, MySQL splits the sorted data across several separate temporary files and then merges them into one large ordered file.

2. MySQL traverses the index to read the qualifying rows into sort_buffer and quick-sorts them by the sort field.
1> If a queried field is not included in the secondary index, the clustered index must be consulted via the primary key recorded in the secondary index to fetch the needed fields (a table lookup).
2> Such lookups cause random IO; MySQL 5.6 introduced the MRR mechanism, which takes out the primary keys of the matching secondary-index records, sorts them in memory, and only then does the table lookups.
3> Where appropriate, build a composite index so the rows come out already ordered and the performance cost of sorting is avoided; if allowed, build a covering index to avoid table lookups as well (see the sketch below).
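A sketch of that last point, assuming a hypothetical t(city, name, age) table and the query select city, name, age from t where city = 'HZ' order by name limit 1000:

alter table t add index city_user (city, name);
-- rows scanned within city = 'HZ' already come out ordered by name: no sort needed
alter table t add index city_user_age (city, name, age);
-- covering index: all of the query's columns are in the index, so table lookups are avoided too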

Full-field sorting
1. Read all the fields the query needs into sort_buffer via the index;
2. Sort by the sort field;
3. Return the result set to the client.
Disadvantages:
1. sort_buffer holds fewer rows, because each row carries the non-sort fields as well, so sort_buffer is used inefficiently;
2. When a large amount of data must be sorted, many temporary files are produced and sorting performance becomes poor.
Advantage:
When memory is large enough, MySQL prefers full-field sorting, because it avoids the extra round of table lookups that rowid sorting requires.

rowid sorting
1. By limiting the length of each sorted row, sort_buffer can hold as many rows as possible; the threshold is max_length_for_sort_data: when a row exceeds it, MySQL switches to rowid sorting.
2. Only the sort fields and the primary key are read into sort_buffer and sorted by the sort field;
3. In the sorted order, go back to the table by id to fetch the columns the query actually needs;
4. Return the result set to the client.
Advantage: makes better use of the in-memory sort_buffer for the sorting work, minimizing disk access;
Disadvantage: the table lookups are random IO and cause many random reads, so it does not necessarily access the disk less than full-field sorting.
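A sketch of observing which sort MySQL used, via OPTIMIZER_TRACE (the sort_mode values below are real trace output fields):

SET optimizer_trace = 'enabled=on';
SET max_length_for_sort_data = 16;  -- force rowid sorting once rows exceed 16 bytes
select city, name, age from t where city = 'HZ' order by name limit 1000;
select * from information_schema.OPTIMIZER_TRACE;
-- sort_mode <sort_key, additional_fields> means full-field sorting,
-- <sort_key, rowid> means rowid sorting; number_of_tmp_files > 0 means an external sort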