MySQL knowledge

Time: 2021-02-25

What is the difference between a hash index and a B+ tree index?

Because of its structure, a hash index is extremely efficient for retrieval: a single hash computation locates the target at once.
A B+ tree index must be accessed multiple times, descending from the root node through branch nodes to a leaf node.

Why not use a hash index instead of a B+ tree index?

1. A hash index cannot satisfy range queries and cannot be used to avoid sorting, because hash values do not preserve the ordering of the original keys.
2. For a composite index, the hash index computes the hash over all the columns combined rather than over each column separately, so the hash index cannot be used when querying by only one or a few of the index keys.
3. When a large number of keys hash to equal values, the performance of a hash index is not necessarily higher than a B+ tree's.
4. For equality queries, a hash index has a clear advantage, because a single hash computation finds the corresponding key, but that presumes the key values are unique. If the key values are not unique, the key's bucket must be found first and its linked list scanned until the matching data is located.
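As an aside, MySQL does let you request a hash index explicitly on engines that support it, such as MEMORY; a minimal sketch (table and column names are illustrative):

CREATE TABLE lookup (
    id INT,
    val VARCHAR(32),
    INDEX USING HASH (id)    -- hash index: fast equality lookups, no range scans
) ENGINE = MEMORY;

SELECT * FROM lookup WHERE id = 7;   -- can use the hash index
SELECT * FROM lookup WHERE id > 7;   -- cannot: range predicates need a B+ tree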

B + tree index

The common InnoDB engine uses B+ tree indexes by default. It monitors index usage in real time: if it judges that building a hash index would improve query efficiency, it automatically builds one in the in-memory "adaptive hash index buffer" (the adaptive hash index is enabled by default in InnoDB). By observing the query pattern, MySQL builds the hash index using a prefix of the index key. If most of a table fits in the buffer pool, building this hash index can speed up equality queries.
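The adaptive hash index can be inspected and toggled through a stock InnoDB variable:

SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';  -- ON by default
SET GLOBAL innodb_adaptive_hash_index = OFF;       -- disable it if it hurts your workload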

What is the difference between an ordinary index and the primary key index in the InnoDB engine?

Difference:
An index is a data structure that stores the location of the table data.
A clustered index stores both the index key and the row data; the physical order of the rows is the same as the logical order of the index, and a table can have only one.
A non-clustered index stores the key value together with the address of the table data; the specific row is found via the address associated with the key. There is no strict consistency between the physical order of the rows and the logical order of the index.

Fundamental difference: whether the physical order of the table records is consistent with the order of the index.

Ordinary index: the most basic index type; it has no uniqueness or other constraints.
Primary key index: unique and cannot be NULL.
The primary key index is also called the clustered index: its leaf nodes store the whole row of data. A non-primary-key index is called a secondary index: its leaf nodes store the value of the primary key.
If you query by primary key, you only need to search the ID (primary key) B+ tree.
If you query through a non-primary-key index, you first search that index's tree (say, the k index tree) to find the corresponding primary key, and then search the ID index tree again. This process is called "going back to the table".
In conclusion, a query through a non-primary-key index has to scan one more index tree, so it is somewhat less efficient.
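A hypothetical table to make the two lookup paths concrete (names are illustrative):

CREATE TABLE t (
    id INT PRIMARY KEY,    -- clustered index: leaf nodes store the whole row
    k INT,
    name VARCHAR(16),
    INDEX idx_k (k)        -- secondary index: leaf nodes store the primary key
) ENGINE = InnoDB;

SELECT * FROM t WHERE id = 500;   -- one search of the clustered (ID) tree
SELECT * FROM t WHERE k = 5;      -- search idx_k for the id, then go back to the ID tree
SELECT id FROM t WHERE k = 5;     -- covered by idx_k: no back-to-table lookup needed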

MySQL has several configuration options that help us capture inefficient SQL statements in time

1. slow_query_log
Set this parameter to ON to capture SQL statements whose execution time exceeds a threshold.
2. long_query_time
When a SQL statement's execution time exceeds this value (in seconds), it is recorded in the log. Setting it to 1 or less is recommended.
3. slow_query_log_file
The name of the slow query log file.
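These can be set at runtime; a minimal sketch with example values (the variable names are stock MySQL):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                               -- seconds
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';   -- example path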

Explain statement

type
The access type: the way MySQL finds the required rows in the table.
Common types are: ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, performance from worst to best)

ALL: full table scan; MySQL traverses the whole table to find the matching rows
index: full index scan; differs from ALL in that only the index tree is traversed
range: retrieves only rows in a given range, using an index to select them
ref: the join condition against the preceding tables, i.e. which columns or constants are used to look up values on the index columns
eq_ref: similar to ref, except the index used is unique: for each index key value, only one record in the table matches. In short, the primary key or a unique key is used as the join condition in a multi-table join
const, system: used when MySQL can optimize part of the query into a constant, for example when the primary key appears in the WHERE list. system is a special case of const, used when the queried table has exactly one row
NULL: MySQL resolves the statement during optimization and does not even need to access a table or index at execution time, for example selecting the minimum value from an index column via a single index lookup.
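Hypothetical illustrations, reusing table t from earlier (id is the primary key, k is indexed); actual output can vary with table size and statistics:

EXPLAIN SELECT * FROM t;                  -- typically type: ALL (full table scan)
EXPLAIN SELECT id FROM t;                 -- typically type: index (full index scan)
EXPLAIN SELECT * FROM t WHERE id > 100;   -- typically type: range
EXPLAIN SELECT * FROM t WHERE id = 100;   -- typically type: const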

Extra
This column contains extra details about how MySQL resolves the query. Common values:
1. Using filesort: MySQL applies an external sort to the data instead of reading it in index order. Any sort that MySQL cannot complete through an index is called a "filesort".
2. Using temporary: a temporary table is used to hold intermediate results; MySQL uses one when sorting query results. Commonly seen with ORDER BY sorting and GROUP BY grouping.
3. Using index: the SELECT uses a covering index, avoiding access to the table's data rows. If "Using where" appears at the same time, the index is being used to look up key values; if not, the index is being used only to read data, not to perform lookups.
4. Using where: WHERE filtering is applied
5. Using join buffer: the join buffer is used
6. Impossible WHERE: the WHERE clause is always false, so no rows can be returned
7. Select tables optimized away: without a GROUP BY clause, MIN/MAX operations are optimized via an index, or COUNT(*) is optimized for the MyISAM storage engine; the computation does not wait for the execution phase but is completed while the query execution plan is generated.
8. Distinct: the DISTINCT operation is optimized to stop searching for equal values after the first matching tuple is found.
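A few hypothetical examples of these values, again on table t (idx_k on k, name unindexed); exact output varies by version:

EXPLAIN SELECT id FROM t WHERE k = 5;                -- Extra: Using index (covering index)
EXPLAIN SELECT * FROM t ORDER BY name;               -- Extra: Using filesort
EXPLAIN SELECT name, COUNT(*) FROM t GROUP BY name;  -- Extra: Using temporary (plus Using filesort before MySQL 8.0)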

Key
Shows the key (index) MySQL actually decided to use; if no index was chosen, key is NULL. If the query uses a covering index, the index columns overlap with the query's SELECT fields.

possible_keys
Indicates which indexes MySQL could use to find the rows in the table

Transaction characteristics

Atomicity, consistency, isolation and durability

Atomicity

A transaction must be treated as an indivisible minimum unit: either all operations in the transaction commit successfully or all of them fail. It is impossible for a transaction to execute only part of its operations

Consistency

Consistency means that a transaction moves the database from one consistent state to another consistent state; the integrity of the data in the database is not damaged before or after the transaction

Durability

Once a transaction is committed, its changes are saved permanently in the database. Even if the system crashes at this point, the committed modifications will not be lost

Isolation

The execution of one transaction must not be interfered with by other transactions. That is, the operations and data inside a transaction are isolated from other concurrent transactions, and concurrent transactions must not interfere with each other.

Four isolation levels

Read uncommitted

At this isolation level, a transaction can see the uncommitted results of other transactions. This level is rarely used in practice, because its performance is not much better than the other levels'. Reading uncommitted data is also known as a dirty read.

Read committed

This is the default isolation level of most database systems (but not MySQL). It satisfies the simple definition of isolation: a transaction sees only changes made by already-committed transactions. This level still allows the so-called non-repeatable read: other transactions may make new commits while this one is running, so the same SELECT may return different results.

Repeatable read

This is MySQL's default transaction isolation level. It guarantees that multiple reads of the same rows within one transaction see the same data. In theory, however, this still allows another thorny problem: the phantom read. In short, while a user is reading a range of rows, another transaction inserts a new row into that range; when the user reads the range again, a new "phantom" row appears. InnoDB and the Falcon storage engine address this through the MVCC (multi-version concurrency control) mechanism.

Serializable

This is the highest isolation level. It prevents phantom reads by forcing transactions to execute in an order in which they cannot conflict with each other. In short, it places a shared lock on every row read. At this level, many timeouts and much lock contention can occur.

These four isolation levels are implemented with different lock types, and reading the same data under them can easily run into the following problems:

Dirty read: one transaction has updated a piece of data, and another transaction reads that same data before the first commits. If, for some reason, the first transaction rolls back, the data read by the second transaction is incorrect. (If transaction A reads data updated by transaction B, and B then rolls back, the data A read is dirty data.)

Non-repeatable read: two queries inside one transaction return inconsistent data, possibly because another transaction updated the original data between the two queries. (Transaction A reads the same data several times while transaction B updates and commits it during that process, so A's repeated reads return inconsistent results.)

Phantom read: two queries inside one transaction return an inconsistent number of rows. For example, one transaction queries some rows while another transaction inserts new rows; in its next query, the first transaction finds rows it did not see before. (System administrator A changes all students' concrete scores in the database to letter grades A-E, but system administrator B inserts a record with a concrete score at the same time. When A finishes, one unchanged record remains, as if A were hallucinating; this is the phantom read.)

In MySQL, the problems these four isolation levels can still exhibit are as follows:
- Read uncommitted: dirty reads, non-repeatable reads, and phantom reads are all possible
- Read committed: dirty reads are prevented; non-repeatable reads and phantom reads are possible
- Repeatable read: dirty reads and non-repeatable reads are prevented; phantom reads remain theoretically possible
- Serializable: all three are prevented
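To check or change the level for a session (variable names from stock MySQL):

SELECT @@transaction_isolation;   -- MySQL 8.0; use @@tx_isolation on 5.7 and earlier
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;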

SQL statement optimization principles (sketches of a few follow the list):

Principle 1: avoid operating on columns; optimize select * from t where year(d) >= 2020 into select * from t where d >= '2020-01-01'
Principle 2: when joining, use the small result set to drive the large result set
Principle 3: avoid a leading % when using LIKE fuzzy queries
Principle 4: avoid SELECT *
Principle 5: use batch insert statements to save round trips: insert into t (id, name) values (1,'a'), (2,'b'), (3,'c')
Principle 6: when the LIMIT offset is relatively large, use BETWEEN instead (when there are gaps in the IDs, fewer rows than expected will be read); when the data sits near the end, search in DESC order to avoid scanning the earlier rows, keeping the LIMIT offset as small as possible
Principle 7: do not use the RAND() function to fetch multiple random rows
Principle 8: avoid NULL
Principle 9: use COUNT(*) instead of COUNT(id)
Principle 10: do no unnecessary sorting; try instead to complete the sort within the index
Queries on TEXT columns use temporary tables, which incurs serious performance overhead; query such columns separately
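A few of the principles as concrete sketches (table and column names are hypothetical):

SELECT * FROM t WHERE d >= '2020-01-01';                       -- Principle 1: no function on the column, so an index on d stays usable
INSERT INTO t (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');  -- Principle 5: one round trip instead of three
SELECT * FROM t WHERE id BETWEEN 100000 AND 100019;            -- Principle 6: large-offset pagination via an id range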

Speeding up paging queries with a covering index

As is well known, if a query touches only the indexed columns (a covering index), it will be fast,
because index search is optimized and the data lives in the index itself, so there is no need to look up the row addresses, which saves a lot of time.
In addition, MySQL also caches indexes, which helps further under high concurrency.
In our example, we know the id field is the primary key, so it is naturally covered by the default primary key index. Now let's look at the effect of querying through the covering index.
This time we query the data of the last page (using the covering index, i.e. selecting only the id column), as follows:
SELECT id FROM product LIMIT 866613, 20
The query takes 0.2 seconds, roughly 100 times faster than the earlier 37.44 seconds.
So if we want to query all columns, there are two ways. The first is the id >= form:
SELECT * FROM product
WHERE id >= (SELECT id FROM product LIMIT 866613, 1)
LIMIT 20
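The other common form is a deferred join against the covering-index subquery; a sketch:

SELECT p.* FROM product p
JOIN (SELECT id FROM product LIMIT 866613, 20) tmp ON p.id = tmp.id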

Anti injection of MySQL

1. For integer variables or fields, use the intval() function to convert every incoming parameter to a number, e.g. for paging or browsing articles by ID
2. For string variables, addslashes() converts all single quotes ('), double quotes ("), backslashes (\) and NUL characters into backslash-escaped characters; alternatively, use PDO parameter binding for better security
3. Escape or filter special characters such as %
4. Protect key information such as the table structure (for open-source programs this is only a stopgap; for example, some people deliberately give fields strange names, which is not advisable; the most reliable defense is careful checking at the code level)
5. Keep good data backups in any case, just in case.
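Parameter binding is also available at the SQL level through server-side prepared statements; a minimal sketch (table name hypothetical):

PREPARE stmt FROM 'SELECT * FROM articles WHERE id = ?';
SET @id = 42;                    -- user input is bound as data, never spliced into SQL text
EXECUTE stmt USING @id;
DEALLOCATE PREPARE stmt;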

Summary of MySQL locks

1. Shared locks (also called read locks) and exclusive locks (also called write locks)

Locking in the InnoDB engine: InnoDB supports transactions, with both row locks and table locks. MyISAM does not support transactions and has only table locks.
Shared lock (S): allows the holding transaction to read a row, and prevents other transactions from obtaining an exclusive lock on the same data set.
Exclusive lock (X): allows the holding transaction to update the data, and prevents other transactions from obtaining shared read locks or exclusive write locks on the same data set.
Intention shared lock (IS): the transaction intends to place shared locks on rows; it must obtain the table's IS lock before placing a shared lock on any row.
Intention exclusive lock (IX): the transaction intends to place exclusive locks on rows; it must obtain the table's IX lock before placing an exclusive lock on any row.

Explanation:
1) Shared and exclusive locks are row locks; intention locks are table locks. In applications we only use shared and exclusive locks; intention locks are used internally by MySQL and need no user intervention.

2) For UPDATE, DELETE and INSERT statements, InnoDB automatically places exclusive locks (X) on the rows involved; ordinary SELECT statements take no locks. A transaction can explicitly place shared or exclusive locks on record sets with the following statements:
Shared lock (S): SELECT * FROM table_name WHERE ... LOCK IN SHARE MODE;
Exclusive lock (X): SELECT * FROM table_name WHERE ... FOR UPDATE;

For applications that need to update row records after locking them, SELECT ... FOR UPDATE should be used to obtain the exclusive lock. (With only a shared lock, writing after reading blocks and can result in deadlock.)

As for MyISAM: it automatically read-locks all tables involved before executing a query (SELECT), and automatically write-locks the tables involved before executing update operations (UPDATE, DELETE, INSERT, etc.).

3) InnoDB row locks are implemented by locking the index entries on an index. This implementation means InnoDB can use row-level locks only when the data is retrieved through an index condition; otherwise, InnoDB uses table locks!
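A hypothetical demonstration of that last point, reusing table t from earlier (id indexed, name not):

BEGIN;
SELECT * FROM t WHERE id = 5 FOR UPDATE;        -- indexed condition: locks only that row
-- SELECT * FROM t WHERE name = 'x' FOR UPDATE; -- unindexed condition: locks every row scanned
COMMIT;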

2. Optimistic lock and pessimistic lock

Pessimistic lock: as its name suggests, a pessimistic lock takes a conservative attitude toward outside modification of the data (by other transactions in the current system as well as by transactions from external systems), so it keeps the data locked during the entire processing. Pessimistic locking usually relies on the lock mechanism provided by the database: only locks at the database layer can truly guarantee exclusive data access; a lock implemented only in the application cannot guarantee that an external system will not modify the data.

1) To use pessimistic locks, we must turn off the MySQL database's autocommit property and commit transactions manually, because MySQL uses autocommit mode by default: whenever you perform an update, MySQL commits the result immediately.

2) Note that within a transaction, only a SELECT ... FOR UPDATE or LOCK IN SHARE MODE on the same data waits for other transactions to finish; an ordinary SELECT ... is not affected. For UPDATE, DELETE and INSERT statements, InnoDB automatically places exclusive locks (X) on the rows involved.

3) Addendum: row locks versus table locks with MySQL's SELECT ... FOR UPDATE.
SELECT ... FOR UPDATE locks the data, but pay attention to the lock granularity: MySQL InnoDB defaults to row-level locks, so MySQL takes a row lock (locking only the selected rows) only when you filter explicitly on the primary key (or an indexed column); otherwise MySQL takes a table lock (locking the whole table).
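A minimal sketch of the pessimistic pattern, assuming a hypothetical account table keyed by id:

SET autocommit = 0;                                   -- step 1): manual commit mode
BEGIN;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;  -- row lock via the primary key
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;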

Optimistic lock:
Compared with a pessimistic lock, an optimistic lock assumes that data generally will not conflict, so it only checks for conflicts when the update is formally submitted. If a conflict is found, an error is returned to the user, who decides what to do (generally, the transaction is rolled back). So how do we implement optimistic locking? Generally in one of two ways (a sketch of the first follows after this list):

1) Using a data version recording mechanism, the most common implementation of optimistic locking. What is a data version? A version identifier is added to the data, usually a numeric "version" field in the database table. When the data is read, the version value is read along with it, and every update increments the version by one. When the update is submitted, the current version in the table's record is compared with the version value obtained at the first read: if they are equal, the update proceeds; otherwise the data is considered expired.

2) The second implementation is similar to the first: a field is likewise added to the table under optimistic lock control (the name does not matter), but its type is a timestamp. As with the version above, when the update is submitted, the row's current timestamp in the database is checked against the timestamp obtained before the update; if they match, the update proceeds, otherwise it is a version conflict.
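A minimal sketch of way 1), the version column, on the hypothetical account table:

SELECT balance, version FROM account WHERE id = 1;  -- read data plus version, say version = 3
UPDATE account
SET balance = 900, version = version + 1
WHERE id = 1 AND version = 3;                       -- succeeds only if nobody changed the row
-- If the affected row count is 0, the data was stale: retry or roll back.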

Conclusion: both kinds of locks have advantages and disadvantages; neither can be considered better than the other. Optimistic locks suit workloads with little writing, where conflicts genuinely rarely occur: the locking overhead is saved and the system's overall throughput rises. But if conflicts are frequent, the upper application will keep retrying, which actually reduces performance, so pessimistic locks are more appropriate in that case.
Also, under high concurrency, I think an optimistic lock beats a pessimistic one, because the pessimistic mechanism makes every thread wait too long, which greatly hurts efficiency; optimistic locking can improve concurrency to a certain extent.

3. Table locks and row locks

Table-level locking: MyISAM and MEMORY storage engines
Row-level locking: InnoDB storage engine
Page-level locking: BDB storage engine
Table-level locks: low overhead and fast to acquire, with no deadlocks; large locking granularity, the highest probability of lock conflicts, and the lowest concurrency.
Row-level locks: high overhead and slow to acquire, and deadlocks can occur; the smallest locking granularity, the lowest probability of lock conflicts, and the highest concurrency.
Page locks: overhead and locking time sit between table and row locks; deadlocks can occur; granularity sits between table and row locks, and concurrency is moderate.

Master-slave replication and read-write separation:

Principle:

The basis of data replication between MySQL servers is the binary log file. Once the binary log is enabled on a MySQL database, it can act as the master: all operations in the database are recorded in the binary log as "events". Other databases act as slaves: each keeps communicating with the master through an I/O thread and monitors the master's binary log file for changes. When the master's binary log changes, the slave copies the changes into its own relay log, and then a SQL thread on the slave replays the relevant "events" against its own database, keeping the slave consistent with the master. This is master-slave replication.

Master server configuration

1. Enable the binary log
2. Configure a unique server ID
3. Get the name and position of the master's binary log file: show master status;
4. Create a user account for slave-master communication
Realization:
1. Find the mysqld.cnf file mapped by the data volume (for example, /var/lib/docker/volumes/mysql_mysql-conf/_data/mysql.conf.d)
2. Modify the configuration
3. Connect to MySQL, create the user and grant privileges,
for example: GRANT REPLICATION SLAVE ON *.* TO 'wzh'@'%' IDENTIFIED BY '123456';
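A sketch of the mysqld.cnf changes behind steps 1 and 2 (example values):

[mysqld]
server-id = 1          # unique server ID (step 2)
log-bin   = mysql-bin  # enable the binary log (step 1)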

Slave server configuration:

1. Configure a unique server ID
2. Use the user account assigned by the master to read the master's binary log
3. Enable the slave service
Realization:
On the slave: CHANGE MASTER TO MASTER_HOST='192.168.79.130', MASTER_USER='wzh', MASTER_PASSWORD='123456', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=1329;
start slave;
show slave status;
If both Slave_IO_Running and Slave_SQL_Running show Yes, replication has succeeded
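The matching slave-side config sketch; the server-id merely has to differ from the master's (example value):

[mysqld]
server-id = 2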

Precautions:

1. After copying the docker-compose setup and modifying the configuration, restart it with docker-compose restart
2. After copying the slave database, you need to change its UUID: locate MySQL's auto.cnf with find -name auto.cnf and modify it so that the master's and slave's UUIDs differ
3. By default, master-slave replication copies all databases; the databases to read and write can be configured explicitly
4. If the master and slave table structures differ, then after the master's table is modified, the slave's SQL thread (Slave_SQL_Running) turns to No; stop the slave, modify the table structure, re-establish the connection, and start it again.
Therefore, to keep things synchronized, a user with read-only permission can be added for the slave library: the slave only reads and never writes, which realizes read-write separation.

MySQL table splitting

Splitting tables is a good way to spread database pressure.
Splitting a table means, in the most straightforward sense, dividing one table into multiple tables, which can then be placed in the same database or in different ones.
Of course, the first thing to know is under what circumstances splitting is needed. My feeling is that once a single table reaches somewhere between one million and ten million records, it is time to consider splitting it.

1. Types of table splitting

1> Vertical splitting
Content that would otherwise be in the same table is deliberately divided into multiple tables. (The "otherwise" means that, by the requirements of the third normal form of relational databases, it belongs in the same table.)
Reason for splitting: separate according to the activity of the data (because data of different activity levels is handled differently)
Case
For a blog system, the article title, author, category, creation time, and so on change slowly, are queried often, and tolerate relaxed real-time requirements; we call this cold data. Blog view counts, reply counts, similar statistics, and other frequently changing data we call active (hot) data. So in the database structure design we should consider splitting tables, starting with a vertical split.

After the vertical split (a sketch follows below):
First, the storage engines can differ. MyISAM can serve queries on the cold data better; the active data can use InnoDB for better update speed.
Second, the cold data can be given more slave databases, since most operations on it are queries, which speeds reads up. For the hot data, there can be more horizontally split tables on the master.
In fact, for certain especially active data you can also consider caches such as Memcached and Redis, accumulating changes to a certain amount before updating the database, or NoSQL databases such as MongoDB. But this is just an example; let's not go into it here.
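A hypothetical sketch of the vertical split for the blog case (table and column names are illustrative):

-- Cold, read-mostly columns:
CREATE TABLE blog_base (
    id INT PRIMARY KEY,
    title VARCHAR(200),
    author VARCHAR(64),
    category VARCHAR(64),
    created_at DATETIME
) ENGINE = MyISAM;

-- Hot, frequently updated counters:
CREATE TABLE blog_stats (
    blog_id INT PRIMARY KEY,
    views INT,
    replies INT
) ENGINE = InnoDB;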

2> Horizontal splitting
As the name suggests, a large table is cut horizontally into multiple tables with the same structure, such as a user table split into user_1, user_2, and so on. The table structures are exactly the same, and rows are distributed across the tables by some rule, for example by taking the user ID modulo the table count.
Reason for splitting: divide according to data volume, so that the capacity of any single table does not grow too large and each single table keeps its query processing capacity.
Case (a sketch follows below): the same blog system as above. When the number of blogs is large, we should split horizontally to reduce the pressure on each single table and improve performance. For example, if the blog's cold-data table is split into 100 tables, then when 1 million users browse at the same time, a single table would take 1 million requests, but after splitting each table might take about 10,000 requests (it cannot be perfectly even; this is just an assumption), so the pressure drops considerably.
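A hypothetical sketch of routing by user ID modulo 100:

CREATE TABLE user_0 LIKE user;                  -- repeat for user_1 through user_99
-- The application routes each row by: table suffix = user_id % 100
SELECT * FROM user_42 WHERE user_id = 100042;   -- 100042 % 100 = 42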

This work adopts a CC license. Reprints must credit the author and link to this article