MySQL indexing and SQL tuning summary

Date: 2021-09-15

MySQL index

MySQL supports many storage engines, and different engines support indexes differently, so MySQL offers several index types, such as the BTREE index, hash index, and full-text index. To avoid confusion, this article focuses only on the BTREE index, which is the kind we mainly deal with when using MySQL.

MySQL's official definition of an index is: an index is a data structure that helps MySQL retrieve data efficiently. Strip that sentence to its core and you get the essence: an index is a data structure.

MySQL indexing principle

Index purpose

The purpose of an index is to improve query efficiency. Compare it to a dictionary: to look up the word "mysql", we first locate the letter "m", then scan down to "y", and then find the remaining "sql". Without an index, you might have to read through every word to find the one you want. And what if you want every word beginning with "m"? Or every word beginning with "ze"? Without an index, those tasks are practically impossible.

Borrowing a book from a library is similar: you first look up its catalog entry, then its shelf number. This is a living example of an index in everyday life: indexes speed up lookups by letting us locate things quickly.

Indexing principle

All indexes follow the same principle: we keep narrowing the range of candidate data until only the desired results remain, turning random access into sequential access; in other words, we always locate the data through the same, predictable search method.

A database works the same way, but it is considerably more complex, because it faces not only equality queries but also range queries (>, <, BETWEEN), fuzzy queries (LIKE), union queries (OR), multi-value matching (IN, which is essentially a series of ORs), and so on. How should the database handle all of these?

Recall the dictionary example. Could we divide the data into segments and then search segment by segment? In the simplest scheme, 1,000 records are split into segment one (records 1 to 100), segment two (101 to 200), segment three (201 to 300), and so on. To look up record 250, we jump straight to the third segment, eliminating 90% of the irrelevant data in one step. But if there are 10 million records, how many segments should we use?

Anyone with a little algorithms background will think of a search tree, whose average complexity of O(log n) gives good query performance. But this ignores a key problem: the complexity model assumes every operation costs the same. A real database implementation is more complicated, and the data lives on disk. To improve performance, we read part of the data into memory for each computation, because accessing disk costs roughly 100,000 times as much as accessing memory. A simple search tree therefore struggles to satisfy complex application scenarios.

Index structure

No data structure is invented out of thin air; each has its background and usage scenarios. Let's summarize what we need this data structure to do. It is actually very simple: keep the number of disk I/Os per lookup small, ideally constant. So we might ask: could a shallow, highly controllable multi-way search tree meet this need? That is exactly how the B+ tree came into being.

Explanation of the B+ tree index structure

[Figure: B+ tree index structure]

Each light blue block is a disk block. Each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains the data items 17 and 35 and the pointers P1, P2, and P3: P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. Real data exists only in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, and 99. Non-leaf nodes store no real data, only the data items that guide the search; for example, 17 and 35 need not actually exist in the data table.

Search process of a B+ tree

As shown in the figure, to find the data item 29, we first load disk block 1 from disk into memory (the first I/O). A binary search in memory determines that 29 lies between 17 and 35, which selects the P2 pointer of disk block 1; the in-memory time is negligible compared with a disk I/O. We then follow P2's disk address to load disk block 3 into memory (the second I/O). Since 29 lies between 26 and 30, we select the P2 pointer of disk block 3 and follow it to load disk block 8 into memory (the third I/O). A final binary search in memory finds 29, and the query ends: three I/Os in total.

In practice, a three-level B+ tree can represent millions of rows. If searching millions of rows takes only three I/Os, the performance gain is enormous. Without an index, each row would cost one I/O, for millions of I/Os in total; clearly that cost is far too high.

B+ tree properties

1. From the analysis above we know: the smaller each data item, the more items fit in a disk block, and the lower the tree's height. This is why each data item, i.e. the indexed field, should be as small as possible; for example, an int takes 4 bytes, half the 8 bytes of a bigint. It is also why the B+ tree keeps the real data in the leaf nodes rather than the interior nodes: if real data were stored in interior nodes, the number of data items per disk block would drop sharply and the tree would grow taller. With only one data item per block, the tree would degenerate into a linked list.

2. When the B+ tree's data item is a composite structure, such as (name, age, sex), the tree builds its search order from left to right. When retrieving data such as (Zhang San, 20, F), the B+ tree first compares name to decide the search direction; if the names are equal, it compares age and then sex in turn to locate the record. But for data like (20, F) with no name, the B+ tree does not know which node to visit next, because name is the first comparison factor used in building the tree; you must search by name first to know where to look next.

Similarly, when retrieving data such as (Zhang San, F), the B+ tree can use name to direct the search, but the next field, age, is missing, so it can only find all rows whose name equals Zhang San and then filter those whose sex is F. This is a very important property of indexes: the leftmost-prefix matching property.
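The (name, age, sex) example above can be sketched in SQL. This is a minimal illustration (the table and index names are made up, not from the original article):

```sql
-- Hypothetical table illustrating leftmost-prefix matching
CREATE TABLE person (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL,
  age  TINYINT UNSIGNED NOT NULL,
  sex  CHAR(1) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_name_age_sex (name, age, sex)
) ENGINE=InnoDB;

-- Can use the full composite index: name, then age, then sex
SELECT * FROM person WHERE name = 'Zhang San' AND age = 20 AND sex = 'F';

-- Uses only the name part; rows are then filtered on sex after the lookup
SELECT * FROM person WHERE name = 'Zhang San' AND sex = 'F';

-- Cannot use the index at all: the leading column name is missing
SELECT * FROM person WHERE age = 20 AND sex = 'F';
```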

MySQL index implementation

In MySQL, indexes are a storage-engine-level concept, and different storage engines implement them differently. This article mainly discusses the index implementations of MyISAM and InnoDB.

MyISAM index implementation

The MyISAM engine uses a B+ tree as its index structure, and the data field of each leaf node stores the address of the data record.

The following figure is a schematic diagram of a MyISAM index:

[Figure: MyISAM primary index]

Assume the table has three columns and that col1 is the primary key; the figure above then shows the primary index of a MyISAM table. As you can see, MyISAM's index file stores only the addresses of the data records. In MyISAM there is no structural difference between the primary index and a secondary index, except that the keys of the primary index must be unique while the keys of a secondary index may repeat. If we create a secondary index on col2, its structure is shown in the following figure:

[Figure: MyISAM secondary index on col2]

It is also a B+ tree whose data fields store record addresses. MyISAM's index retrieval algorithm is therefore: search the index using the B+ tree search algorithm; if the key exists, take the value of its data field, and then use that value as an address to read the corresponding data record.

MyISAM's indexing method is also called "non-clustered", a name chosen to distinguish it from InnoDB's clustered index.

InnoDB index implementation

Although InnoDB also uses a B+ tree as its index structure, the concrete implementation is quite different from MyISAM's.

The first major difference is that InnoDB's data file is itself an index file. As noted above, MyISAM keeps index files and data files separate, and the index file stores only record addresses. In InnoDB, the table's data file is itself a B+ tree index structure whose leaf-node data fields hold the complete data records. The key of this index is the table's primary key, so the InnoDB data file itself is the primary index.

[Figure: InnoDB primary (clustered) index]

The figure above is a schematic of the InnoDB primary index (which is also the data file); you can see that the leaf nodes contain complete data records. This kind of index is called a clustered index. Because InnoDB's data file must be clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not). If none is specified explicitly, MySQL automatically selects a column that can uniquely identify each record as the primary key; if no such column exists, MySQL generates an implicit 6-byte integer field for the InnoDB table to serve as the primary key.

The second difference from a MyISAM index is that the data field of an InnoDB secondary index stores the value of the corresponding record's primary key rather than its address. In other words, all of InnoDB's secondary indexes reference the primary key as their data field. For example, the following figure shows a secondary index defined on col3:

[Figure: InnoDB secondary index on col3]

Here the ASCII codes of the English characters serve as the comparison criterion. The clustered-index design makes searching by primary key very efficient, but a secondary-index search requires two index lookups: first search the secondary index to obtain the primary key, then use the primary key to find the record in the primary index.
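A common way to avoid the second lookup, sketched here on a hypothetical table (names are illustrative), is a covering index: because an InnoDB secondary index implicitly stores the primary key, a query that needs only indexed columns plus the primary key never has to visit the primary index:

```sql
-- Hypothetical table: the secondary index on col3 implicitly stores the primary key
CREATE TABLE demo (
  id   BIGINT UNSIGNED NOT NULL,
  col2 INT NOT NULL,
  col3 VARCHAR(16) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_col3 (col3)
) ENGINE=InnoDB;

-- Two lookups: idx_col3 yields the primary key, then the clustered index yields col2
SELECT col2 FROM demo WHERE col3 = 'abc';

-- One lookup: id is stored inside idx_col3, so the index covers the query
-- (EXPLAIN shows "Using index" in the Extra column)
SELECT id FROM demo WHERE col3 = 'abc';
```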

Understanding how different storage engines implement indexes is very helpful for using and optimizing them correctly. For example, once you know InnoDB's index implementation, it is easy to see why overly long fields are discouraged as primary keys: every secondary index references the primary index, so a long primary key makes every secondary index large. Likewise, a non-monotonic field is a poor primary key in InnoDB: since the data file is itself a B+ tree, non-monotonic primary keys force frequent page splits and adjustments to maintain the B+ tree's properties as new records are inserted, which is very inefficient. An auto-increment field, by contrast, is an excellent choice of primary key.

How to build an appropriate index

Principle of indexing

The most important principle is the leftmost-prefix principle. Before discussing it, we should first introduce the composite index: a MySQL index can reference multiple columns in a specific order, and such an index is called a composite (joint) index. A composite index is an ordered tuple whose elements are columns of the table; a single-column index can be seen as the special case where the tuple has a single element.

The leftmost-prefix matching rule means, concretely, that if the index columns are a, b, and c, in that order:

  • Queries on [a], [a, b], or [a, b, c] can use the index

  • A query on [a, c] can use only the a part of the index: c is indexed, but because b is missing in the middle, the c part cannot be used

  • Queries on [b], [b, c], or [c] cannot use the index at all, because they do not start from the leftmost column and thus form no leftmost prefix

  • A range query can use the index if it applies to a leftmost prefix (e.g. the first column), but the columns after the range column cannot use the index
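The cases above can be checked directly with EXPLAIN; the key_len column reveals how much of the index each query actually uses. A minimal sketch (the table and index names are made up):

```sql
-- Hypothetical table with a composite index on (a, b, c)
CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  KEY idx_abc (a, b, c)
) ENGINE=InnoDB;

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;  -- full use of idx_abc
EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 3;            -- only the a part narrows the scan
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;            -- no leftmost prefix, idx_abc unusable
EXPLAIN SELECT * FROM t WHERE a > 1 AND b = 2;            -- range on a; b cannot narrow further
```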

Although indexes speed up queries, they come at a price: the index file itself consumes storage space, and indexes add overhead to inserts, deletes, and updates. MySQL also spends resources maintaining indexes at runtime. So it is not the case that more indexes are always better.

When using the InnoDB storage engine, unless there is a special need, always use a business-independent auto-increment field as the primary key. From the perspective of database index optimization, using anything other than an auto-increment primary key with InnoDB is definitely a bad idea.

InnoDB uses a clustered index: the data records themselves are stored on the leaf nodes of the primary index (a B+ tree). This requires that the records within a leaf node (whose size is one memory or disk page) be stored in primary-key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key; if the page reaches its fill factor (15/16 by default in InnoDB), a new page (node) is allocated. If the table uses an auto-increment primary key, each new record is simply appended after the last position in the current index node, and when a page fills up a new page is opened automatically. As follows:

[Figure: sequential inserts with an auto-increment primary key]

This produces a compact index structure that is filled in approximately sequential order. Since existing data never has to move during an insert, it is efficient and adds little index-maintenance overhead.

With a non-sequential primary key (an ID-card number or student number, for example), each insert lands at an effectively random primary-key value, so every new record must be inserted into the middle of an existing index page:

[Figure: random inserts with a non-sequential primary key]

In this case MySQL has to move data around to fit the new record into position, and the target page may already have been flushed to disk and evicted from the cache, forcing a read back from disk and adding substantial overhead. Meanwhile, the frequent moves and page splits create heavy fragmentation and a poorly compacted index structure, which later has to be repaired with OPTIMIZE TABLE to rebuild the table and refill the pages.

Therefore, whenever possible, use an auto-increment field as the primary key in InnoDB.

Common indexing tips

1. The leftmost-prefix matching principle is very important. MySQL keeps matching columns to the right until it hits a range query (>, <, BETWEEN, LIKE). For example, for a = 1 AND b = 2 AND c > 3 AND d = 4, an index in (a, b, c, d) order cannot be used for d, but an index in (a, b, d, c) order can be; the order of the a, b, d conditions in the query can be arbitrary.

2. = and IN conditions can appear in any order. For example, a = 1 AND b = 2 AND c = 3 can use an (a, b, c) index no matter how the conditions are ordered, because MySQL's query optimizer rewrites the query into a form the index can recognize.

3. Prefer columns with high selectivity as indexes. Selectivity is count(distinct col) / count(*), the proportion of distinct values in the field; the higher the ratio, the fewer records we scan per lookup. A unique key has selectivity 1, while status or gender fields may approach 0 on large datasets. You may ask: what is a good empirical value for this ratio? It varies by use case, but fields involved in joins should generally exceed 0.1, i.e. no more than 10 records scanned per lookup on average.
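Selectivity can be measured directly in SQL; a sketch against a hypothetical users table (the table and column names are made up):

```sql
-- Selectivity = distinct values / total rows; closer to 1 is better
SELECT COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity,   -- likely near 1
       COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity   -- likely near 0
FROM users;
```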

4. Index columns cannot participate in computation; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use an index. The reason is simple: the B+ tree stores the field values as they appear in the table, so comparing against a function of the column would require applying the function to every element, which is obviously far too expensive. The statement should instead be written as create_time = unix_timestamp('2014-05-29').
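The same fix, side by side (assuming a hypothetical orders table whose create_time column stores a Unix timestamp and carries an index):

```sql
-- Bad: the function is applied to the indexed column, so the index cannot be used
SELECT * FROM orders WHERE from_unixtime(create_time) = '2014-05-29';

-- Good: the constant is converted instead, leaving the column untouched
SELECT * FROM orders WHERE create_time = unix_timestamp('2014-05-29');
```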

5. Extend existing indexes as much as possible rather than creating new ones. For example, if the table already has an index on a and you want an index on (a, b), it is usually enough to modify the existing index. Of course, weigh this against the existing data volume and current online usage.
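A sketch of extending an index in place (the table and index names are illustrative):

```sql
-- Before: a single-column index idx_a on column a.
-- Replace it with a composite index; queries filtering only on a
-- still match via the leftmost prefix, so idx_a becomes redundant.
ALTER TABLE t DROP INDEX idx_a,
              ADD INDEX idx_a_b (a, b);
```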


MySQL optimization

Configuration optimization

Configuration optimization refers to tuning the MySQL server-side configuration. Normally the business side does not need to worry about it, since a dedicated DBA will handle it, but I believe developers should still understand the principles.

For more on MySQL optimization, you can also refer to: Super comprehensive MySQL optimization interview analysis

Basic configuration

innodb_buffer_pool_size

This is the first option to set after installing InnoDB. The buffer pool is where data and indexes are cached: the larger it is, the better, because it ensures that most reads are served from memory rather than disk. Typical values are 5-6GB (for 8GB of RAM), 20-25GB (32GB of RAM), and 100-120GB (128GB of RAM).
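Inspecting and adjusting the setting (the 6GB value is illustrative for an 8GB machine; follow the guidance above for your memory size):

```sql
-- Current buffer pool size, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Since MySQL 5.7 the pool can be resized online; earlier versions need
-- a my.cnf change (innodb_buffer_pool_size = 6G) and a restart
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;
```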

innodb_log_file_size

This is the size of the redo log, which ensures that writes are fast and durable and enables recovery after a crash. Through MySQL 5.1 it was difficult to tune: on the one hand you want it larger for better performance, on the other hand smaller for faster crash recovery.

Fortunately, crash-recovery performance has improved greatly since MySQL 5.5, so you can now have both high write performance and fast crash recovery. Until MySQL 5.5 the total redo log size was limited to 4GB (with two log files by default); this was lifted in MySQL 5.6. If you know your application writes heavily and you are on MySQL 5.6, you can start with 4GB right away.

max_connections

If you often see the "Too many connections" error, it is because the value of max_connections is too low. This is very common, since applications often fail to close their database connections properly, and you need a value larger than the default of 151 connections.

A major drawback of setting max_connections high (say 1000 or more) is that the server becomes unresponsive when running 1000 or more active transactions. Using a connection pool in the application, or a thread pool in MySQL, can help solve this problem.
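Checking and raising the limit (the value 500 is illustrative):

```sql
SHOW VARIABLES LIKE 'max_connections';          -- default is 151
SHOW GLOBAL STATUS LIKE 'Max_used_connections'; -- high-water mark observed so far
SET GLOBAL max_connections = 500;               -- raise the limit at runtime
```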

InnoDB configuration

innodb_file_per_table

This setting tells InnoDB whether to store the data and indexes of all tables in a shared tablespace (innodb_file_per_table = OFF) or to put each table's data in its own .ibd file (innodb_file_per_table = ON). One file per table lets you reclaim disk space when you drop, truncate, or rebuild a table.

It is also required for some advanced features such as data compression, but it brings no performance benefit by itself. The main scenario where you would not want one file per table is when you have very many tables (say 10k+). In MySQL 5.6 this option defaults to ON, so in most cases you need to do nothing; on earlier versions you must set it to ON before loading data, because it only affects newly created tables.

innodb_flush_log_at_trx_commit

The default value is 1, meaning InnoDB fully honors the ACID properties. This value is most appropriate when data safety is the top concern, for example on a master node. On systems with slow disks, however, it brings huge overhead, because each flush of changes to the redo log requires an extra fsync.

Setting it to 2 is slightly less reliable, because committed transactions are flushed to the redo log only once per second, but that is acceptable in some scenarios, such as a replica of the master node. A value of 0 is faster still but risks losing some data on a system crash: suitable only for replicas.
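The trade-off at a glance (durability decreases and speed increases going from 1 to 2 to 0):

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1: write + fsync the redo log on every commit (safest, the default)
-- 2: write on commit, fsync once per second (up to ~1s lost on an OS crash)
-- 0: write + fsync once per second (up to ~1s lost even on a mysqld crash)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- e.g. on a replica
```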

innodb_flush_method

This setting determines how data and logs are written to disk. Generally, if you have a hardware RAID controller whose independent cache uses a write-back policy with battery-backed protection, set it to O_DIRECT; otherwise, fdatasync (the default) is appropriate in most cases. sysbench is a good tool to help you decide this option.

innodb_log_buffer_size

This setting determines the buffer allocated for transactions that have not yet been committed. The default (1MB) is usually enough, but if your transactions contain large binary objects or large text fields, the buffer can fill up quickly and trigger extra I/O. Watch the Innodb_log_waits status variable: if it is not 0, increase innodb_log_buffer_size.
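Checking for log-buffer pressure:

```sql
SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';  -- non-zero means the buffer filled up
-- If the counter keeps growing, raise innodb_log_buffer_size, e.g. in my.cnf:
--   innodb_log_buffer_size = 16M
-- (on older MySQL versions this variable is not settable at runtime)
```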

Other settings

query_cache_size

The query cache is a well-known bottleneck, even under modest concurrency. The best option is to disable it from the start by setting query_cache_size = 0 (now the default in MySQL 5.6) and to speed up queries by other means: better indexes, adding replicas to spread the load, or an additional cache layer (such as memcached or Redis).

If your application already has the query cache enabled and has not run into problems, it may be genuinely useful to you; in that case, be careful when you decide to disable it.

log_bin

If you want the database server to serve as a replica of a master node, turning on the binary log is necessary; if you do, don't forget to set server_id to a unique value. Even with a single server, the binary log is very useful for point-in-time recovery: restore your most recent full backup and then apply the changes recorded in the binary log (the incremental part).

Once created, binary logs are kept forever, so if you don't want to run out of disk space, purge old files with PURGE BINARY LOGS or set expire_logs_days to specify after how many days logs are cleared automatically. Writing the binary log is not free, so on a replica that is not itself a master it is recommended to turn this option off if you don't need it.
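Two ways to keep the binary log from filling the disk (the date and retention period are illustrative):

```sql
-- Delete logs older than a given point in time
PURGE BINARY LOGS BEFORE '2021-09-01 00:00:00';

-- Or let the server expire them automatically after 7 days
SET GLOBAL expire_logs_days = 7;
```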

skip_name_resolve

When a client connects to the database server, the server resolves its host name, and when DNS is slow, establishing the connection becomes slow too. It is therefore recommended to start the server with the skip_name_resolve option enabled so that no DNS lookup is performed. The only limitation is that GRANT statements can then use only IP addresses, so be extra careful when adding this setting to an existing system.
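With skip_name_resolve on, host-based grants must use IPs (the account, database, and addresses below are made up):

```sql
-- Works with skip_name_resolve: the host part is an IP address
GRANT SELECT ON app_db.* TO 'report'@'192.168.1.10';

-- Would no longer match once DNS lookups are disabled: the host is a name
-- GRANT SELECT ON app_db.* TO 'report'@'reports.example.com';
```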

SQL tuning

Generally, SQL tuning is unavoidable. For slow queries, the system or server can enable the slow query log, which matters especially for online systems: slow queries can then be filtered out of the log. But once you know which SQL needs optimizing, the next question is how to tune it.

Basic steps of slow query optimization

  1. Run it first to see whether it is really slow; remember to set SQL_NO_CACHE

  2. Lock in the single table that returns the fewest rows by examining the WHERE conditions: apply the query's WHERE clause to each table individually and see which returns the fewest records, then query each field of that single table to see which has the highest selectivity

  3. Use explain to check whether the execution plan matches the expectation from the previous step (i.e. the query starts from the table with the fewest matching records)

  4. For SQL of the ORDER BY ... LIMIT form, let the sorted table be processed first

  5. Understand the business side's usage scenario

  6. When adding indexes, follow the indexing principles above

  7. If the observed result does not meet expectations, start the analysis again from the beginning
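Step 1 in practice: enable the slow query log and re-run the statement bypassing the query cache (the threshold and the sample query are illustrative):

```sql
-- Record statements slower than one second
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- Re-run the suspect query without the query cache to time it honestly
SELECT SQL_NO_CACHE * FROM orders WHERE user_id = 42;
```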

Common tuning methods

Execution plan explain

In daily work we sometimes use the slow query log to record SQL statements that took a long time to run. Finding those statements is not the end of the job: we then use the explain command to inspect their execution plans, checking whether each statement used an index and whether it scanned the whole table.

In this way we can gain a deeper understanding of MySQL's cost-based optimizer, learn many details of the access strategies it may consider, and see which strategy it is expected to adopt when actually running the statement.

To use explain, you only need to add the explain keyword to the original select, as follows:

mysql> explain select * from servers;
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | servers | ALL  | NULL          | NULL | NULL    | NULL |    1 | NULL  |
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
1 row in set (0.03 sec)

A brief explanation of the explain fields:

  • id: the order in which SQL is executed; rows with larger ids execute first

  • select_type: the type of each select clause in the query

  • table: which table this row refers to; sometimes not a real table name

  • type: how MySQL finds the required rows in the table, also known as the "access type". Common values are: ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, worst to best performance)

  • possible_keys: the indexes MySQL could use to find records in this table. Any index on the fields involved in the query is listed, but it may not actually be used

  • key: the index MySQL actually decided to use; NULL if no index was chosen

  • key_len: the number of bytes used in the index; the length of the index usable by the query can be computed from this column (key_len shows the maximum possible length of the index fields, derived from the table definition, not measured from the rows)

  • ref: the join matching criteria for the table, i.e. which columns or constants are used to look up values in the index

  • rows: MySQL's estimate, based on table statistics and index selection, of how many rows must be read to find the required records; in theory, the fewer rows, the better the query performance

  • Extra: additional details about how MySQL resolves the query

Limitations of explain

  • explain tells you nothing about triggers, stored procedures, or the impact of user-defined functions on the query

  • explain does not take the various caches into account

  • explain cannot show the optimization work MySQL performs while actually executing the query

  • some statistics are estimates, not exact values

  • explain can only explain SELECT operations; to analyze other statements, rewrite them as equivalent SELECTs and inspect those plans.
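For example (on versions where EXPLAIN does not accept DML directly), an UPDATE can be rewritten as a SELECT with the same WHERE clause to inspect the access path it would use (table and columns are made up):

```sql
-- Original statement to analyze:
-- UPDATE users SET status = 1 WHERE last_login < '2021-01-01';

-- Equivalent SELECT for inspecting the access path
EXPLAIN SELECT * FROM users WHERE last_login < '2021-01-01';
```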


Worked example

Table structure and query statement

Suppose we have the following table structure:

CREATE TABLE `circlemessage_idx_0` (
  `circle_id` bigint(20) unsigned NOT NULL COMMENT 'Group ID',
  `from_id` bigint(20) unsigned NOT NULL COMMENT 'Sending user ID',
  `to_id` bigint(20) unsigned NOT NULL COMMENT 'Receiving user ID',
  `msg_id` bigint(20) unsigned NOT NULL COMMENT 'Message ID',
  `type` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT 'Message type',
  PRIMARY KEY (`msg_id`,`to_id`),
  KEY `idx_from_circle` (`from_id`,`circle_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

Analyze the following query statements with explain:

mysql> explain select msg_id from circlemessage_idx_0 where to_id = 113487 and circle_id=10019063 and msg_id>=6273803462253938690 and from_id != 113487 order by msg_id asc limit 30;
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
| id | select_type | table               | type  | possible_keys           | key     | key_len | ref  | rows   | Extra       |
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | circlemessage_idx_0 | range | PRIMARY,idx_from_circle | PRIMARY | 16      | NULL | 349780 | Using where |
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)


mysql> explain select msg_id from circlemessage_idx_0 where to_id = 113487 and circle_id=10019063 and from_id != 113487 order by msg_id asc limit 30;
+----+-------------+---------------------+-------+-----------------+---------+---------+------+------+-------------+
| id | select_type | table               | type  | possible_keys   | key     | key_len | ref  | rows | Extra       |
+----+-------------+---------------------+-------+-----------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | circlemessage_idx_0 | index | idx_from_circle | PRIMARY | 16      | NULL |   30 | Using where |
+----+-------------+---------------------+-------+-----------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

Problem analysis

Comparing the two execution plans, we can see that without the msg_id >= xxx condition far fewer rows are examined, and that both queries use an index, but only the primary key index. This suggests the indexing is unreasonable and is not being used to full effect.

Looking at the first plan, with the msg_id >= xxx condition present, rows is about 340,000, which means far too many rows are examined: either the table really is that large, or the indexes are unreasonable and not being used well (usually the latter). The key column shows PRIMARY, i.e. the primary key (msg_id, to_id). Note that when the table was created, msg_id was placed before to_id in the primary key, so MySQL matches on msg_id first; after using the msg_id part it has already examined 340,000 rows, and because the msg_id condition is a range (greater than or equal), the to_id part of the key cannot be used after it.

Next, key_len is 16 and key is PRIMARY, from which we can conclude that only the one primary key index was used.

Finally, type is range, indicating that the query is either a range query or a multi-value match.

Please note that a predicate like from_id != xxx cannot use an index; only from_id = xxx can. The index on from_id therefore cannot help this query, which is something to think through carefully when designing indexes.
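This behavior is easy to verify. The sketch below is a minimal illustration using Python's built-in sqlite3 module as a lightweight stand-in for MySQL (an assumption made purely so the example is self-contained; the table and column names mirror this article, and the data is hypothetical). An equality predicate on an indexed column produces an index seek (SEARCH), while an inequality (!=) cannot seek and falls back to a scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circlemessage (msg_id INTEGER PRIMARY KEY, from_id INT)")
conn.execute("CREATE INDEX idx_from ON circlemessage (from_id)")

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN returns rows whose last column describes the access path.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

eq_plan = plan("SELECT msg_id FROM circlemessage WHERE from_id = 113487")
ne_plan = plan("SELECT msg_id FROM circlemessage WHERE from_id != 113487")

print(eq_plan)  # equality: an index SEARCH on idx_from
print(ne_plan)  # inequality: no SEARCH possible, only a scan
```

The exact plan wording differs between SQLite and MySQL, but the underlying rule is the same: `!=` is not an index-seekable operator, so the optimizer cannot use the index to locate matching rows.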

How to optimize

Since we know the index is unreasonable, the next step is to analyze and adjust it. Generally, for a single-table query, we first need to know roughly what data the table holds and its current size.

Then continue the analysis. Since msg_id is the primary key, it is globally unique, so there are as many distinct msg_id values as there are rows; retrieving by msg_id alone can therefore mean scanning close to the whole table. The optimization goal is to minimize the number of rows the index must examine, so ask: which fields narrow the result set the most? For example, will the rows belonging to one user be fewer than the rows matched by msg_id? Fewer still for one user within one circle? And so on.

Based on the actual data, the number of rows hit by to_id in a single table should be far fewer than the number hit by msg_id, so the first priority is to make sure the to_id index can be used. For that, you could swap msg_id and to_id in the primary key order. However, this is a live table: it already holds a large amount of data and the business is running on it, and changing the primary key in that situation causes many problems (changing a secondary index is fine). Modifying the primary key directly is therefore not recommended.

So, to guarantee effective use of the to_id index, create a new composite index whose first field must be to_id. For this business scenario it also pays to include circle_id, which narrows the search quickly, giving the composite index (to_id, circle_id). Finally, since we also filter on msg_id, append it to obtain the final composite index (to_id, circle_id, msg_id). With it, queries of the form where to_id = xxx and circle_id = xxx and msg_id >= xxx can be resolved quickly.
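A sketch of the fix, again using sqlite3 as a stand-in (an assumption; in MySQL you would add the index with the same column order via ALTER TABLE ... ADD INDEX). It demonstrates the leftmost-prefix rule: the composite index (to_id, circle_id, msg_id) lets the target query seek directly, while a query that skips the leading to_id column cannot seek at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE circlemessage ("
    "msg_id INTEGER PRIMARY KEY, to_id INT, circle_id INT, from_id INT)"
)
# The recommended composite index: to_id first, then circle_id, then msg_id.
conn.execute(
    "CREATE INDEX idx_to_circle_msg ON circlemessage (to_id, circle_id, msg_id)"
)

def plan(sql: str) -> str:
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The target query uses all three index columns (equality, equality, then range).
good = plan(
    "SELECT msg_id FROM circlemessage "
    "WHERE to_id = 113487 AND circle_id = 10019063 AND msg_id >= 1"
)
# Skipping the leading to_id column breaks the leftmost prefix: no seek possible.
bad = plan("SELECT msg_id FROM circlemessage WHERE circle_id = 10019063")

print(good)  # SEARCH using idx_to_circle_msg
print(bad)   # scan only: the index prefix does not start with circle_id
```

Note how the range condition on msg_id sits last in the index: equality columns go first so the B-tree can narrow on them before the range is applied, which is exactly why (msg_id, to_id) performed so poorly in the original plan.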

Of course, this does not mean building a composite index for every SQL statement that needs one. Too many indexes hurt write performance (INSERT, DELETE, UPDATE) and increase storage accordingly, and MySQL also spends resources maintaining indexes at runtime. So more indexes is not better: build the appropriate indexes around the queries that run most frequently and affect performance most. Note that a composite index (or the set of primary key columns) is one B-tree, while multiple separate indexes are multiple B-trees.


summary

First of all, we need to deeply understand the principles and implementation of indexes; that understanding helps us build appropriate ones. Then, when creating an index, do not take it for granted: think through the business logic first, and only then design the table structure and indexes. The following points bear repeating:

  • More indexes are not necessarily better

  • Distinguish between primary key and index

  • Understand the principle of index structure

  • Understanding query index rules

This work adopts the CC license; reprints must credit the author and link to this article.
