Do you really know how to use database indexes?

Time:2022-5-5
The drawing is from freepik

Reprinted from:Can you really use the index database?

1、 Foreword

Whether in an interview or in daily work, you will more or less use or hear others talk about indexing.

However, most programmers' understanding of indexes stops at the idea that “adding an index makes queries faster”.

Using an index is easy. But being able to use indexes is one thing; understanding how indexes work and using them properly is quite another.

These are two very different levels of technology.

2、 Query efficiency with and without an index on a table of ten million rows

Suppose there is a student table, student, containing 10 million rows.

Without an index, querying the students with class_id=2 using SELECT * FROM student WHERE class_id=2 took 3.357 seconds.

With an index added, the same query for class_id=2, SELECT * FROM student WHERE class_id=2, took 0.017 seconds.

With 10 million rows, the two queries differ in speed by a factor of nearly 200!

The gap is huge! No wonder indexes are needed!
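The comparison above can be reproduced on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL, a much smaller table (200,000 rows by default), and the index name idx_student_class_id is made up for illustration. An ORDER BY id is added so the two result sets can be compared directly. The absolute timings will differ from the figures quoted above.

```python
import sqlite3
import time


def build_student_table(rows=200_000, classes=1_000):
    """Create an in-memory student table with `rows` rows spread over `classes` classes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO student (name, class_id) VALUES (?, ?)",
        ((f"student{i}", i % classes) for i in range(rows)),
    )
    conn.commit()
    return conn


def timed_query(conn, class_id=2):
    """Run the query from the text and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT * FROM student WHERE class_id = ? ORDER BY id", (class_id,)
    ).fetchall()
    return rows, time.perf_counter() - start


def query_plan(conn, class_id=2):
    """Return SQLite's plan description for the same filter."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM student WHERE class_id = ?", (class_id,)
    ).fetchall()
    return " ".join(row[3] for row in plan)  # column 3 holds the detail text


conn = build_student_table()
rows_before, t_before = timed_query(conn)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_student_class_id ON student (class_id)")
rows_after, t_after = timed_query(conn)    # same query, now served by the index
print(f"without index: {t_before:.4f}s, with index: {t_after:.4f}s")
print(query_plan(conn))
```

Both runs return the same rows; only the access path changes, which is what the printed query plan shows.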

3、 What is an index

Many online articles explaining the index describe the index as follows:

An index is like a book catalog, through which you can accurately locate the specific content of a book.

This description is correct!

But it explains almost nothing. People who already understand indexes naturally get it; people who don't may feel they understand, yet remain just as confused.

In fact, to understand how an index works, you must understand one data structure:

the “balanced tree” (not binary), that is, the B-tree or the B+ tree.

Of course, some databases also use hash buckets as the index data structure. However, mainstream RDBMSs use the balanced tree as the default index data structure for tables.

We usually add a primary key when creating a table. In some relational databases, if no primary key is specified, the database will refuse to execute the CREATE TABLE statement.

In fact, a table with a primary key can hardly be called a plain “table” any more. In a table without a primary key, the rows are stored on disk in no particular key order, laid out neatly one row after another.

If the primary key is attached to the table, the storage structure of the table on the disk will change from a neatly arranged structure to a tree structure, that is, the above-mentioned “balanced tree” structure. In other words, the whole table will become an index.

Yes, again, the whole table has become an index!

This is the so-called “clustered index”. It is also why a table can have only one primary key and only one clustered index: the job of the primary key is to convert the storage format of the “table” into the format of a “tree (index)”.

Before the index was added, the SQL query above made the database traverse the whole table row by row, checking whether the class_id field of each row equals 2. Because we need every record whose class_id is 2, the search cannot stop after the first match: other records with class_id equal to 2 may follow.

This means the database must check every one of the tens of millions of records in the table. This is called a “full table scan”.

The main benefit of adding an index is faster queries: it fundamentally reduces the number of rows that need to be scanned.
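To make the difference in scanned rows concrete, here is a small pure-Python sketch. It is not how a database is implemented: a sorted list searched with bisect stands in for the balanced tree, and the function names are invented for this example.

```python
from bisect import bisect_left, bisect_right


def full_scan(rows, class_id):
    """Check every row; return (matching ids, number of rows examined)."""
    examined = 0
    matches = []
    for student_id, row_class in rows:
        examined += 1  # cannot stop early: later rows may also match
        if row_class == class_id:
            matches.append(student_id)
    return matches, examined


def build_index(rows):
    """A sorted list of (class_id, student_id) pairs stands in for the index."""
    return sorted((row_class, student_id) for student_id, row_class in rows)


def index_lookup(index, class_id):
    """Binary-search the range boundaries, then read only the matching entries."""
    lo = bisect_left(index, (class_id, float("-inf")))
    hi = bisect_right(index, (class_id, float("inf")))
    matches = [student_id for _, student_id in index[lo:hi]]
    # hi - lo counts the entries read; the two binary searches add
    # roughly log2(len(index)) comparisons each on top of that.
    return matches, hi - lo


rows = [(i, i % 1000) for i in range(100_000)]  # (student_id, class_id)
print(full_scan(rows, 2)[1])                   # rows examined by the scan
print(index_lookup(build_index(rows), 2)[1])   # entries read via the index
```

The scan touches every row regardless of how many match, while the sorted structure lets the lookup jump straight to the matching range.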

4、 Index in MySQL

In MySQL, indexes can be classified in two ways: logical classification and physical classification.

According to logical classification, indexes can be divided into:

  • Primary key index: a table can have only one primary key index; its values cannot be duplicated or null;

  • Unique index: values in the indexed columns cannot be duplicated, but null values are allowed. A table can have multiple unique indexes, and a unique index can cover a single column (such as an ID card number or a card number) or a combination of columns;

  • Common index: a table can have multiple common indexes. A common index can contain multiple fields and allows duplicate values and null values;

  • Full text index: an index that makes keyword searches over text more efficient.

According to the physical classification, the index can be divided into:

  • Clustered index: generally the primary key index of the table. If the table has no explicitly defined primary key, the first unique index whose columns are all not null is chosen. If there is none, the 6-byte ROWID that the InnoDB storage engine builds into each row is used as the clustered index. Each table has only one clustered index, because the logical order of the clustered index's key values determines the physical order of the corresponding rows in the table. A clustered index performs well for both exact lookups and range lookups (compared with ordinary indexes and full table scans), so it is very valuable and should be chosen with care (generally, an auto-increment ID with no business meaning is not used as the clustered index);

  • Nonclustered index: the logical order of the index entries differs from the physical storage order of the rows on disk (these are indexes on columns other than the primary key). A table can have multiple nonclustered indexes.
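The relationship between the two kinds of index can be sketched with two Python dictionaries. This is a simplified model, assuming the InnoDB layout in which a secondary index entry stores the primary key value of the row it points to; real indexes are B+ trees, not hash maps, and the sample rows are invented.

```python
# Clustered index: primary key -> full row (the table itself is stored this way).
clustered = {
    1: {"id": 1, "name": "Ann", "class_id": 2},
    2: {"id": 2, "name": "Bob", "class_id": 3},
    3: {"id": 3, "name": "Cid", "class_id": 2},
}

# Secondary (nonclustered) index on class_id: key -> primary key values only.
secondary_class_id = {}
for pk, row in clustered.items():
    secondary_class_id.setdefault(row["class_id"], []).append(pk)


def find_by_pk(pk):
    """One lookup: the clustered index already holds the whole row."""
    return clustered[pk]


def find_by_class(class_id):
    """Two steps: the secondary index yields primary keys,
    then each key is looked up in the clustered index."""
    return [clustered[pk] for pk in secondary_class_id.get(class_id, [])]


print(find_by_pk(2))
print(find_by_class(2))
```

The second step in find_by_class is the extra trip that a lookup through a nonclustered index has to make to reach the full row.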

InnoDB, currently the most widely used MySQL storage engine, builds its indexes as B+ trees.

The B+ tree index is the database implementation of the B+ tree. It is the most common and most frequently used kind of index in databases.

The B in B+ tree is usually taken to stand for balanced, not binary; the B+ tree evolved from the earlier balanced binary tree. To follow it, first understand the binary search tree, the balanced binary tree (AVL tree), and the balanced multiway search tree (B-tree). The B+ tree was gradually optimized from these trees.

For a detailed explanation, please refer to the following article: MySQL indexing mechanism (B+ tree)
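One way to see why a multiway tree suits an index better than a binary tree is to estimate how many levels each needs for the same number of keys. The sketch below uses a deliberately simplified model (a tree with L levels and fan-out f reaches f^L keys), and the fan-out of 1000 is an assumed illustrative value, not a measured InnoDB figure.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of tree levels whose capacity reaches num_keys,
    under the simplified model capacity = fanout ** levels."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Ten million keys, as in the student table example.
print(levels_needed(10_000_000, 2))      # binary tree
print(levels_needed(10_000_000, 1000))   # wide multiway tree (assumed fan-out)
```

Fewer levels means fewer node reads per lookup, which matters when each node read may be a disk I/O.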

5、 Advantages and disadvantages of index

Advantages:

  • An index improves the efficiency of data retrieval and reduces the database's IO cost.
  • A unique index guarantees the uniqueness of each row of data in the table.
  • When grouping and sorting clauses are used in retrieval, an index can also significantly reduce the time spent on grouping and sorting.
  • To speed up joins between two tables, an index is usually created on the foreign key.

Disadvantages:

  • Creating and maintaining indexes takes time, which increases with the amount of data
  • Indexes occupy physical space. In addition to the space taken by the table data, each index needs its own physical space. If a clustered index is built, the space required is even larger
  • When the data in the table is added, deleted and modified, the index should also be maintained dynamically, which reduces the speed of data maintenance

6、 When should indexes be used

Create an index for:

  • The primary key, for which a unique index is created automatically
  • Fields frequently used as query criteria
  • Fields used to join other tables in queries, that is, fields with foreign key relationships
  • Fields used for sorting in queries: if the sorted field is accessed through an index, sorting becomes much faster
  • Fields used for statistics or grouping in queries

Avoid creating indexes:

  • Do not index fields whose values are poorly distinguished

    For example, a field with only two possible values. Each value matches a large share of the rows, so the index filters out very little, and a lookup through it is hardly different from a full table scan.

  • Do not index frequently updated fields

    For example, a login count that changes often forces the index to change just as often, which increases the database workload and reduces efficiency.

  • Do not index a field that never appears in a where clause

    MySQL can use an index only when the query actually references the indexed field, whether for filtering, joining, sorting, or grouping.

  • Do not index tables with a small amount of data

    Even if the index is used, the improvement is small.

7、 Which SQL statements can hit an index

  1. Queries with a leading wildcard cannot use an index, for example name like '%Tao'

  2. union, in, and or can all hit an index; in is recommended

  3. Negative-condition queries generally cannot use an index effectively and can often be rewritten as in queries. Negative conditions include !=, <>, not in, not exists, not like, and so on.

  4. The leftmost prefix principle of a joint index, also called leftmost matching: if a joint index is built on the three fields (a, b, c), it speeds up queries on a, on (a, b), and on (a, b, c).

  5. When building a joint index, put the field with the highest discrimination on the left

  6. If a joint index on (a, b) exists, a separate index on a is unnecessary. Similarly, if an index on (a, b, c) exists, separate indexes on a and (a, b) are unnecessary

  7. When a query mixes equality and non-equality conditions, put the columns with equality conditions first when building the index

  8. A range column can use the index, but the columns after the range column cannot.

An index can serve at most one range column. If the query criteria contain two range columns, the index cannot be used for both. Range conditions include <, <=, >, >=, between, and so on.
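The leftmost prefix and range-column rules can be observed in a query plan. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, the table t and index idx_abc are hypothetical, and SQLite's plan text differs from the output of MySQL's EXPLAIN, although the same principle applies to both.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the detail text of SQLite's query plan for `sql`."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")  # joint index on (a, b, c)

queries = {
    "a": "SELECT * FROM t WHERE a = 1",
    "a,b": "SELECT * FROM t WHERE a = 1 AND b = 2",
    "a,b,c": "SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3",
    # b alone is not a leftmost prefix of (a, b, c)
    "b only": "SELECT * FROM t WHERE b = 2",
    # the range on b stops index matching before c
    "range on b": "SELECT * FROM t WHERE a = 1 AND b > 2 AND c = 3",
}
for label, sql in queries.items():
    print(f"{label:12} -> {plan_for(conn, sql)}")
```

In the printed plans, the first three queries search through idx_abc, the query on b alone falls back to scanning the table, and the range query uses the index only up to the range column.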

  9. Put calculations in the business layer rather than the database layer. A calculation applied to a field prevents the query from hitting the index on that field.

  10. Implicit type conversion causes a full table scan. If the phone field is of type varchar, the following SQL cannot hit the index: select * from user where phone=13800001234

  11. It is not advisable to index fields that are updated frequently or that have low discrimination.

Updates change the B+ tree, so indexing frequently updated fields greatly reduces database performance.

“Gender” is an attribute with low discrimination. Indexing it is pointless: it cannot filter data effectively, and its performance is similar to that of a full table scan.

Generally, an index is worthwhile when the discrimination is above 80%. Discrimination can be calculated as count(distinct column_name) / count(*).
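The discrimination formula can be tried out on sample data. This is a sketch using Python's sqlite3 module; the user table, its columns, and the generated values are invented for the example, and the helper interpolates identifiers into SQL, so it should only ever be given trusted names.

```python
import sqlite3


def selectivity(conn, table, column):
    """count(distinct column) / count(*), the ratio described in the text."""
    # Identifiers cannot be bound as parameters; pass trusted names only.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, gender TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO user (gender, phone) VALUES (?, ?)",
    (("M" if i % 2 else "F", f"138{i:08d}") for i in range(10_000)),
)

print(selectivity(conn, "user", "gender"))  # two distinct values over many rows
print(selectivity(conn, "user", "phone"))   # every value is distinct
```

A ratio close to 1 means almost every value is unique, so an index narrows the search well; a ratio close to 0, as for gender, means the index filters out very little.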

  12. Use a covering index for queries to avoid going back to the table.

If the queried columns can be read directly from the index, there is no need to follow the row-locator and read the row again. In other words, “the queried columns are covered by the index”, which speeds up the query.
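A query plan shows whether a query is covered by an index. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in, with a hypothetical student table and index name; SQLite labels the covered case COVERING INDEX, while MySQL's EXPLAIN reports it as "Using index" in the Extra column.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the detail text of SQLite's query plan for `sql`."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER, address TEXT)"
)
conn.execute("CREATE INDEX idx_class_name ON student (class_id, name)")

# Every selected column lives in the index, so the table row is never read.
covered = plan_for(conn, "SELECT class_id, name FROM student WHERE class_id = 2")

# address is not in the index, so each match needs a trip back to the table.
not_covered = plan_for(conn, "SELECT * FROM student WHERE class_id = 2")

print(covered)
print(not_covered)
```

Both queries use idx_class_name to find the matching entries; only the first can answer entirely from the index.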

  13. Indexed columns should not be null; use a not null constraint and a default value

  14. Use a deferred join or a subquery to optimize deep-pagination scenarios.

MySQL does not skip offset rows. It fetches offset+N rows, discards the first offset rows, and returns N rows, so efficiency is very low when offset is large. Either limit the total number of pages that can be returned, or rewrite the SQL for pages beyond a specific threshold.
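A deferred join can be sketched as follows: the inner query pages over the primary key only, and the outer query fetches full rows for just those keys. The example runs on SQLite via Python's sqlite3 module, with an invented article table, so it demonstrates that the rewrite returns the same page rather than measuring MySQL's performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article (title, body) VALUES (?, ?)",
    ((f"title {i}", "x" * 200) for i in range(20_000)),
)


def page_plain(conn, offset, n):
    """Straightforward paging: full rows are produced for offset + n rows."""
    return conn.execute(
        "SELECT * FROM article ORDER BY id LIMIT ? OFFSET ?", (n, offset)
    ).fetchall()


def page_deferred(conn, offset, n):
    """Deferred join: page over ids first, then fetch only the n full rows."""
    return conn.execute(
        """
        SELECT a.*
        FROM article AS a
        JOIN (SELECT id FROM article ORDER BY id LIMIT ? OFFSET ?) AS page
          ON a.id = page.id
        ORDER BY a.id
        """,
        (n, offset),
    ).fetchall()


print(page_plain(conn, 15_000, 10) == page_deferred(conn, 15_000, 10))
```

The inner query still walks past offset entries, but it touches only the narrow key column; the wide rows are read for the final N results alone.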

  15. Fields that are unique in business terms, even when the uniqueness comes from a combination of several fields, must have a unique index.

  16. It is best not to join more than three tables. The fields being joined must have consistent data types, and in multi-table join queries the joined fields need to be indexed.

  17. If you know for certain that a query returns only one result, limit 1 can improve efficiency, for example when verifying a login.

  18. The select statement should list the field names explicitly

  19. If the sorting field does not use an index, sort as little as possible

  20. Prefer union all to union. union has to filter out duplicates after merging the result sets, which involves sorting and a large amount of CPU work, increasing resource consumption and latency. Of course, union all can replace union only when the two result sets contain no duplicate data.
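The difference between the two operators is easy to check. This small sketch uses Python's sqlite3 module with two made-up tables, a and b, that share one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (v INTEGER)")
conn.execute("CREATE TABLE b (v INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(3,), (4,)])

# UNION merges the two result sets and then removes duplicates.
union_rows = conn.execute("SELECT v FROM a UNION SELECT v FROM b").fetchall()

# UNION ALL simply concatenates them, so the shared value 3 appears twice.
union_all_rows = conn.execute("SELECT v FROM a UNION ALL SELECT v FROM b").fetchall()

print(len(union_rows), len(union_all_rows))
```

When the inputs are known not to overlap, both forms return the same rows and union all skips the deduplication work.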

8、 Summary

Indexing is a very important technique!

However, every index requires space on the hard disk to store the data structure it needs (a loose way of putting it, but that is the idea). Therefore, indexing very long fields is not recommended.

Moreover, more indexes are not always better. Although an index improves query efficiency, it significantly slows down inserts, deletes, and updates, because every data change also modifies the indexes.

To sum up, when inserting a large amount of data, drop the table's indexes first and rebuild them after the insert, so that the database stays as efficient as possible!
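The drop, load, and rebuild sequence can be sketched like this. It is an illustration on SQLite via Python's sqlite3 module; the bulk_load helper and the index name idx_student_class_id are hypothetical, and in MySQL the equivalent steps would typically be written with ALTER TABLE ... DROP INDEX and ALTER TABLE ... ADD INDEX.

```python
import sqlite3


def bulk_load(conn, rows):
    """Drop the secondary index, insert all rows, then rebuild the index once."""
    with conn:  # commit when the block finishes without error
        conn.execute("DROP INDEX IF EXISTS idx_student_class_id")
        conn.executemany(
            "INSERT INTO student (name, class_id) VALUES (?, ?)", rows
        )
        # One index build over the final data instead of one update per row.
        conn.execute("CREATE INDEX idx_student_class_id ON student (class_id)")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER)"
)
conn.execute("CREATE INDEX idx_student_class_id ON student (class_id)")
bulk_load(conn, [(f"student{i}", i % 100) for i in range(50_000)])
print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])
```

While the index is missing, queries on class_id fall back to full table scans, so this pattern suits offline or maintenance-window loads.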