Database Index: Knowledge Points You Must Master

Time: 2021-11-30

Indexes are one of the most important topics in MySQL. Have you mastered all of the knowledge points below?


What is an index?

Baidu Encyclopedia's explanation: an index is a structure that sorts the values of one or more columns in a database table. Using an index, specific information in a table can be accessed quickly.

What are the advantages and disadvantages of indexes?

Advantages:

  • Greatly speeds up data retrieval.
  • Turns random I/O into sequential I/O (because the leaf nodes of the B+ tree are linked together).
  • Speeds up joins between tables.

Disadvantages:

  • Space: indexes occupy additional physical storage.
  • Time: creating and maintaining indexes takes time. For example, indexes must be updated whenever data is inserted, deleted, or modified.

What data structures do indexes use?

Indexes mainly use two data structures, the B+ tree and the hash table, which give the B+ tree index and the hash index respectively. In the InnoDB engine the default index type is the B+ tree index; hash indexing in InnoDB comes from the adaptive hash index, which the engine builds automatically on top of B+ tree indexes.

  • B+ tree index

    Readers familiar with data structures will know that the B+ tree, like the balanced binary tree and the red-black tree, is a classic data structure. In a B+ tree, all records are stored in the leaf nodes in key order, as shown in the following figure.

[Figure: B+ tree with all records stored in the leaf nodes in key order]

As the figure shows, because a B+ tree is ordered and all data is stored in the leaf nodes, lookups are very efficient, and sorting and range searches are supported.
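To make the range-search point concrete, here is a minimal Python sketch of only the leaf level of a B+ tree, assuming tiny pages of three entries. The names LeafPage, build_leaves, and range_scan are hypothetical, and a binary search over each page's first key stands in for the descent from the root; this illustrates the idea and is not how InnoDB lays out pages.

```python
from bisect import bisect_right

class LeafPage:
    """One leaf page: sorted (key, row) entries plus a link to the next page."""
    def __init__(self, entries):
        self.entries = entries
        self.next_page = None

def build_leaves(rows, page_size=3):
    """Sort (key, row) pairs by key and cut them into linked leaf pages."""
    rows = sorted(rows)
    pages = [LeafPage(rows[i:i + page_size]) for i in range(0, len(rows), page_size)]
    for left, right in zip(pages, pages[1:]):
        left.next_page = right
    return pages

def range_scan(pages, low, high):
    """Return the rows whose key lies in [low, high]."""
    # Locate the last page whose first key is <= low (stand-in for the root-to-leaf descent).
    first_keys = [page.entries[0][0] for page in pages]
    page = pages[max(bisect_right(first_keys, low) - 1, 0)]
    result = []
    # From here on, just walk the linked leaves in key order.
    while page is not None:
        for key, row in page.entries:
            if key > high:
                return result
            if key >= low:
                result.append(row)
        page = page.next_page
    return result

pages = build_leaves([(k, f"row{k}") for k in [15, 3, 28, 7, 1, 22, 10, 5]])
print(range_scan(pages, 5, 15))  # ['row5', 'row7', 'row10', 'row15']
```

Once the first matching leaf is found, the scan only follows next_page links, which is the sequential access that linked leaf nodes make possible.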

B+ tree indexes can be divided into primary indexes and secondary indexes. The primary index is a clustered index, and a secondary index is a non-clustered index. A clustered index is a B+ tree that uses the primary key as its key, and its leaf nodes store the complete data records. A non-clustered index is a B+ tree that uses non-primary-key columns as its key, and its leaf nodes store primary key values. Therefore, a query through a non-clustered index first finds the primary key value and then uses the clustered index to find the row for that primary key. The figure above, whose leaf nodes store data records, shows a clustered index; a non-clustered index looks like this:

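[Figure: non-clustered index whose leaf nodes store non-primary-key column values (letters) together with primary key values]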

The letters in the figure above are values of a non-primary-key column. To query the row whose column value is B, you first find primary key 7, then look up the row for primary key 7 in the clustered index.
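The two-step lookup described above can be sketched in Python with plain data structures. This is a toy model under stated assumptions: the dict clustered stands in for the clustered index, the sorted list secondary for a non-clustered index on a letter column, and every row value other than "B maps to primary key 7" is made up for illustration.

```python
from bisect import bisect_left

# Clustered index (toy): primary key -> complete row. Rows other than id 7 / 'B' are hypothetical.
clustered = {
    3: {"id": 3, "letter": "A", "age": 30},
    7: {"id": 7, "letter": "B", "age": 25},
    9: {"id": 9, "letter": "C", "age": 41},
}

# Non-clustered index on `letter` (toy): sorted (letter, primary key) leaf entries.
secondary = sorted((row["letter"], pk) for pk, row in clustered.items())

def lookup_by_letter(letter):
    # Step 1: search the secondary index to collect matching primary keys.
    i = bisect_left(secondary, (letter,))
    pks = []
    while i < len(secondary) and secondary[i][0] == letter:
        pks.append(secondary[i][1])
        i += 1
    # Step 2: use each primary key to fetch the full row from the clustered index.
    return [clustered[pk] for pk in pks]

print(lookup_by_letter("B"))  # [{'id': 7, 'letter': 'B', 'age': 25}]
```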

  • Hash index

    A hash index is implemented on top of a hash table. For each row, the storage engine applies a hash function to the indexed column to obtain a hash code; the hash function should, as far as possible, produce different hash codes for different column values. The hash code serves as the key of the hash table, and a pointer to the data row serves as the value. Finding a row therefore takes O(1) time on average, so hash indexes are generally used for exact-match lookups.
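A minimal Python sketch of the idea follows, assuming Python's built-in hash() as a stand-in for the storage engine's hash function and a small, fixed bucket count so that collisions can actually occur. HashIndex and find_equal are hypothetical names; real engines differ in detail.

```python
class HashIndex:
    """Toy hash index on one column: hash code -> positions of matching rows.

    Different values can map to the same bucket (a hash collision), so every
    candidate row is re-checked against the searched value."""

    def __init__(self, rows, column, buckets=8):
        self.rows = rows
        self.column = column
        self.buckets = buckets
        self.table = {}
        for pos, row in enumerate(rows):
            self.table.setdefault(self._hash(row[column]), []).append(pos)

    def _hash(self, value):
        # Python's built-in hash() stands in for the engine's hash function.
        return hash(value) % self.buckets

    def find_equal(self, value):
        # Jump straight to one bucket: O(1) on average, no scan of the table.
        candidates = self.table.get(self._hash(value), [])
        return [self.rows[p] for p in candidates if self.rows[p][self.column] == value]

rows = [{"id": 1, "name": "zhangsan"}, {"id": 2, "name": "lisi"}, {"id": 3, "name": "zhangsan"}]
index = HashIndex(rows, "name")
print([r["id"] for r in index.find_equal("zhangsan")])  # [1, 3]
```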

What is the difference between a hash index and a B+ tree index?

Because their data structures differ, so do their use cases. Hash indexes are generally used for exact equality lookups, while B+ tree indexes serve most other kinds of search. In most cases, you will choose a B+ tree index.

  • Hash indexes do not support sorting, because a hash table is unordered.
  • Hash indexes do not support range queries.
  • Hash indexes support neither fuzzy queries nor leftmost-prefix matching on a multi-column index.
  • Because hash tables can have collisions, hash index performance is unstable, whereas B+ tree index performance is relatively stable: every query goes from the root node to a leaf node.
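The first two differences can be seen in a small Python comparison, assuming a hypothetical list of column values: a dict plays the role of the hash index and a sorted list plays the role of the B+ tree's ordered leaf level. This sketches the access patterns only.

```python
from bisect import bisect_left, bisect_right

values = [42, 7, 19, 88, 23, 61, 7, 35]   # hypothetical column values; row id = list position

# Hash-style index: value -> row ids. Equality lookups go straight to one entry.
hash_index = {}
for row_id, v in enumerate(values):
    hash_index.setdefault(v, []).append(row_id)

def hash_range(low, high):
    # Hashing keeps no order, so a range query must inspect every key,
    # and the result still has to be sorted afterwards.
    return sorted(v for v, ids in hash_index.items() for _ in ids if low <= v <= high)

# Tree-style index: the same values kept in sorted order.
sorted_index = sorted(values)

def tree_range(low, high):
    # Two binary searches find the boundaries; the answer is a contiguous, already sorted slice.
    return sorted_index[bisect_left(sorted_index, low):bisect_right(sorted_index, high)]

print(hash_index[7])        # [1, 6]
print(hash_range(10, 40))   # [19, 23, 35]
print(tree_range(10, 40))   # [19, 23, 35]
```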

What are the types of indexes?

The main index types in MySQL are FULLTEXT, HASH, BTREE, and RTREE.

  • FULLTEXT

    FULLTEXT is the full-text index. MyISAM supports full-text indexes, and InnoDB supports them as of MySQL 5.6.4. Full-text indexes are generally used to find keywords within text rather than to test whole values for equality, and they are mostly created on CHAR, VARCHAR, and TEXT columns. They mainly address the inefficiency of fuzzy text queries such as where name like '%Zhang%'; the query then searches the full-text index with MATCH ... AGAINST instead of LIKE.

  • HASH

    HASH is the hash index. It is mostly used for equality queries and is very efficient, O(1) on average, but it does not support sorting, range queries, or fuzzy queries.

  • BTREE

    BTREE is the B+ tree index, the default index type of the InnoDB storage engine. It supports sorting, grouping, range queries, and prefix fuzzy queries (like 'abc%'), and its performance is stable.

  • RTREE

    RTREE is the spatial index, mostly used for storing geographic data. Compared with other indexes, its advantage lies in range searches.
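The FULLTEXT entry above says full-text indexes look up keywords rather than compare whole values. The MySQL manual describes InnoDB full-text indexes as having an inverted index design; the Python sketch below shows that idea in miniature, assuming naive tokenization on word characters (real parsers handle stopwords, minimum token length, and CJK text differently). The sample documents and function names are hypothetical.

```python
import re

def build_inverted_index(docs):
    """Map each lower-cased word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(index, word):
    # One dictionary lookup per keyword, instead of checking every row
    # the way where name like '%Zhang%' would.
    return sorted(index.get(word.lower(), set()))

docs = {1: "Zhang San joined the team", 2: "Li Si met Zhang Wei", 3: "Wang Wu"}
index = build_inverted_index(docs)
print(search(index, "Zhang"))  # [1, 2]
```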

What are the types of indexes by function?

  • Primary key index: duplicate values are not allowed and NULL is not allowed. A table can have only one primary key index.
  • Composite index: an index built on multiple columns.
  • Unique index: values in the indexed column must be unique, but NULL is allowed. For a composite unique index, the combination of column values must be unique.
  • Full-text index: used to search within text content.
  • Normal index: the basic index type, with no uniqueness constraint; NULL is allowed.
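The uniqueness rules above, including the fact that a unique index tolerates NULL, can be sketched as a small Python class. In MySQL a unique index permits multiple NULL values, and a composite key containing NULL never counts as a duplicate; the class name UniqueIndex and the use of None for NULL are assumptions of this sketch.

```python
class UniqueIndex:
    """Toy unique index. Keys may be single values or tuples (composite index)."""

    def __init__(self):
        self.keys = set()

    def insert(self, key):
        parts = key if isinstance(key, tuple) else (key,)
        # NULL (None here) is never equal to anything, so keys containing it skip the check.
        if any(part is None for part in parts):
            return
        if parts in self.keys:
            raise ValueError(f"Duplicate entry {key!r}")
        self.keys.add(parts)

idx = UniqueIndex()
idx.insert("a@example.com")
idx.insert(None)
idx.insert(None)              # allowed: NULLs do not collide
try:
    idx.insert("a@example.com")
except ValueError as err:
    print(err)                # Duplicate entry 'a@example.com'
```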

What is the difference between a B tree and a B+ tree?

There are two main differences between a B tree and a B+ tree:

  • In a B tree, both internal nodes and leaf nodes store keys and values, while in a B+ tree the internal nodes store only keys, and the leaf nodes store all keys and values.

  • The leaf nodes of a B+ tree are linked together, which makes sequential retrieval easy.

    The structures of the two are shown below.

    [Figure: B tree structure]

    [Figure: B+ tree structure]

Why do databases use B+ trees instead of B trees?

  • A B tree suits random retrieval, while a B+ tree suits both random and sequential retrieval.
  • A B+ tree uses space more efficiently. Every node of a B tree stores both keys and values, while the internal nodes of a B+ tree store only keys, so one B+ tree node can hold more index entries. This lowers the height of the tree, reduces the number of I/O operations, and makes data retrieval faster.
  • The leaf nodes of a B+ tree are linked together, which makes range searches and sequential searches more convenient.
  • B+ tree performance is more stable: every query goes from the root node to a leaf node, whereas in a B tree the value being searched for may be found in an internal node before a leaf is reached.

When is a B tree the better choice? Because a B tree's internal nodes can also store values, frequently accessed values can be placed close to the root node, which can speed up those queries. Overall, though, the characteristics of the B+ tree make it better suited to database indexes.
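The space-utilization argument can be turned into rough arithmetic. The Python sketch below assumes a 16 KB page (InnoDB's default page size), an 8-byte BIGINT key, a 6-byte child pointer, and 1 KB rows; these figures are illustrative assumptions, and real pages also carry headers and per-record overhead.

```python
PAGE_SIZE = 16 * 1024   # bytes; InnoDB's default page size
KEY_SIZE = 8            # assumption: BIGINT primary key
POINTER_SIZE = 6        # assumption: size of a child-page pointer
ROW_SIZE = 1024         # assumption: about 1 KB per row

def bplus_tree_rows(height):
    """Rows a B+ tree of this height can hold: internal nodes store only keys + pointers."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)   # children per internal node
    rows_per_leaf = PAGE_SIZE // ROW_SIZE             # full rows live only in leaves
    return fanout ** (height - 1) * rows_per_leaf

def b_tree_rows(height):
    """Rows a B tree of this height can hold: every node stores keys, pointers and rows."""
    keys_per_node = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)
    # A full B tree with m keys per node has (m + 1) children per node.
    return (keys_per_node + 1) ** height - 1

print(bplus_tree_rows(3))  # 21902400
print(b_tree_rows(3))      # 4095
```

Under these assumptions, a three-level B+ tree holds on the order of twenty million rows while a three-level B tree holds only a few thousand, which is why the B+ tree needs fewer levels, and fewer I/Os, for the same data.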

What is a clustered index and what is a non-clustered index?

The main difference between a clustered index and a non-clustered index is whether the data and the index are stored separately.

  • Clustered index: the data and the index are stored together, and the leaf nodes of the index structure hold the data rows.
  • Non-clustered index: the data and the index are stored separately, and the leaf nodes of the index store addresses pointing to the data rows.

In the InnoDB storage engine, the default index is the B+ tree index. The index built on the primary key is the primary index, which is also the clustered index. Indexes built on other columns are secondary indexes, which are non-clustered. Why is a secondary index said to be built on top of the primary index? Because the leaf nodes of a secondary index store primary key values, so a lookup through it ends in the primary index.

In the MyISAM storage engine, the default index is also the B+ tree index, but both the primary index and secondary indexes are non-clustered: the leaf nodes of the index structure store addresses pointing to the data rows. As a result, a lookup through a MyISAM secondary index does not need to go through the primary key index.

The two classic diagrams below (taken from the Internet) show the difference:

[Figures: InnoDB vs. MyISAM index structures (two images)]

Do non-clustered indexes always require a back-to-table query?

As mentioned above, the leaf nodes of a non-clustered index store primary keys: a query first finds the primary key through the non-clustered index, then finds the corresponding data through the clustered index. This second step, finding the data for the primary key through the clustered index, is called a back-to-table query. Does a query through a non-clustered index always perform a back-to-table query?

Not necessarily. This is where covering indexes come in: if all the data a query needs can be obtained from the secondary index, no back-to-table query is required. For example, suppose a user table stores personal information with fields such as id, name, and age, the clustered index is built on id, and a non-clustered index is built on name. The query

    select id,name from user where name = 'zhangsan';

needs no back-to-table query, because all of its columns can be retrieved from the non-clustered index; this is a covering index. If the query is instead

    select id,name,age from user where name = 'zhangsan';

a back-to-table query is needed, because the value of age cannot be retrieved from the non-clustered index. How can this be avoided? Make the index cover the query: create a composite index on (name, age), with name first so that the where name = ... condition can use it, and the same query can then be answered from the index alone.

Therefore, covering indexes solve the back-to-table problem of non-clustered index queries.
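Whether a query is covered can be decided mechanically: an InnoDB secondary index leaf holds the index's own columns plus the primary key, so the query avoids the back-to-table step only when every column it references is among them. The Python helper below is a hypothetical sketch of that rule applied to the user example; it ignores other factors the optimizer weighs.

```python
def needs_back_to_table(index_columns, primary_key, referenced_columns):
    """True if a query through this secondary index must also read the clustered index."""
    available = set(index_columns) | {primary_key}   # what the index leaf actually stores
    return not set(referenced_columns) <= available

# select id,name from user where name = 'zhangsan'  with an index on (name)
print(needs_back_to_table(["name"], "id", ["id", "name"]))                # False
# select id,name,age from user where name = 'zhangsan'  with an index on (name)
print(needs_back_to_table(["name"], "id", ["id", "name", "age"]))         # True
# the same query with a composite index on (name, age)
print(needs_back_to_table(["name", "age"], "id", ["id", "name", "age"]))  # False
```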

When should indexes be used?

  • Indexes are very effective for medium and large tables. For very small tables, a full table scan is generally faster.
  • For very large tables, the cost of building and maintaining indexes also grows; in that case, consider partitioning.
  • If a table sees many inserts, deletes, and updates but very few queries, there is no need to index it, because maintaining indexes has a cost.
  • Columns that rarely appear in where conditions generally do not need an index.
  • If several columns are often queried together, consider a composite index.
  • If a column has many values and none of them are duplicated, consider a unique index.
  • If a column has many values and some of them are duplicated, consider a normal index.

What are the design principles for indexes?

  • The columns best suited to indexing are those that appear in where clauses or in join conditions, not those that appear only in the select list after the select keyword.
  • The larger the cardinality of the indexed column, the better the index works; in other words, the more selective the column, the better. For example, indexing a low-selectivity column such as gender works poorly, because the column has at most three distinct values and most rows are either male or female.
  • Use short indexes. When indexing long strings, specify a shorter prefix length: smaller indexes involve less disk I/O, and blocks in the index cache can hold more key values, which makes queries faster.
  • Take advantage of the leftmost prefix.
  • Do not over-index. Every index needs extra physical space and takes time to maintain, so more indexes are not always better.

How can indexes be optimized?

In fact, the key to index optimization is to follow the index design principles and usage scenarios, and to rework indexes that do not meet them until they do.

Beyond the design principles and usage scenarios, the following points are also worth considering.

  • In queries, an indexed column should not be part of an expression or an argument to a function, because the index then cannot be used. For example: select * from table_name where a + 1 = 2
  • In a composite index, put the most selective column first.
  • Use select * as little as possible.

Index usage scenarios, design principles, and optimization can be treated as a single question.

How do I create or delete an index?

Creating an index:

  • Use the CREATE INDEX statement

    CREATE INDEX index_name ON table_name (column_list);

  • Define the index when creating the table

    CREATE TABLE user(
    	id INT PRIMARY KEY,
    	information text,
    	FULLTEXT KEY (information)
    );

  • Use ALTER TABLE

    ALTER TABLE table_name ADD INDEX index_name (column_list);

Deleting an index:

  • Delete the primary key index

    ALTER TABLE table_name DROP PRIMARY KEY;

  • Delete other indexes

    ALTER TABLE table_name DROP INDEX index_name;

Does using an index always improve query performance?

Not necessarily. As discussed under index usage scenarios and design principles, creating and maintaining an index costs space and time. If indexes are not used appropriately, query performance can actually degrade.

What is a prefix index?

A prefix index indexes only the first few characters of a text or string column, which keeps the index short and makes queries faster.

Usage scenario: when the prefixes are highly selective.

How to create a prefix index:

ALTER TABLE table_name ADD KEY(column_name(prefix_length));

The prefix_length parameter (the length of the prefix) is hard to choose. It can usually be determined as follows. First, compute the selectivity of the whole column:

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

Then compute the selectivity for a candidate prefix length, looking for the prefix whose selectivity is closest to that of the whole column:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

Keep adjusting prefix_length until the result is close to the selectivity of the whole column.
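The trial-and-error procedure above can be automated. This Python sketch mirrors the two SQL queries (COUNT(DISTINCT ...) / COUNT(*)) on an in-memory list; the email values and the 0.01 tolerance are hypothetical choices, not recommendations.

```python
def selectivity(values, prefix_length=None):
    """COUNT(DISTINCT col) / COUNT(*), or COUNT(DISTINCT LEFT(col, n)) / COUNT(*) when n is given."""
    keys = values if prefix_length is None else [v[:prefix_length] for v in values]
    return len(set(keys)) / len(values)

def shortest_good_prefix(values, tolerance=0.01):
    """Smallest prefix length whose selectivity is within `tolerance` of the full column's."""
    full = selectivity(values)
    for n in range(1, max(len(v) for v in values) + 1):
        if full - selectivity(values, n) <= tolerance:
            return n
    return None

emails = ["alice@example.com", "alan@example.com", "bob@example.com",
          "bobby@example.com", "carol@example.com", "alice@example.com"]
print(round(selectivity(emails), 3))   # 0.833
print(shortest_good_prefix(emails))    # 4
```

For this made-up data, a prefix of 4 characters already distinguishes the values as well as the whole column does, so the index would be created with prefix_length = 4.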

What is the leftmost matching principle?

Leftmost matching principle: matching starts from the leftmost column of the index and continues through consecutive columns; it stops at a range condition (<, >, between, like).

For example, given an index on (a, b, c), try to guess whether the index is used in each of the following cases.

  • First

    select * from table_name where a = 1 and b = 2 and c = 3 
    select * from table_name where b = 2 and a = 1 and c = 3

    In both queries, all columns use the index. Swapping the order of the conditions in the where clause does not matter, because the MySQL optimizer automatically reorders them.

  • Second

    select * from table_name where a = 1
    select * from table_name where a = 1 and b = 2  
    select * from table_name where a = 1 and b = 2 and c = 3

    The index is used in all three queries, because each one matches from the leftmost column.

  • Third

    select * from table_name where  b = 1 
    select * from table_name where  b = 1 and c = 2

    Neither query uses the index, because matching does not start from the leftmost column.

  • Fourth

    select * from table_name where a = 1 and c = 2

    In this query, only column a uses the index; column c does not, because column b is skipped, so the match is not continuous from the left.

  • Fifth

    select * from table_name where  a = 1 and b < 3 and c < 1

    In this query, only columns a and b use the index, while column c does not: under the leftmost matching principle, matching stops after a range condition.

  • Sixth

    select * from table_name where a like 'ab%'; 
    select * from table_name where  a like '%ab'
    select * from table_name where  a like '%ab%'

    When the column is a string, only prefix matching (like 'ab%') can use the index; suffix matching and infix matching require a full table scan.
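The six cases can be checked against a simplified model of the leftmost matching principle. In this Python sketch (a hypothetical helper, not MySQL's optimizer), each where condition is classified as an equality ("eq") or a range ("range", which includes like 'ab%'), while like '%ab' and like '%ab%' count as no usable condition. It deliberately ignores newer optimizations such as index condition pushdown and the skip scan in MySQL 8.0, which can change the answer for some cases.

```python
def usable_index_columns(index_columns, conditions):
    """Return the index columns usable for the lookup, per the leftmost matching rule."""
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:        # a gap: matching cannot continue past this column
            break
        used.append(column)
        if kind == "range":     # a range column is used, but matching stops after it
            break
    return used

index = ["a", "b", "c"]
print(usable_index_columns(index, {"b": "eq", "a": "eq", "c": "eq"}))       # ['a', 'b', 'c']
print(usable_index_columns(index, {"b": "eq", "c": "eq"}))                  # []
print(usable_index_columns(index, {"a": "eq", "c": "eq"}))                  # ['a']
print(usable_index_columns(index, {"a": "eq", "b": "range", "c": "range"})) # ['a', 'b']
```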

Under what circumstances do indexes fail?

As described above, queries that do not follow the leftmost matching principle cause the index to fail. The following cases can also cause index failure.

  • Using or in the condition, for example select * from table_name where a = 1 or b = 3. With only the (a, b, c) index above, b = 3 cannot use the index, so the whole condition typically falls back to a full scan.
  • Computing on an indexed column, for example select * from table_name where a + 1 = 2
  • Implicit type conversion on the indexed column. For example, if a is a string column, string constants must be quoted: select * from table_name where a = '1' uses the index, while select * from table_name where a = 1 causes the index to fail.
  • Applying a function to an indexed column, for example select * from table_name where abs(a) = 1
  • Using a like query that starts with %
  • Using != or <> on an indexed column often prevents index use, for example select * from table_name where a != 1; the optimizer may still choose a range scan when few rows match.
  • Using is null / is not null on an indexed column, for example select * from table_name where a is not null. This is not absolute: MySQL can use an index for is null, and whether it uses one for is not null depends on the optimizer's cost estimate.
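Why computing on, or applying a function to, an indexed column hurts can be seen with a sorted list standing in for the index on a. In this Python sketch (hypothetical values and helper names), a = v can be answered with binary search, but abs(a) = v cannot, because the list is ordered by a, not by abs(a). Rewriting the condition so the column stands alone (abs(a) = 1 becomes a in (-1, 1), and a + 1 = 2 becomes a = 1) restores the index-friendly form.

```python
from bisect import bisect_left, bisect_right

a_index = sorted([5, -1, 2, 1, -3, 0, 1])   # hypothetical values of column a, kept in index order

def seek_equal(value):
    """where a = value: two binary searches on the ordered index."""
    return a_index[bisect_left(a_index, value):bisect_right(a_index, value)]

def scan_abs_equal(value):
    """where abs(a) = value: the order of a says nothing about abs(a), so every entry is checked."""
    return [a for a in a_index if abs(a) == value]

def seek_abs_equal(value):
    """Rewritten as where a in (-value, value): back to index seeks."""
    if value < 0:
        return []
    if value == 0:
        return seek_equal(0)
    return seek_equal(-value) + seek_equal(value)

print(scan_abs_equal(1))   # [-1, 1, 1]
print(seek_abs_equal(1))   # [-1, 1, 1]
print(seek_equal(2 - 1))   # a + 1 = 2 rewritten as a = 1 -> [1, 1]
```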
