Database index summary (2)

Time:2021-4-21

What is an index?

Index is a kind of data structure used to query and retrieve data quickly. Common index structures are: B-tree, B + tree and hash.

The function of index is equivalent to that of catalog. For example: when we look up a dictionary, if there is no directory, we can only find the word we need to look up page by page, which is very slow. If there is a directory, we just need to go to the directory to find the location of the word, and then directly turn to that page.

Why use index? Analysis of the advantages and disadvantages of index

Advantages of indexing

It can greatly speed up the speed of data retrieval (greatly reduce the amount of data retrieved), which is also the main reason for creating an index. After all, most systems always have larger read requests than write requests.In addition, by creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.

Disadvantages of index

  1. It takes a lot of time to create and maintain indexes: when adding, deleting and modifying the data in the table, if the data has an index, then the index also needs to be dynamically modified, which will reduce the efficiency of SQL execution.
  2. Occupying physical storage space  : the index needs to use physical file storage, which also consumes a certain amount of space.

The difference between B tree and B + tree

  • All nodes in B tree store both key and data, while only leaf nodes store key and data in B + tree, and other inner nodes store only key.
  • The leaf nodes of B tree are all independent; the leaf nodes of B + tree have a reference chain pointing to its adjacent leaf nodes.
  • The process of B-tree search is equivalent to binary search for the keywords of each node in the scope. The search may be finished before reaching the leaf node. The efficiency of B + tree retrieval is very stable. Any search is a process from root node to leaf node, and the order of leaf node retrieval is obvious.
    Database index summary (2)

Hash index and B + Analysis on the advantages and disadvantages of tree index

Fast positioning of hash index

Hash index refers to the hash table. Its biggest advantage is that it can locate the data in a very short time according to the hash function, which can’t be compared with B + tree.

Hash conflict problem

Students who know HashMap or hashtable all know that their biggest disadvantage is hash conflict. However, for the database, this is not the biggest disadvantage.

Hash index does not support order and range queries (the biggest disadvantage of hash index is that it does not support order and range queries.

Imagine a situation

`SELECT * FROM tb1 WHERE id < 500;` 

*   1

B + tree is ordered. In this kind of range query, it has great advantages. It is enough to directly traverse leaf nodes smaller than 500. Hash index is based on hash algorithm to locate. Is it difficult to locate the 1 – 499 data by hash calculation? This is the biggest disadvantage of hash.


Index type

Primary key

The primary key column of a data table uses the primary key index.

A data table can only have one primary key, and the primary key cannot be null or duplicate.

In the InnoDB table of MySQL, when there is no primary key of the specified table to be displayed, InnoDB will automatically check whether there is a field with unique index in the table. If there is, select the field as the default primary key, otherwise InnoDB will automatically create a 6byte self incrementing primary key.

Secondary index ( Secondary index )

Secondary index is also called secondary index because the data stored in the leaf node of secondary index is primary key. In other words, through the secondary index, you can locate the location of the primary key.

Unique index, common index and prefix index belong to secondary index.

PS : Do not understand the students can temporarily doubt, slowly look down, there will be answers behind, you can also search.

  1. unique index ( Unique Key ): a unique index is also a constraint.Duplicate data is not allowed in the attribute column of unique index, but null data is allowed. Multiple unique indexes can be created in one table.The purpose of establishing a unique index is mostly for the uniqueness of the data of the attribute column, not for the query efficiency.
  2. General index :The only function of common index is to query data quickly. A table allows to create multiple common indexes, and allows data repetition and null.
  3. Prefix index ( Prefix ): prefix index applies only to data of string type. Prefix index is to index the first few characters of the text, which is smaller than ordinary index because it only takes the first few characters.
  4. Full text index ( Full Text )Full text index is mainly used to retrieve the information of keywords in big text data, which is a technology used in search engine database. Before mysql5.6, only MyISAM engine supported full-text indexing, and after 5.6, InnoDB also supported full-text indexing.

Secondary index :Database index summary (2)

Clustered index and nonclustered index

Clustered index

A clustered index is an index in which the index structure and data are stored together. The primary key index is a clustered index.

In mysql, the table of InnoDB engine.ibdThe file contains the index and data of the table. For the InnoDB engine table, the index of the table ( B + tree ) Each of the non leaf nodes stores the index, and the leaf node stores the index and the data corresponding to the index.

Advantages of clustered index

The query speed of clustered index is very fast, because the whole B + The tree itself is a multi fork balanced tree, and the leaf nodes are all orderly. Locating the nodes in the index is equivalent to locating the data.

Disadvantages of clustered index

  1. Rely on ordered data: because the B + tree is a multi-channel balanced tree, if the indexed data is not in order, it needs to be sorted at insertion time. If the data is integer, it is OK. Otherwise, the insertion or search speed of long and difficult data like string or UUID will be slow.
  2. Expensive to update: if the data of the index column is modified, the corresponding index will also be modified. Moreover, the leaf node of the clustered index still stores data, so the modification cost is certainly high. Therefore, for the primary key index, the primary key is generally not modifiable.

Nonclustered index

Nonclustered index is an index in which index structure and data are stored separately.

Secondary indexes are nonclustered.

The. MYI file of the table of the MyISAM engine contains the index of the table. Each leaf node of the index (B + tree) of the table stores the index. The leaf node stores the index and the pointer to the corresponding data of the index, pointing to the data of the. MyD file.

The leaf node of a nonclustered index does not necessarily store a pointer to the data, because the leaf node of a secondary index stores the primary key, and then returns to the table to look up the data according to the primary key.

Advantages of nonclustered indexes

Update cost is less than clustered index 。 The update cost of nonclustered index is less than that of clustered index. The leaf node of nonclustered index does not store data

Disadvantages of nonclustered index

  1. Like clustered indexes, nonclustered indexes rely on ordered data
  2. There may be a second query (back to table): this should be the biggest drawback of nonclustered indexes. After finding the pointer or primary key corresponding to the index, you may need to query the data file or table according to the pointer or primary key.

This is a screenshot of the table in MySQL
Database index summary (2)

Clustered index and nonclustered index :
Database index summary (2)

Must nonclustered indexes return to table queries ( Overlay index )?

Nonclustered indexes do not necessarily return to table queries.

Imagine a situation where the user is going to query the user name using SQL, and the user name field is just indexed.

 `SELECT name FROM table WHERE username='guang19';` 

*   1

Then the key of the index itself is name. If you find the corresponding name, you can return it directly. There is no need to return to the table for query.

This is true even for MyISAM, although the primary key index of MyISAM does need to return to the table, because the leaf node of its primary key index stores pointers. But what if SQL looks up the primary key?

`SELECT id FROM table WHERE id=1;` 

*   1

The key of the primary key index itself is the primary key. If you find it, you can return it. This situation is called overlay index.

Overlay index

If an index contains (or covers) the values of all the fields that need to be queried, we call it “coverage index”. We know that in InnoDB storage engine, if it is not the primary key index, the leaf node stores the primary key + column value. Finally, we need to “return to the table”, that is, we need to find it again through the primary key. In this way, it will be relatively slow. To overlay the index is to find out the corresponding column and index, and do not do the operation back to the table!

Overlay index means that the field to be queried is exactly the field of the index, so the data can be found directly according to the index without back table query.

For example, if a SQL query needs to query the primary key, the primary key can be found just according to the primary key index.

Another example is the ordinary index. If a SQL query needs to query the name field, and the name field has an index, then the data can be found directly according to the index, and there is no need to return to the table.

Overlay index:Database index summary (2)


Index creation principles

single column

A single column index is an index composed of a column of attributes.

Union index (multi column index)

Joint index is composed of multi column attributes.

Leftmost prefix principle

Suppose you create a union index consisting of three fields :

`ALTER TABLE table ADD INDEX index_name (num,name,age)` 

*   1

Then, when the query conditions are promising : num / ( num AND name ) / ( num AND name AND age ) The index takes effect. Therefore, when creating a union index, try to take the field with the most frequent queries as the leftmost field ( first ) Field. This field should be the first condition when querying.

However, due to the version (my version of MySQL is 8.0. X), the union index I created is equivalent to creating the same index on each field of the union index
Database index summary (2)

The index of each field takes effect regardless of whether it conforms to the leftmost prefix principle
Database index summary (2)

Notes on index creation

Leftmost prefix principle

Although my current version of MySQL is relatively high, it seems that I don’t follow the leftmost prefix principle, and the index will take effect. But we should still abide by the leftmost prefix principle to avoid the trouble caused by version change.

Select the appropriate field

1. Fields that are not null

The data of index field should not be null as far as possible, because it is difficult to optimize the database for the field whose data is null. If the field is frequently queried but cannot avoid being null, it is recommended to use short values or short characters with clear semantics such as 0, 1, true and false as alternatives.

2. Frequently queried fields

The fields we create the index should be the fields with frequent query operations.

3. Fields to be queried as conditions

Fields that are queried as where criteria should be considered for indexing.

four . Fields that are often used to connect

The fields often used for connection may be some foreign key columns. For foreign key columns, it is not necessary to establish a foreign key. It is only said that the column involves the relationship between tables. For the fields that are frequently connected to query, we can consider building an index to improve the efficiency of multi table connection query.

Fields not suitable for index creation

1. Frequently updated fields should be carefully indexed

Although index can bring query efficiency, the cost of maintaining index is not small. If a field is not frequently queried, but is frequently modified, it is even less necessary to establish an index on this field.

2. It is not necessary to index the fields that are not frequently queried

3. Consider establishing a joint index instead of a single column index as much as possible

Because indexes need to occupy disk space, it can be simply understood that each index corresponds to a B + tree. If there are too many fields and indexes in a table, then when the data of the table reaches a volume, the index takes up a lot of space, and it takes a lot of time to modify the index. If it is a joint index, multiple fields in one index, it will save a lot of disk space, and the operation efficiency of modifying data will also be improved.

4. Avoid redundant index

Redundant index means that the functions of indexes are the same, and if they can be hit, they will be hit. Then, redundant indexes such as (name, city) and (name) are redundant indexes. Queries that can hit the latter will definitely hit the former. In most cases, existing indexes should be expanded as far as possible instead of creating new ones.

five . Consider using prefix index instead of normal index on string type fields

Prefix index is only limited to string type, which will occupy less space than ordinary index, so we can consider using prefix index band instead of ordinary index.

Can index improve query performance?

In most cases, index queries are faster than full table scans. However, if the amount of data in the database is small, then the use of index may not be able to bring great improvement.

Recommended Today

Review of SQL Sever basic command

catalogue preface Installation of virtual machine Commands and operations Basic command syntax Case sensitive SQL keyword and function name Column and Index Names alias Too long to see? Space Database connection Connection of SSMS Connection of command line Database operation establish delete constraint integrity constraint Common constraints NOT NULL UNIQUE PRIMARY KEY FOREIGN KEY DEFAULT […]