MySQL learning notes-3-index


1. Some basic concepts of index

An index is a data structure
Index is to improve the efficiency of data query

Example: dictionary. The initial consonant query method is cluster index. The radical is the secondary index, and the radical + stroke is the joint index.

2. Common index model

Model Table b scene scene
Hashtable Key value (analogy HashMap) Equivalent query Cannot query range
Ordered array Store in order and query by dichotomy Static storage engine High query efficiency and low update efficiency
Search tree The left son of each node is smaller than the parent node, and the parent node is smaller than the right son Equivalence and range query The number is too high, and the disk is read and written too many times
InnoDB uses B + tree index.
InnoDB table structure: 1. In InnoDB, each table is actually multiple B + trees, that is, a primary key index tree and multiple non primary key index trees. 2. Query efficiency: use primary key index > use non primary key index > do not use index. 3. If the index is not used for query, traverse from the leaf node of the main index B + tree.

3. Index type

Indexes Storage mode difference
Primary key index (clustered index) The leaf node stores the entire row of data Just search the B + tree ID to get the data
Non primary key index (secondary index) The leaf node content is the value of the primary key First search the index to get the primary key value, and then search the primary key index tree
In terms of performance and storage space, it is recommended to use self incrementing primary key to ensure that the new ID must be on the rightmost side of the leaf node and will not affect the previous data.

4. B + tree index maintenance

When a data page is full, according to the B + tree algorithm, add a new data page, called page splitting, which will lead to performance degradation. Space utilization is reduced by about 50%. When the utilization of two adjacent data pages is very low, the data pages will be merged. The merging process is the reverse of the splitting process.

You need to review data structures such as binary tree, red black tree and B + tree

5. Overlay index, prefix index and index push down

1. Overwrite index: if the query criteria use the ordinary index (or the leftmost principle field of the joint index), the query result is the field or primary key of the joint index. Instead of returning to the table, the result is returned directly to reduce the reading and writing of positive row data on the IO disk
2. Leftmost prefix: the leftmost n fields of the joint index, or the leftmost M characters of the string index (when MySQL performs lexical analysis and syntax analysis, the syntax tree is established by establishing the leftmost subtree, and the parsing process is also from left to right, so the principle of the leftmost prefix is followed.)
3. Joint index: according to the order in which the joint index is created, the where retrieval is carried out according to the leftmost principle. For example, if (age, name) uses age = 1 or age = 1 and name = ‘Zhang San’, the index can be used, and if only name = ‘Zhang San’, the index will not be used. Considering the problem of storage space, please also create an index on the left for the frequently searched data according to business needs. For example, a joint index (a, B, c) is essentially spliced into a binary byte array in the order of a, B, C. The index records are sorted by byte comparison according to the byte array, so they are sorted by a, then B, and then C. It is obvious why they match according to the leftmost prefix.
4. Index push down: like ‘hello%’ and age > 10. Before MySQL 5.6, the matching data will be queried back to the table. After version 5.6, the data with age < 10 will be filtered out first, and then the table return query will be carried out to reduce the table return rate and improve the retrieval speed.

6. MySQL military regulations

When creating an index for a table, there is no standard answer to the question of which indexes should be created, which fields should be included in each index, and how to arrange the fields. It needs to be weighed according to the specific business. However, some ideas can be used for reference:
1. Since it is a trade-off problem and there is no way to ensure the efficiency of all queries, it is necessary to give priority to ensuring the efficiency of high-frequency queries, and use the leftmost prefix index as long as possible for lower frequency queries. PT query digest can be used to sample and count the access frequency of business query statements. It may take several iterations to determine the final fields of the joint index and their sorting.
2. The business is evolving, so the index also needs to evolve with the business. It is not that everything will be fine once the index is built. When the business changes, we need to re-examine whether the original index is still efficient and can still meet the business needs.
3. There are some MySQL military regulations circulating in the industry. In fact, these are not real military regulations, but best practices in typical scenarios. The real military regulation is actually one: meet business needs efficiently. For example, a military regulation stipulates that the number of indexes on a table should not exceed 5, but if we now have some historical data tables and historical log tables, we clearly know that there will be no more data written on these tables, but our query requirements are many and varied. Can we have more than 5 indexes on these tables? Of course, there is no problem. Of course, we should take a serious look at this military regulation, but the focus is not to remember it, but to understand why each military regulation is so stipulated, what considerations it is based on, and what applicable scenarios and preconditions are. It doesn’t matter whether you remember these military regulations or not, because you have dissolved them into your blood, When it comes to your specific business, it will be inevitable to be able to do it well.

Recommended Today

Apache sqoop

Source: dark horse big data 1.png From the standpoint of Apache, data flow can be divided into data import and export: Import: data import. RDBMS—–>Hadoop Export: data export. Hadoop—->RDBMS 1.2 sqoop installation The prerequisite for installing sqoop is that you already have a Java and Hadoop environment. Latest stable version: 1.4.6 Download the sqoop installation […]