Author: Programmer’s words
1、 Why use an auto-increment column as the primary key
1. If we define a primary key, InnoDB uses it as the clustered index. If no primary key is explicitly defined, InnoDB selects the first unique index that contains no NULL values as the clustered index. If there is no such unique index, InnoDB generates a hidden 6-byte ROWID as the implicit clustered index. This ROWID increases as row records are written; unlike Oracle's ROWID, it cannot be referenced and is purely implicit.
2. The data records themselves are stored in the leaf nodes of the primary index (a B+ tree). The records within one leaf node (whose size is one memory page or disk page) must be stored in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When a page reaches the fill factor (15/16 by default in InnoDB), a new page (node) is allocated.
3. If the table uses an auto-increment primary key, each new record is appended right after the last record of the current index node, and when a page is full a new page is allocated automatically.
4. If a non-auto-increment primary key is used (such as a national ID number or a student number), the primary key values arrive in roughly random order, so each new record has to be inserted somewhere in the middle of an existing index page. MySQL then has to move data to make room for the new record, and the target page may already have been written back to disk and evicted from the cache, in which case it has to be read back in first. Frequent moves and page splits also cause a lot of fragmentation, leaving an index structure that is not compact enough. Eventually the table has to be rebuilt with OPTIMIZE TABLE to repack the pages.
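The effect described in points 3 and 4 can be illustrated with a toy model. The following is a minimal sketch, not InnoDB's actual page management: `PAGE_CAPACITY`, `load_pages` and `fill_factor` are hypothetical names, a page is just a sorted Python list, a full page is split in half on a middle insert, and an insert at the very end simply starts a new page.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per page


def load_pages(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into sorted pages and return the pages."""
    pages = [[]]
    for key in keys:
        # Locate the page whose key range should contain the new key.
        first_keys = [page[0] for page in pages if page]
        idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # Appended at the very end: open a new page for the new record.
            pages.append([page.pop()])
        else:
            # Inserted in the middle of a full page: split the page in half.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
    return pages


def fill_factor(pages, capacity=PAGE_CAPACITY):
    """Average fraction of each page that is actually used."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)


sequential = load_pages(range(1600))       # auto-increment style keys
shuffled_keys = list(range(1600))
random.Random(42).shuffle(shuffled_keys)   # random-looking keys
scattered = load_pages(shuffled_keys)

print(len(sequential), fill_factor(sequential))
print(len(scattered), fill_factor(scattered))
```

In this model the sequential load fills every page completely, while the shuffled load ends up with more pages that are only partly full, which is the fragmentation that OPTIMIZE TABLE later has to clean up.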
2、 Why indexes improve query efficiency
1. Index entries are stored in sorted order.
2. Because they are sorted, a lookup through the index does not need to traverse all the index records.
3. In the extreme case, an index lookup is as efficient as a binary search, which needs about log2(n) comparisons.
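To get a feel for the log2(n) figure, the comparison counts of a full scan and a binary search can be put side by side. This is a minimal sketch over a plain sorted Python list (not a real index structure); `linear_steps` and `binary_steps` are hypothetical helper names.

```python
def linear_steps(sorted_keys, target):
    """Count the comparisons made by a front-to-back scan."""
    for steps, key in enumerate(sorted_keys, start=1):
        if key == target:
            return steps
    return len(sorted_keys)


def binary_steps(sorted_keys, target):
    """Count the probes made by a binary search over the same keys."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


keys = list(range(1_000_000))
print(linear_steps(keys, 999_999))   # one comparison per record
print(binary_steps(keys, 999_999))   # at most 20 probes, since 2**20 > 1,000,000
```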
3、 The difference between B+ tree indexes and hash indexes
A B+ tree is a balanced multi-way tree: every leaf node is at the same depth from the root, and nodes on the same level are linked to each other by pointers, so the keys are kept in order.
A hash index uses a hash algorithm to convert the key value into a hash value. It does not need to search level by level from the root node to a leaf node as a B+ tree does; a single hash computation locates the entry. The entries are unordered.
4、 Advantages of hash indexes:
1. Equality queries. A hash index has a clear advantage here, provided there are not many duplicate key values. With a large number of duplicate keys, hash collisions make the hash index very inefficient.
5、 Scenarios where hash indexes are not applicable:
1. Range queries are not supported
2. Sorting by means of the index is not supported
3. The leftmost-prefix matching rule of composite (multi-column) indexes is not supported
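The contrast between the two structures can be sketched with Python's built-in types. This is only an analogy under stated assumptions: a `dict` stands in for a hash index, a sorted list stands in for the ordered leaf level of a B+ tree, and the names and values are made up for illustration.

```python
import bisect

names = ["chen", "li", "wang", "zhang", "zhao"]   # hypothetical indexed values

# Hash index stand-in: key -> row position; the buckets carry no ordering.
hash_index = {name: row for row, name in enumerate(names)}

# Ordered index stand-in: keys kept sorted, as in B+ tree leaves.
ordered_index = sorted(names)


def hash_equal(key):
    """Equality lookup: one hash computation, no traversal."""
    return hash_index.get(key)


def ordered_range(low, high):
    """Range lookup: two binary searches, then a contiguous slice."""
    lo = bisect.bisect_left(ordered_index, low)
    hi = bisect.bisect_right(ordered_index, high)
    return ordered_index[lo:hi]


def hash_range(low, high):
    """The hash structure can only answer a range by checking every key."""
    return sorted(key for key in hash_index if low <= key <= high)


print(hash_equal("wang"))             # 2
print(ordered_range("li", "zhang"))   # ['li', 'wang', 'zhang']
print(hash_range("li", "zhang"))      # same rows, but every key was examined
```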
In general, the B+ tree index structure suits most scenarios. Hash indexes have the advantage in the following scenario:
In a HEAP table, if the stored data has few repeated values (that is, the cardinality is high) and the column is queried mainly by equality, with no range queries or sorting, a hash index is particularly suitable. For example, this SQL:
SELECT id, name FROM `table` WHERE name = 'Li Ming'; -- equality query only
The commonly used InnoDB engine uses B+ tree indexes by default. It monitors in real time how the indexes on a table are used, and if it judges that a hash index would improve query efficiency, it automatically builds one in the in-memory "adaptive hash index buffer" (the adaptive hash index is enabled by default in InnoDB). By observing the search pattern, MySQL builds the hash index from a prefix of the index key. If most of a table is in the buffer pool, the hash index can speed up equality queries.
Note: under some workloads, the speedup from hash index lookups far outweighs the extra cost of monitoring index searches and maintaining the hash table structure. Under heavy load, however, the read/write lock that protects the adaptive hash index can itself become a point of contention, for example with highly concurrent joins. LIKE operations and % wildcard matches also do not benefit from the adaptive hash index. In those cases you may want to turn the adaptive hash index off.
6、 The difference between B-tree and B+ tree
1. In a B-tree, every node stores both keys and data, and all the nodes together make up the tree. The pointers in the leaf nodes are null, and the leaf nodes contain no keyword information.
2. In a B+ tree, the leaf nodes contain all the keywords together with pointers to the records that hold those keywords, and the leaf nodes themselves are linked in ascending key order. All non-leaf nodes can be regarded as the index part; each holds only the largest (or smallest) keyword of each of its subtrees. (In a B-tree, the non-leaf nodes also hold actual data to be searched.)
7、 Why is a B+ tree more suitable than a B-tree for operating system file indexes and database indexes?
1. A B+ tree has a lower disk read/write cost. Its internal nodes hold no pointers to the record data of each keyword, so they are smaller than the internal nodes of a B-tree. If all the keywords of one internal node are stored in the same disk block, that block can hold more keywords, so more of the keywords to be searched are read into memory at once and, relatively speaking, fewer I/O operations are needed.
2. B+ tree query performance is more stable. Internal nodes do not point to the file content; they are only an index over the keywords in the leaf nodes. Every keyword lookup therefore has to follow a path from the root node to a leaf node, all lookups have the same path length, and every record is found at the same cost.
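The first point can be made concrete with some rough arithmetic. The sketch below is a simplified model, not a measurement: it assumes a 16 KB node (InnoDB's default page size), an 8-byte key, a 6-byte child pointer, and a hypothetical 200-byte row stored next to each key in the B-tree case. Page headers and the different layout of leaf pages are ignored.

```python
PAGE_SIZE = 16 * 1024   # bytes per node; 16 KB is InnoDB's default page size
KEY_SIZE = 8            # assumed key width in bytes
POINTER_SIZE = 6        # assumed child-pointer width in bytes
ROW_SIZE = 200          # assumed row payload kept beside each key in a B-tree


def levels_needed(total_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= total_keys."""
    levels, reachable = 1, fanout
    while reachable < total_keys:
        reachable *= fanout
        levels += 1
    return levels


# Entries per node when a node holds keys and pointers only (B+ tree style).
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
# Entries per node when each key also carries its row (B-tree style).
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)

total = 10_000_000
print(bplus_fanout, levels_needed(total, bplus_fanout))   # 1170 3
print(btree_fanout, levels_needed(total, btree_fanout))   # 76 4
```

Under these assumptions the smaller entries give a much larger fan-out, so the tree needs fewer levels and therefore fewer page reads per lookup.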
8、 MySQL composite indexes
1. A composite index is an index on two or more columns. MySQL uses the columns of a composite index from left to right. A query may use only part of the index, but only a leftmost part. For example, with the index KEY index (a, b, c), searches on (a), on (a, b) and on (a, b, c) can use the index, but a search on (b, c) cannot. The index is very effective when the leftmost column is compared with a constant.
2. Additional columns in the index let you narrow the search, but using one index on two columns is different from using two separate indexes. The structure of a composite index resembles a phone book, in which a person's name consists of a last name and a first name. The phone book sorts people by last name first, and then sorts people with the same last name by first name. If you know the last name, the phone book is very useful; if you know both the last name and the first name, it is even more useful; but if you know only the first name and not the last name, the phone book is of no use.
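The leftmost-prefix rule follows directly from how tuples sort. In this minimal sketch a sorted list of `(a, b, c)` tuples stands in for the composite index; the data and the `prefix_lookup` helper are hypothetical.

```python
import bisect

# Hypothetical entries of an index on (a, b, c), kept in sorted order.
index = sorted([
    (1, "x", 10), (1, "y", 20), (2, "x", 30), (2, "y", 40), (3, "x", 50),
])


def prefix_lookup(entries, prefix):
    """Find entries whose leading columns equal `prefix` via binary search."""
    start = bisect.bisect_left(entries, prefix)
    matches = []
    for entry in entries[start:]:
        if entry[:len(prefix)] != prefix:
            break   # matches are contiguous, so the scan can stop here
        matches.append(entry)
    return matches


print(prefix_lookup(index, (1,)))        # uses column a
print(prefix_lookup(index, (2, "x")))    # uses columns a and b

# Rows with b == "x" are scattered through the index, so a search on b alone
# cannot jump to one contiguous run and has to examine every entry.
positions = [i for i, entry in enumerate(index) if entry[1] == "x"]
print(positions)                         # [0, 2, 4]
```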
9、 When indexes should be avoided or kept to a minimum
1. The table has too few records
2. The table is frequently inserted into, deleted from and modified
3. The column has few distinct values. For example, if a table has 100,000 rows and column A takes only the two values T and F, each with a probability of about 50%, then an index on column A generally will not improve query speed.
4. Columns that are often queried together with the main column, when the main column's index already has many distinct values
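The T/F example in point 3 can be expressed as a selectivity figure, the ratio of distinct values to rows. This is a minimal sketch with made-up data; `selectivity` is a hypothetical helper, not a MySQL function.

```python
def selectivity(values):
    """Distinct values divided by total rows; near 1.0 is highly selective."""
    return len(set(values)) / len(values)


flags = ["T", "F"] * 50_000     # hypothetical 100,000-row column, 50% T and 50% F
ids = list(range(100_000))      # hypothetical unique column of the same size

print(selectivity(flags))       # 2e-05
print(selectivity(ids))         # 1.0
print(flags.count("T"))         # 50000 rows still match after using the index
```

An equality lookup on the flag column still leaves half the table to read, so the index saves almost nothing.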
10、 What is table partitioning?
Table partitioning means splitting a table in a database into smaller, more manageable parts according to certain rules. Logically there is still only one table, but underneath it consists of multiple physical partitions.
11、 The difference between table partitioning and table splitting
Table splitting: decomposing one table into several separate tables according to certain rules. For example, a user's order records are split into multiple tables by time.
The difference between table splitting and partitioning is that with partitioning there is logically still only one table, while table splitting breaks one table into multiple tables.
12、 What are the benefits of table partitioning?
1. The data of a partitioned table can be distributed across different physical devices, making efficient use of multiple hardware devices.
2. A partitioned table can store more data than a single disk or file system.
3. Queries can be optimized. When the WHERE clause contains the partitioning condition, only the one or few relevant partitions need to be scanned; when SUM and COUNT are involved, the partitions can be processed in parallel and the results merged at the end.
4. Partitioned tables are easier to maintain. For example, to delete a large amount of data in bulk you can drop an entire partition.
5. Partitioned tables can be used to avoid certain bottlenecks, such as mutually exclusive access to a single InnoDB index or inode lock contention in the ext3 file system.
13、 Restrictions on partitioned tables
1. A table can have at most 1024 partitions (the limit was raised to 8192 in MySQL 5.6.7)
2. In MySQL 5.1, the partitioning expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for partitioning on non-integer columns.
3. If the table has a primary key or unique indexes, the partitioning columns must be part of every one of them. In other words, either the table has no primary key or unique index at all, or each primary key and unique index contains all the partitioning columns.
4. Foreign key constraints cannot be used on partitioned tables
5. MySQL partitioning applies to all the data and indexes of a table. You cannot partition only the data and not the indexes, only the indexes and not the data, or only part of the table's data.
14、 How to determine whether MySQL supports partitioning?
Command: SHOW VARIABLES LIKE '%partition%'; Result:
mysql> show variables like '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)
A have_partitioning value of YES indicates that partitioning is supported.
15、 What partition types does MySQL support?
1. RANGE partitioning: rows are assigned to partitions by ranges of values. For example, you can partition a table by year.
2. LIST partitioning: rows are assigned to partitions according to predefined lists of values. The difference from RANGE partitioning is that RANGE partitions cover continuous intervals of values, while LIST partitions match discrete values.
3. HASH partitioning: a hash is computed over one or more columns of the table, and rows are assigned to partitions according to the resulting hash value. For example, you can create a table that is partitioned on its primary key.
4. KEY partitioning: an extension of HASH partitioning in which the hash function is supplied by the MySQL system.
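How each scheme decides where a row goes can be sketched as three small routing functions. This is an illustration only: the partition names, years and region codes are hypothetical, the RANGE bounds mimic VALUES LESS THAN, and the HASH function uses a plain modulus, which is how MySQL's HASH partitioning assigns a row for an integer expression.

```python
import bisect

# RANGE: exclusive upper bound of each partition, like VALUES LESS THAN (...).
RANGE_BOUNDS = [2020, 2021, 2022]                     # hypothetical years
RANGE_NAMES = ["p2019", "p2020", "p2021", "pmax"]     # last one catches the rest

# LIST: each partition owns an explicit set of discrete values.
LIST_VALUES = {"p_north": {1, 2}, "p_south": {3, 4}}  # hypothetical region codes


def range_partition(year):
    """Pick the first partition whose upper bound is greater than the value."""
    return RANGE_NAMES[bisect.bisect_right(RANGE_BOUNDS, year)]


def list_partition(region):
    """Pick the partition whose value list contains the value."""
    for name, values in LIST_VALUES.items():
        if region in values:
            return name
    raise ValueError(f"no partition lists value {region!r}")


def hash_partition(key, partitions=4):
    """Pick a partition number from the key modulo the partition count."""
    return key % partitions


print(range_partition(2019), range_partition(2020), range_partition(2030))
print(list_partition(3))
print(hash_partition(10))
```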
16、 The four isolation levels
1. Serializable: prevents dirty reads, non-repeatable reads and phantom reads.
2. Repeatable read: prevents dirty reads and non-repeatable reads.
3. Read committed: prevents dirty reads.
4. Read uncommitted: the lowest level; it prevents none of the above.
17、 About MVCC
The MySQL InnoDB storage engine implements MVCC (multi-version concurrency control). The biggest advantage of MVCC is that reads take no locks, so reads and writes do not conflict. In OLTP applications with many reads and few writes, avoiding read-write conflicts matters a great deal and greatly improves the concurrency of the system. At present, almost all RDBMSs support MVCC.
1. LBCC: lock-based concurrency control.
2. MVCC: multi-version concurrency control, a concurrency control protocol based on multiple versions. MVCC is an improvement on lock-based concurrency control, mainly for read operations.
18、 In MVCC concurrency control, read operations fall into two types
1. Snapshot read: reads a visible version of the record (possibly a historical version) without locking (no shared read lock (S) is taken, so writes by other transactions are not blocked).
2. Current read: reads the latest version of the record, and the records returned by a current read are locked to ensure that other transactions cannot modify them concurrently.
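The difference between the two kinds of read can be shown on a toy version chain. This is a heavily simplified sketch, not InnoDB's read-view mechanism: each version is a hypothetical `(commit_id, value)` pair, a snapshot is identified by a single number, and locking is only mentioned in a comment.

```python
# Committed versions of one record, oldest first: (commit_id, value).
versions = [(10, "v1"), (20, "v2"), (30, "v3")]   # hypothetical version chain


def snapshot_read(chain, snapshot_id):
    """Return the newest version committed at or before the snapshot."""
    visible = [value for commit_id, value in chain if commit_id <= snapshot_id]
    return visible[-1] if visible else None


def current_read(chain):
    """Return the latest version; a real engine would also lock the row."""
    return chain[-1][1]


print(snapshot_read(versions, 25))   # v2: the version committed at 30 is not visible
print(snapshot_read(versions, 5))    # None: nothing was committed yet
print(current_read(versions))        # v3
```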
19、 Advantages of row-level locking:
1. Few lock conflicts when many threads access different rows.
2. Few changes to undo when rolling back.
3. A single row can be locked for a long time.
20、 Disadvantages of row-level locking:
1. It uses more memory than page-level or table-level locking.
2. When used on a large part of the table, it is slower than page-level or table-level locking because many more locks have to be acquired.
3. If you frequently run GROUP BY over most of the data or frequently scan the entire table, it is significantly slower than the other kinds of locking.
4. With higher-level locking, you can also tune the application more easily by supporting different types of locks, because the locking overhead is lower than with row-level locking.
21、 MySQL optimization
1. Enable the query cache to optimize queries (note that the query cache was removed in MySQL 8.0)
2. Run EXPLAIN on your SELECT queries. It helps you analyze the performance bottlenecks of a query or of the table structure. The EXPLAIN output also shows how indexes and the primary key are used and how the table is searched and sorted
3. Use LIMIT 1 when only one row is needed. The MySQL engine then stops searching as soon as it finds one matching row instead of continuing to look for further matches
4. Index the columns you search on
5. Use ENUM instead of VARCHAR. If you have a field such as "gender", "country", "nationality", "status" or "department", and you know its values are limited and fixed, use ENUM instead of VARCHAR.
6. Use prepared statements. Much like stored procedures, prepared statements are SQL statements that are prepared and run on the server side. They bring benefits in both performance and security: the variables you bind are checked, which protects your program from SQL injection attacks
7. Split tables vertically
8. Choose the right storage engine
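The protection that bound parameters give (point 6) can be demonstrated in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in, since it needs no server; it is not MySQL, and the `users` table and its contents are hypothetical. MySQL client libraries offer equivalent parameter binding with their own placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in database, not MySQL
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "alice' OR '1'='1"

# Unsafe: string concatenation lets the input rewrite the WHERE clause.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows))   # 2: the injected OR clause matched every row
print(len(safe_rows))     # 0: no user has that literal name
```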
22、 The difference between a key and an index
1. A key is part of the physical structure of a database and carries two meanings and roles. One is a constraint (which focuses on constraining and standardizing the structural integrity of the database), and the other is an index (which assists queries). Keys include primary keys, unique keys, foreign keys, and so on.
2. An index is also part of the physical structure of the database, but it is used only to assist queries. When created, it is stored in a separate tablespace (the InnoDB tablespace in MySQL) in a directory-like structure. Indexes can be classified into prefix indexes, full-text indexes, and so on.
23、 What are the differences between MyISAM and InnoDB in MySQL?
1. InnoDB supports transactions, but MyISAM does not. In InnoDB, each SQL statement is wrapped in its own transaction and committed automatically by default, which hurts speed, so it is better to place multiple SQL statements between BEGIN and COMMIT to form one transaction;
2. InnoDB supports foreign keys, while MyISAM does not. Converting an InnoDB table that contains foreign keys to MyISAM fails;
3. InnoDB uses a clustered index. The data file is bound to the index, so the table must have a primary key, and lookups through the primary key are very efficient. A lookup through a secondary index, however, takes two steps: first the primary key is found, and then the row is found through the primary key. The primary key should therefore not be too large, because a large primary key makes all the other indexes large as well. MyISAM uses non-clustered indexes: the data file is separate from the index, the index stores pointers into the data file, and the primary and secondary indexes are independent of each other.
4. InnoDB does not store the exact number of rows in a table, so SELECT COUNT(*) FROM table requires a full table scan. MyISAM keeps the row count of the whole table in a variable, so the same statement only needs to read that variable, which is very fast;
5. Before MySQL 5.6, InnoDB did not support full-text indexes while MyISAM did, which made MyISAM the more efficient choice for full-text queries. InnoDB has supported full-text indexes since MySQL 5.6;
How to choose:
1. Do you need transactions? If so, choose InnoDB. If not, you can consider MyISAM;
2. If the vast majority of operations on the table are read queries, you can consider MyISAM. If reads and writes are both frequent, use InnoDB;
3. MyISAM is harder to recover after a system crash. Can you accept that?
4. Since MySQL 5.5, InnoDB has been the default engine of MySQL (previously it was MyISAM), which shows that its advantages are widely recognized. If you do not know which to use, use InnoDB; it is at least a safe choice.
24、 Notes on designing database tables
1. Field names and sensible field configuration
- Remove fields that are not closely related;
- Field names should follow rules and carry meaning (do not mix English with Pinyin, and avoid fields of unclear meaning such as a, b, c);
- Try not to use abbreviations in field names (most abbreviations do not make the meaning of the field clear);
- Do not mix upper and lower case in field names (for readability, join multiple English words with underscores);
- Do not use reserved words or keywords for field names;
- Keep field names and types consistent;
- Choose numeric types carefully;
- Leave enough margin for text fields;
2. Handling of special system fields and suggestions after completion
- Add deletion markers (such as the operator and the deletion time);
- Build a versioning mechanism;
3. Sensible table structure
- Multi-type fields: check whether the table has fields that can be decomposed into smaller independent parts (for example, people can be divided into men and women);
- Multi-valued fields: these can be split across three tables, which makes retrieval and sorting more orderly and protects the integrity of the data;
4. Other suggestions
- Store large data fields (for example, an introduction field) in a separate table so that they do not hurt performance;
- Use the VARCHAR type instead of CHAR, because VARCHAR allocates length dynamically while CHAR has a fixed, specified length;
- Create a primary key for every table; a table without a primary key is at a disadvantage for queries and index definition;
- It is recommended to give fields a default value (for example, a default of 0 for INT columns); this noticeably improves the efficiency of indexed queries;
- Indexes are best built on unique, non-empty fields. Too many indexes slow down later INSERT and UPDATE operations (weigh this against the actual situation);