Recommended MySQL high-performance optimization specifications

Time:2020-8-14

A good life is not static, but dynamic. It is not an end, but a direction.

Database naming specification

  • All database object names must be lowercase, with words separated by underscores
  • MySQL reserved keywords must not be used in database object names (if a name does contain a keyword, it must be enclosed in backticks in queries)
  • Database object names should be self-explanatory and should not exceed 32 characters
  • Temporary tables must be prefixed with tmp_ and suffixed with a date; backup tables must be prefixed with bak_ and suffixed with a date (timestamp)
  • Columns that store the same data must have the same name and type in every table (such columns are usually join columns; if their types differ, MySQL implicitly converts the data type during the query, which invalidates the index on the column and reduces query efficiency)

Basic design specification for database

1. All tables must use the InnoDB storage engine

Unless there is a special requirement that InnoDB cannot meet, all tables must use the InnoDB storage engine (InnoDB has been the default engine since MySQL 5.5; earlier versions defaulted to MyISAM).

InnoDB supports transactions and row-level locking, and offers better recoverability and better performance under high concurrency.

2. Databases and tables should all use the utf8 character set

A unified character set gives better compatibility and avoids the garbled text caused by character set conversion. Comparing columns with different character sets requires a conversion first, which invalidates the index. If emoji need to be stored in the database, use the utf8mb4 character set.

3. All tables and fields need to be annotated

Use the COMMENT clause to annotate tables and columns, and maintain the data dictionary from the very beginning

4. Try to control the amount of data in a single table; keeping it within 5 million rows is recommended

5 million rows is not a hard limit of MySQL, but an oversized table makes schema changes, backup, and recovery very difficult.

Use historical data archiving (for log data) or database and table sharding (for business data) to control the amount of data

5. Use MySQL partitioned tables with caution

A partitioned table is physically stored as multiple files but logically appears as a single table;

Choose the partition key carefully, because cross-partition queries may be less efficient;

It is recommended to manage large data sets with physically separate tables instead.

6. Separate hot and cold data as far as possible and reduce table width

MySQL allows at most 4096 columns per table, and the size of each row cannot exceed 65535 bytes.

Narrower tables reduce disk IO and keep the memory cache hit rate of hot data high (the wider the table, the more memory it occupies when loaded into the buffer pool, and the more IO it consumes);

They make more effective use of the cache and avoid reading useless cold data;

Columns that are often used together should be kept in the same table (to avoid extra join operations).

7. Creating reserved fields in a table is forbidden

The name of a reserved field does not reveal its meaning.

The data type to be stored in a reserved field cannot be known in advance, so an appropriate type cannot be chosen.

Changing the type of a reserved field later locks the table.

8. Storing large binary data such as pictures and files in the database is forbidden

Such files are usually very large, so the data volume grows rapidly in a short time. Reading them from the database causes a large number of random IO operations, and IO on large files is very time-consuming.

Store such files on a file server and keep only the file address in the database

9. Running database stress tests on the production system is forbidden

10. Connecting directly to the production database from development and test environments is forbidden


Database field design specification

1. Prefer the smallest data type that meets the storage requirement

Reason:

The larger a column is, the more space its index needs. Fewer index nodes then fit in a page, more IO is required to traverse the index, and index performance gets worse.

Method:

a. Convert strings to numeric types for storage, for example store IP addresses as integers

MySQL provides two functions for handling IP addresses:

  • INET_ATON converts an IP address into an unsigned integer (an IPv4 address fits in 4 bytes)
  • INET_NTOA converts an integer back into an IP address

Use INET_ATON to convert the IP address into an integer before inserting the data, which saves space. Use INET_NTOA to convert the integer back into an address when displaying the data.
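As a client-side illustration of the same conversion, the sketch below reproduces what INET_ATON and INET_NTOA compute for IPv4 addresses, using Python's standard ipaddress module. The helper names are chosen here for illustration only; inside SQL you would call the MySQL functions directly.

```python
import ipaddress


def inet_aton(ip: str) -> int:
    """Return the 32-bit unsigned integer form of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(ip))


def inet_ntoa(value: int) -> str:
    """Return the dotted-quad form of a 32-bit unsigned integer."""
    return str(ipaddress.IPv4Address(value))


packed = inet_aton("192.168.1.1")
print(packed)             # 3232235777
print(inet_ntoa(packed))  # 192.168.1.1
```

The largest IPv4 address, 255.255.255.255, maps to 4294967295, which is why an unsigned 4-byte integer column is enough to hold the result.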

b. For non-negative data (such as auto-increment IDs and integer IP addresses), prefer unsigned integers

Reason:

An unsigned type can store positive values roughly twice as large as the signed type of the same size

SIGNED INT -2147483648~2147483647

UNSIGNED INT 0~4294967295

In VARCHAR(N), N is the number of characters, not bytes. Storing 255 Chinese characters in utf8 takes VARCHAR(255) = 765 bytes. An excessive length consumes more memory.
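The numbers above can be checked with a few lines of arithmetic. This is only a sketch of the calculation in Python, not of how MySQL stores values; the function names are made up for the example.

```python
def int_range(num_bytes: int, signed: bool) -> tuple:
    """Return (min, max) for an integer type of the given size in bytes."""
    bits = num_bytes * 8
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(int_range(4, signed=True))   # (-2147483648, 2147483647)
print(int_range(4, signed=False))  # (0, 4294967295)
print(utf8_bytes("中" * 255))       # 765
```

Each common Chinese character takes 3 bytes in UTF-8, so 255 characters need 255 x 3 = 765 bytes.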

2. Avoid the TEXT and BLOB data types. The most common TEXT type can store 64 KB of data

a. It is recommended to move BLOB or TEXT columns into a separate extension table

MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such columns, sorting and similar operations cannot use an in-memory temporary table and must fall back to an on-disk temporary table. MySQL also has to perform a second lookup to fetch this kind of data, which makes SQL performance very poor. This does not mean such types must never be used.

If they are really needed, move the BLOB or TEXT columns into a separate extension table. Never use select * in queries; fetch only the necessary columns, and do not query the TEXT column when its data is not needed.

b. TEXT and BLOB columns can only use prefix indexes

Because MySQL limits the length of indexed fields, TEXT columns can only use prefix indexes, and TEXT columns cannot have default values

3. Avoid the ENUM type

Modifying the list of ENUM values requires an ALTER statement

ORDER BY on an ENUM column is inefficient and requires extra operations

Using numeric values as ENUM values is forbidden

4. Define all columns as NOT NULL wherever possible

Reason:

Indexing a nullable column needs extra space to record NULLs, so it takes up more space

NULL values need special handling in comparisons and calculations

5. Use the TIMESTAMP (4 bytes) or DATETIME (8 bytes) type to store time

TIMESTAMP covers the range 1970-01-01 00:00:01 ~ 2038-01-19 03:14:07

TIMESTAMP takes 4 bytes, the same as INT, but is more readable than INT

Use DATETIME for values outside the TIMESTAMP range

People often store dates as strings (an incorrect practice)

Disadvantage 1: date functions cannot be used for calculation and comparison. Disadvantage 2: storing dates as strings takes more space

6. Finance-related amounts must use the DECIMAL type

  • Inexact floating point: FLOAT, DOUBLE
  • Exact fixed point: DECIMAL

DECIMAL is an exact numeric type, so precision is not lost during calculation

The space it occupies is determined by the declared precision: every 9 decimal digits take 4 bytes, with the integer and fractional parts stored separately

It can store integers larger than BIGINT
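The difference between inexact and exact numeric types is easy to reproduce outside the database. The sketch below uses Python's float and decimal.Decimal as an analogy for FLOAT/DOUBLE and DECIMAL; it illustrates the rounding behaviour only and does not reflect MySQL's internal storage.

```python
from decimal import Decimal

# Binary floating point (comparable to FLOAT/DOUBLE) cannot represent 0.1 exactly.
float_total = 0.1 + 0.2

# Decimal arithmetic (comparable to DECIMAL) keeps every declared digit exact.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True
```

An error this small is harmless for measurements but unacceptable when the values are account balances that must add up to the cent.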


Index design specification

1. Limit the number of indexes on each table; no more than 5 indexes per table is recommended

More indexes are not always better! Indexes can improve efficiency, but they can also reduce it.

Indexes speed up queries but slow down inserts and updates, and in some cases even slow down queries.

When choosing how to execute a query, the MySQL optimizer evaluates every usable index based on statistics to generate the best execution plan. If many indexes are usable for the same query, the optimizer takes longer to generate the plan, which also reduces query performance.

2. Creating a separate index for every column in the table is forbidden

In older MySQL versions, one SQL statement could use only one index per table. Newer versions offer the index merge optimization, but it is still far less effective than a single composite index.

3. Every InnoDB table must have a primary key

InnoDB uses index-organized tables: the logical storage order of the data is the same as the order of the index. A table can have many indexes but only one storage order.

InnoDB organizes the table in primary key order

  • Do not use frequently updated columns as the primary key, and do not use multi-column primary keys (equivalent to a composite index)
  • Do not use UUID, MD5, hash, or string columns as the primary key (they cannot guarantee sequential growth of the data)
  • An auto-increment ID is recommended as the primary key


4. Suggestions on commonly indexed columns

  • Columns that appear in the WHERE clause of SELECT, UPDATE, and DELETE statements
  • Fields used in ORDER BY, GROUP BY, and DISTINCT
  • Do not create a separate index for every column matching the two items above; a composite index on those fields usually works better
  • Join columns of multi-table joins


5. How to choose the order of index columns

The purpose of an index is to look up data through it, reduce random IO, and improve query performance. The more rows an index filters out, the less data has to be read from disk.

  • Put the column with the highest selectivity at the far left of the composite index (selectivity = number of distinct values in the column / total number of rows)
  • Try to put columns with a small field length at the far left of the composite index (the smaller the field, the more entries fit in one page and the better the IO performance)
  • Put the most frequently used columns at the left of the composite index (so that fewer indexes are needed)
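The selectivity (distinguishability) ratio above is simple to compute on a sample of rows. The following sketch ranks candidate columns by that ratio alone; the sample data and function names are invented for the example, and field length and usage frequency still have to be weighed separately.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    return len(set(values)) / len(values)


def rank_columns(rows, columns):
    """Order columns from most to least selective."""
    return sorted(columns,
                  key=lambda col: selectivity([row[col] for row in rows]),
                  reverse=True)


# Hypothetical sample of an orders table.
rows = [
    {"user_id": 1, "city": "Beijing", "status": "paid"},
    {"user_id": 2, "city": "Beijing", "status": "paid"},
    {"user_id": 3, "city": "Shanghai", "status": "paid"},
    {"user_id": 4, "city": "Shanghai", "status": "new"},
    {"user_id": 5, "city": "Shenzhen", "status": "new"},
    {"user_id": 6, "city": "Shenzhen", "status": "new"},
]

print(rank_columns(rows, ["status", "city", "user_id"]))
# ['user_id', 'city', 'status']
```

On a real table the same ratio can be obtained with COUNT(DISTINCT col) / COUNT(*).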


6. Avoid redundant and duplicate indexes (they increase the time the query optimizer needs to generate an execution plan)

  • Example of duplicate indexes: primary key(id), index(id), unique index(id)
  • Example of redundant indexes: index(a,b,c), index(a,b), index(a)


7. For frequent queries, prefer covering indexes

Covering index: an index that contains all the fields a query needs (those in WHERE, SELECT, ORDER BY, GROUP BY)

Benefits of covering indexes:

  • Avoids the secondary lookup on InnoDB tables: InnoDB stores data in clustered index order, and the leaf nodes of a secondary index hold the primary key of the row. When a query goes through a secondary index, after finding the key value it must perform a second lookup through the primary key to get the data it actually needs. With a covering index, all the data can be read from the secondary index entry, which avoids the lookup through the primary key, reduces IO, and improves query efficiency.
  • Turns random IO into sequential IO and speeds up queries: a covering index is stored in key order, so an IO-intensive range lookup needs far less IO than reading every row randomly from disk. Accessing data through a covering index therefore converts random disk reads into sequential index reads.
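Whether a query is covered can be read from the execution plan. The sketch below demonstrates the idea with Python's built-in sqlite3 module, used here only as a stand-in so the example runs without a server; the table and index names are hypothetical. In MySQL the equivalent check is EXPLAIN, where a covered query shows "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, phone TEXT, address TEXT)"
)
conn.execute("CREATE INDEX idx_name_phone ON customer (name, phone)")


def plan(sql):
    """Return the textual query plan SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("alice",)).fetchall()
    return " ".join(row[-1] for row in rows)


# Every selected column is in the index, so the table itself is never read.
covered = plan("SELECT name, phone FROM customer WHERE name = ?")

# address is not in the index, so each match needs a lookup in the table.
not_covered = plan("SELECT name, phone, address FROM customer WHERE name = ?")

print(covered)
print(not_covered)
```

Adding one column to the select list is enough to lose the covering behaviour, which is another reason to list only the fields a query really needs.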


8. Index set specification

Try to avoid using foreign key constraints

Foreign key constraints are not recommended, but the join keys between tables must be indexed. Foreign keys can guarantee referential integrity, but it is recommended to enforce it on the business side. Foreign keys affect write operations on both the parent and the child table, which reduces performance


Database SQL development specification

1. Use precompiled (prepared) statements for database operations

Prepared statements can reuse execution plans, reduce the time needed for SQL compilation, and prevent the SQL injection problems caused by dynamic SQL.

Passing only parameters is more efficient than passing whole SQL statements.

The same statement can be parsed once and executed many times, which improves processing efficiency.
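A minimal sketch of the idea, assuming a hypothetical customer table: the SQL text stays constant and only the parameter changes. Python's built-in sqlite3 module stands in for a MySQL driver so that the example is runnable; the placeholder syntax and whether statements are prepared on the server side depend on the driver actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)",
    [(1, "alice", "111"), (2, "bob", "222")],
)

# The statement text never changes; user input is passed only as a bound value.
FIND_BY_NAME = "SELECT id, name, phone FROM customer WHERE name = ?"


def find_by_name(name):
    return conn.execute(FIND_BY_NAME, (name,)).fetchall()


print(find_by_name("alice"))         # [(1, 'alice', '111')]
print(find_by_name("x' OR '1'='1"))  # []
```

The second call shows why binding helps against injection: the hostile string is compared as a plain value and never becomes part of the SQL.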

2. Avoid implicit conversion of data type

Implicit conversion can invalidate an index, for example when the type of the literal does not match the type of the column:

select name,phone from customer where id = '111';

3. Make full use of the existing indexes on the table

Avoid query conditions with a double % wildcard, such as a like '%123%' (if there is no leading % and only a trailing %, the index on the column can be used)

One SQL statement can use only one column of a composite index for a range query. For example, given a composite index on columns a, b, and c, if the query condition contains a range query on column a, the index cannot be used for columns b and c.

Therefore, when defining a composite index, put a column that needs range lookups on the right side of the index.

Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations, because NOT IN usually invalidates the index.
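The NOT IN rewrite can be sketched as follows. The three queries below are equivalent as long as the joined column contains no NULLs; the customer and orders tables are hypothetical, and sqlite3 is used only so the comparison runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
conn.executemany("INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 3)])

# Original form: customers that have no orders, written with NOT IN.
NOT_IN = """
    SELECT c.id FROM customer c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o)
    ORDER BY c.id"""

# Rewrite 1: anti-join, keeping only customers without a matching order row.
LEFT_JOIN = """
    SELECT c.id FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.id"""

# Rewrite 2: correlated NOT EXISTS.
NOT_EXISTS = """
    SELECT c.id FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id"""

results = [conn.execute(sql).fetchall() for sql in (NOT_IN, LEFT_JOIN, NOT_EXISTS)]
print(results[0])  # [(2,), (4,)]
```

If customer_id could be NULL, NOT IN would return no rows at all, so the rewrites are also the safer form semantically.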

4. In database design, we should consider the future expansion

5. Programs should use different accounts to connect to different databases; cross-database queries are forbidden

This leaves room for database migration and sharding, reduces business coupling, and avoids the security risks of excessive permissions

6. Do not use select *; always query with select <field list>

Reason:

  • It consumes more CPU, IO, and network bandwidth
  • Covering indexes cannot be used
  • An explicit field list reduces the impact of table structure changes

7. Do not use INSERT statements without a field list

For example, avoid:

insert into t values ('a','b','c');

Use instead:

insert into t(c1,c2,c3) values ('a','b','c');

8. Avoid subqueries; rewrite them as joins where possible

Usually, when the subquery is inside an IN clause and is simple SQL (without UNION, GROUP BY, ORDER BY, or LIMIT clauses), it can be converted into a join for optimization.

Reasons for poor subquery performance:

The result set of a subquery cannot use indexes. It is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers. The larger the result set returned by the subquery, the greater the impact.

Because subqueries produce many temporary tables without indexes, they consume excessive CPU and IO resources and generate many slow queries.
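As an illustration of the conversion, the sketch below writes the same question once as an IN subquery and once as a join. Table names and data are hypothetical, and sqlite3 only serves to make the example runnable; it says nothing about how either form performs in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
                 [(1, 1, 150), (2, 1, 200), (3, 2, 50), (4, 3, 120)])

# Subquery form: customers with at least one order above 100.
WITH_SUBQUERY = """
    SELECT id, name FROM customer
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 100)
    ORDER BY id"""

# Join form: DISTINCT is needed because one customer can match several orders.
WITH_JOIN = """
    SELECT DISTINCT c.id, c.name
    FROM customer c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.amount > 100
    ORDER BY c.id"""

subquery_rows = conn.execute(WITH_SUBQUERY).fetchall()
join_rows = conn.execute(WITH_JOIN).fetchall()
print(join_rows)  # [(1, 'alice'), (3, 'carol')]
```

The DISTINCT in the join form is the detail most often forgotten: without it, alice would appear twice.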

9. Avoid joining too many tables

MySQL has a join buffer whose size is set by the join_buffer_size parameter.

For each additional table joined in the same SQL statement, MySQL allocates one more join buffer, so the more tables a statement joins, the more memory it uses.

If a program performs many multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.

MySQL allows at most 61 tables in a join; joining more than 5 tables is not recommended.

10. Reduce the number of round trips to the database

Databases are better suited to batch operations. Merging multiple identical operations into one improves processing efficiency.

11. Use IN instead of OR for conditions on the same column

The number of values in an IN list should not exceed 500. IN can use the index more effectively, whereas OR can rarely use the index in most cases.
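One way to respect the 500-value limit while still batching lookups is to split the ID list in the application. This is a sketch under assumptions: the customer table and helper names are hypothetical, and sqlite3 with its ? placeholders stands in for a MySQL driver.

```python
import sqlite3

MAX_IN_VALUES = 500  # upper bound suggested in the text above


def chunked(values, size=MAX_IN_VALUES):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def fetch_by_ids(conn, ids):
    """Fetch rows for many IDs using one IN query per chunk."""
    rows = []
    for batch in chunked(list(ids)):
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT id, name FROM customer WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1201)])

found = fetch_by_ids(conn, range(1, 1201))
print(len(found))  # 1200
```

Looking up 1200 IDs this way takes three round trips instead of 1200 single-row queries, and no statement carries more than 500 values.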

12. Do not use order by rand() for random sorting

order by rand() loads all qualifying rows of the table into memory, generates a random value for each row, and then sorts all the rows in memory by those values. If the qualifying data set is very large, this consumes a lot of CPU, IO, and memory.

It is recommended to generate a random value in the program and then fetch the matching data from the database.
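One possible way to do this, assuming the table has an integer auto-increment primary key: pick a random ID between the smallest and largest key in the program, then fetch the first row at or after it through the primary key. The article table is hypothetical, and sqlite3 is used only to make the sketch runnable.

```python
import random
import sqlite3


def random_row(conn, rng=random):
    """Return one pseudo-random row without sorting the whole table."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM article").fetchone()
    if low is None:  # empty table
        return None
    target = rng.randint(low, high)
    # The primary key lookup reads a single row, even if IDs have gaps.
    return conn.execute(
        "SELECT id, title FROM article WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO article (id, title) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (5, "c"), (9, "d")])

print(random_row(conn))
```

The trade-off is that rows following a gap in the ID sequence are picked more often, so this suits cases where approximate randomness is acceptable.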

13. Applying functions or calculations to columns in the WHERE clause is forbidden

When a function or calculation is applied to a column, the index on that column cannot be used

Not recommended:

where date(create_time)='20190101'

Recommended:

where create_time >= '20190101' and create_time < '20190102'
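The two boundary values for such a range can be computed in the application and passed as bound parameters. The helper below is a sketch; its name and the output format are assumptions made for the example.

```python
from datetime import date, datetime, timedelta


def day_bounds(day: date):
    """Return (start, end) strings for the half-open range covering one day."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)


print(day_bounds(date(2019, 1, 1)))
# ('2019-01-01 00:00:00', '2019-01-02 00:00:00')
```

Using >= for the start and < for the end keeps the column itself untouched in the condition and avoids having to guess the last representable moment of the day.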

14. Use UNION ALL instead of UNION when it is clear that there will be no duplicate values

  • UNION puts all rows of both result sets into a temporary table and then removes duplicates
  • UNION ALL does not de-duplicate the result sets

15. Split complex, large SQL statements into several small ones

A large SQL statement is logically complex and needs a lot of CPU to compute. In MySQL, one SQL statement can use only one CPU. After the statement is split, the parts can be executed in parallel to improve processing efficiency


Code of conduct for database operation

1. Batch write operations (UPDATE, DELETE, INSERT) on more than 1 million rows should be split into multiple smaller batches

Large batch operations can cause severe master-slave delay

In a master-slave environment, a large write operation generally takes a long time to run, and it is executed on the slaves only after it has finished on the master. The slaves therefore lag behind the master for a long time

A large number of logs is generated when the binlog is in row format

Large write operations generate a lot of log data, especially when the binary log is in row format, because row format records the change of every row. The more data modified at once, the more log is generated and the longer log transmission and recovery take. This is another cause of master-slave delay

Avoid large transactions

A large batch modification must run in a single transaction, which locks a large amount of data in the table and causes heavy blocking, with a great impact on MySQL performance.

In particular, long-lasting blocking can use up all available database connections, so that other applications in the production environment cannot connect to the database. Batch write operations must therefore be split into batches
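A minimal sketch of splitting one large delete into small transactions, assuming a hypothetical access_log table with an integer primary key. Each batch covers a fixed range of IDs and is committed on its own, so no single transaction holds many locks. sqlite3 is used only so the example runs without a server; the batch size is an arbitrary choice.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff`, one primary key range per transaction."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM access_log").fetchone()
    if low is None:  # empty table
        return 0
    deleted = 0
    start = low
    while start <= high:
        cur = conn.execute(
            "DELETE FROM access_log WHERE id >= ? AND id < ? AND created_at < ?",
            (start, start + batch_size, cutoff),
        )
        deleted += cur.rowcount
        conn.commit()  # end the small transaction before starting the next range
        start += batch_size
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO access_log (id, created_at) VALUES (?, ?)",
    [(i, "2019-01-01" if i <= 1500 else "2020-01-01") for i in range(1, 2501)],
)
conn.commit()

removed = delete_in_batches(conn, cutoff="2019-06-01", batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0]
print(removed, remaining)  # 1500 1000
```

In a replicated setup, a short pause between batches gives the slaves time to catch up; that is left out here to keep the sketch short.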

2. For large tables, use pt-online-schema-change to modify the table structure

This avoids the master-slave delay caused by altering a large table, and avoids locking the table while its fields are modified

Be careful when modifying the structure of a large table, especially in the production environment, because it can lock the table for a long time.

pt-online-schema-change first creates a new table with the same structure as the original, applies the structure change to the new table, and then copies the data from the original table to the new one. It also adds triggers to the original table so that newly written data is copied to the new table as well. After all rows have been copied, the new table is renamed to the original name and the original table is dropped. The single DDL operation is thus broken down into many small batches.

3. Granting SUPER privilege to accounts used by programs is forbidden

When the maximum number of connections is reached, MySQL still allows one connection from a user with SUPER privilege. SUPER privilege should be reserved for the DBA accounts that handle problems

4. Accounts used by programs to connect to the database must follow the principle of least privilege

An account used by a program may only be used within one database; cross-database access is not allowed. In principle, accounts used by programs must not have the DROP privilege