Recommended Specifications for High-Performance MySQL

Time:2021-4-14

Database naming specification

  • All database object names must be lowercase, with words separated by underscores
  • Database object names must not use MySQL reserved keywords (if a table name does contain a keyword, it must be enclosed in backticks in queries)
  • Database object names should be self-explanatory and should not exceed 32 characters
  • Temporary tables must be prefixed with tmp_ and suffixed with the date; backup tables must be prefixed with bak_ and suffixed with the date (timestamp)
  • Columns that store the same data must have the same name and type in every table (such columns are usually join columns; if their types differ, MySQL converts the values implicitly during the query, which prevents the index on the column from being used and slows the query down)

Basic database design specification

1. All tables must use InnoDB storage engine

Unless there is a special requirement that InnoDB cannot meet (such as columnar storage or spatial data storage), all tables must use the InnoDB storage engine (MyISAM was the default before MySQL 5.5; InnoDB has been the default since MySQL 5.5).

InnoDB supports transactions, supports row level locks, has better recoverability, and has better performance in high concurrency.

2. Use UTF-8 as the character set for databases and tables

A single, unified character set gives better compatibility and avoids the garbled text caused by character set conversion. Comparing columns with different character sets requires a conversion first, which prevents indexes from being used. If emoji need to be stored, use the utf8mb4 character set (MySQL's utf8 stores at most 3 bytes per character, so it cannot hold 4-byte characters such as emoji).

Reference article: A real case of index invalidation caused by inconsistent MySQL character sets

3. All tables and columns must have comments

Use the COMMENT clause to document tables and columns, and maintain the data dictionary from the start
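
As a minimal sketch of this rule, the statement below creates a commented table and reads the comments back. The table name customer_order and all of its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: names and columns are illustrative only.
CREATE TABLE customer_order (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Surrogate primary key',
  customer_id BIGINT UNSIGNED NOT NULL COMMENT 'ID of the customer who placed the order',
  total       DECIMAL(12, 2)  NOT NULL COMMENT 'Order total',
  created_at  DATETIME        NOT NULL COMMENT 'Time the order was created',
  PRIMARY KEY (id),
  KEY idx_customer_created (customer_id, created_at)
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8mb4
  COMMENT = 'One row per customer order';

-- The column comments can be read back from the data dictionary.
SELECT column_name, column_comment
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'customer_order';
```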

4. Try to control the amount of data in a single table; keeping it under 5 million rows is recommended.

5 million rows is not a MySQL limit, but a table that is too large makes schema changes, backups, and recovery very difficult.

The data volume can be controlled by archiving historical data (for log data), by sharding across databases and tables (for business data), and by similar means.

5. Use MySQL partitioned tables with caution

A partitioned table is physically multiple files but logically a single table;

Choose the partition key carefully, because cross-partition queries may be less efficient;

Physically splitting tables is the recommended way to manage large data sets.

6. Try to separate hot and cold data and reduce the width of tables

MySQL limits each table to at most 4096 columns, and each row cannot exceed 65535 bytes.

Narrow tables reduce disk I/O and keep the in-memory cache hit rate for hot data high (the wider the table, the more memory it takes to load it into the buffer pool, and the more I/O is consumed);

They use the cache more effectively and avoid reading useless cold data;

Columns that are frequently used together should be placed in one table (to avoid extra joins).

7. It is forbidden to create reserved fields in the table

It is difficult to recognize the meaning of the reserved field.

The data type that a reserved field will eventually store is unknown, so an appropriate type cannot be chosen.

Changing the type of a reserved field later locks the table.

8. Do not store large binary data such as images and files in the database

Files are usually large, so the data volume grows quickly in a short time. Reading them from the database typically requires many random I/O operations, and with large files these operations are very time-consuming.

Such files are usually stored on a file server, and the database stores only the file address

9. Do not run database stress tests against the production environment

10. It is forbidden to directly connect to the production environment database from the development environment and test environment


Database field design specification

1. The smallest data type that meets the storage needs is preferred

reason:

The wider a column, the more space its index needs. Fewer index entries then fit in one page, so more I/O is needed to traverse the index and index performance suffers.

method:

a. Convert strings to numeric types for storage, for example store IP addresses as integers

MySQL provides two functions for handling IP addresses

  • INET_ATON converts an IP address to an unsigned integer (4 to 8 bytes)
  • INET_NTOA converts an integer back to an IP address

Use INET_ATON to convert the IP address to an integer before inserting the data, which saves space. Use INET_NTOA to convert the integer back to an address when displaying the data.
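
The conversion can be sketched as follows. The table login_log and its columns are hypothetical; only INET_ATON and INET_NTOA come from the text above.

```sql
-- Hypothetical table for illustration.
CREATE TABLE login_log (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  login_ip INT UNSIGNED    NOT NULL COMMENT 'IPv4 address stored via INET_ATON',
  PRIMARY KEY (id),
  KEY idx_login_ip (login_ip)
) ENGINE = InnoDB;

-- Convert on the way in.
INSERT INTO login_log (login_ip) VALUES (INET_ATON('192.168.1.10'));

-- Convert back on the way out.
SELECT id, INET_NTOA(login_ip) AS login_ip FROM login_log;

-- For range filters, convert the constants rather than the column,
-- so the index on login_ip stays usable.
SELECT id
FROM login_log
WHERE login_ip BETWEEN INET_ATON('192.168.1.0') AND INET_ATON('192.168.1.255');
```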

b. For non-negative data (such as auto-increment IDs and integer IPs), prefer unsigned integer types

reason:

For the same storage size, an unsigned type holds twice as many positive values as a signed type

SIGNED INT -2147483648~2147483647
UNSIGNED INT 0~4294967295

The N in VARCHAR(N) is the number of characters, not the number of bytes. With UTF-8, VARCHAR(255) can store 255 Chinese characters and may occupy 765 bytes. An excessive length consumes more memory.

2. Avoid the TEXT and BLOB data types. The most common TEXT type can store up to 64 KB of data

a. It is recommended to move BLOB or TEXT columns into a separate extension table

MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query includes such columns, sorting and similar operations cannot use an in-memory temporary table and must fall back to an on-disk temporary table. MySQL also needs a second lookup to read this data, which makes the SQL perform very poorly. This does not mean these types must never be used.

If they must be used, move the BLOB or TEXT column into a separate extension table. Never use SELECT * when querying; fetch only the columns you need, and do not query the TEXT column when its data is not needed.

b. TEXT or BLOB columns can only use prefix indexes

Because MySQL limits the length of an index key, TEXT columns can only use prefix indexes, and TEXT columns cannot have default values
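
A minimal sketch of both points, assuming two hypothetical tables (article and article_body) and an arbitrary prefix length of 50 characters:

```sql
-- Hypothetical tables: the wide TEXT column lives in its own extension table.
CREATE TABLE article (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(200)    NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE article_body (
  article_id BIGINT UNSIGNED NOT NULL,
  body       TEXT            NOT NULL,
  PRIMARY KEY (article_id),
  KEY idx_body_prefix (body(50))  -- prefix index on the first 50 characters
) ENGINE = InnoDB;

-- List pages read only the narrow table.
SELECT id, title FROM article ORDER BY id DESC LIMIT 20;

-- The body is fetched only when it is actually needed.
SELECT body FROM article_body WHERE article_id = 42;
```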

3. Avoid enum type

An ALTER statement is required to modify the values of an ENUM

ORDER BY on an ENUM column is inefficient and requires extra work

Do not use numeric values as ENUM values

4. Define all columns as not null as far as possible

reason:

Indexing a nullable column requires extra space to record NULLs, so the index takes up more space

NULL values need special handling in comparisons and calculations

5. Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store time

TIMESTAMP covers the range 1970-01-01 00:00:01 to 2038-01-19 03:14:07

TIMESTAMP takes 4 bytes, the same as INT, but is more readable than INT

Use DATETIME when the value falls outside the TIMESTAMP range

People often store dates as strings, which has two disadvantages:

  • Disadvantage 1: date functions cannot be used for calculation and comparison
  • Disadvantage 2: strings take more space than date types

6. Amounts of money must use the DECIMAL type

  • Approximate floating point: FLOAT, DOUBLE
  • Exact fixed point: DECIMAL

DECIMAL is an exact fixed-point type, so no precision is lost in calculations

The space used depends on the declared precision. Every 9 digits are packed into 4 bytes, and the integer and fractional parts are stored separately

It can be used to store integers larger than BIGINT
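
The difference between approximate and exact types can be sketched with literals. The table account_balance is hypothetical; the two SELECT statements rely on MySQL treating literals with an exponent as DOUBLE and plain decimal literals as exact values.

```sql
-- Hypothetical table for illustration.
CREATE TABLE account_balance (
  account_id BIGINT UNSIGNED NOT NULL,
  balance    DECIMAL(18, 2)  NOT NULL DEFAULT 0.00,
  PRIMARY KEY (account_id)
) ENGINE = InnoDB;

-- Literals written with an exponent are approximate (DOUBLE) values,
-- so this comparison is expected to be false.
SELECT 0.1E0 + 0.2E0 = 0.3E0 AS double_equal;

-- Plain decimal literals are exact values,
-- so this comparison is expected to be true.
SELECT 0.1 + 0.2 = 0.3 AS decimal_equal;
```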


Index design specification

1. Limit the number of indexes on each table. It is recommended that the number of indexes on a single table should not exceed 5

More indexes are not always better. Indexes can improve efficiency, but they can also reduce it.

An index speeds up queries but slows down inserts and updates, and in some cases it can even slow down queries.

When the MySQL optimizer decides how to execute a query, it evaluates every usable index based on statistics to generate the best execution plan. If many indexes could be used for the same query, the optimizer takes longer to generate the plan, which also reduces query performance.

2. It is forbidden to create a separate index for each column in the table

In early MySQL versions a SQL statement could use only one index per table. Later versions added the index merge optimization, but it is still much less efficient than querying through a composite index.
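
As an illustration, the hypothetical table below uses one composite index where three single-column indexes might otherwise have been created. The table, column names, and query are assumptions made for this sketch.

```sql
-- Hypothetical table for illustration.
CREATE TABLE user_order (
  id         BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  user_id    BIGINT UNSIGNED  NOT NULL,
  status     TINYINT UNSIGNED NOT NULL,
  created_at DATETIME         NOT NULL,
  PRIMARY KEY (id),
  -- One composite index instead of three single-column indexes.
  KEY idx_user_status_created (user_id, status, created_at)
) ENGINE = InnoDB;

-- The equality columns and the sort column are all served by the same index.
EXPLAIN
SELECT id, created_at
FROM user_order
WHERE user_id = 1001 AND status = 2
ORDER BY created_at DESC
LIMIT 10;
```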

3. Each InnoDB table must have a primary key

InnoDB tables are index-organized: the logical order in which the data is stored is the order of the index. A table can have many indexes, but only one storage order.

InnoDB organizes the table in primary key order

  • Do not use frequently updated columns as the primary key, and do not use multi-column primary keys (they are equivalent to composite indexes)
  • Do not use UUID, MD5, hash, or string columns as the primary key (they cannot guarantee that values grow sequentially)
  • An auto-increment ID is recommended as the primary key

4. Common index column suggestions

  • Columns that appear in the WHERE clauses of SELECT, UPDATE, and DELETE statements
  • Columns used in ORDER BY, GROUP BY, and DISTINCT
  • Do not create a separate index for every column that matches the two items above; a composite index on those columns is usually better
  • Join columns of multi-table joins

5. How to select the order of index columns

The purpose of an index is to find data through it, reduce random I/O, and improve query performance. The more rows an index filters out, the less data has to be read from disk.

  • Put the most selective column on the leftmost side of the composite index (selectivity = number of distinct values in the column / total number of rows)
  • Try to put columns with a small field length on the left of the composite index (the smaller the field, the more entries fit in one page and the better the I/O performance)
  • Put the most frequently used columns on the left of the composite index (so that fewer indexes need to be created)
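
The selectivity formula in the first rule can be checked with a query like the one below. The table payment_record and its columns are hypothetical, and the index at the end assumes that merchant_id turned out to be the more selective column.

```sql
-- Hypothetical table and columns; values closer to 1 mean higher selectivity.
SELECT
  COUNT(DISTINCT merchant_id) / COUNT(*) AS merchant_id_selectivity,
  COUNT(DISTINCT pay_status)  / COUNT(*) AS pay_status_selectivity
FROM payment_record;

-- Assuming merchant_id is far more selective than pay_status,
-- it goes first in the composite index.
ALTER TABLE payment_record
  ADD INDEX idx_merchant_status (merchant_id, pay_status);
```

Selectivity is only one of the three rules above, so the result is a starting point rather than a final answer for the column order.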

6. Avoid redundant and duplicate indexes (they increase the time the query optimizer needs to generate an execution plan)

  • Examples of duplicate indexes: primary key(id), index(id), unique index(id)
  • Examples of redundant indexes: index(a, b, c), index(a, b), index(a)

7. For frequent queries, prefer covering indexes

Covering index: an index that contains all the columns a query uses (WHERE, SELECT, ORDER BY, GROUP BY)

Benefits of a covering index:

  • It avoids the second lookup on InnoDB tables. InnoDB stores data in clustered index order, and the leaf nodes of a secondary index store the primary key of the row. When a query uses a secondary index, it must, after finding the key value, look the row up again through the primary key to get the data it really needs. With a covering index all the data can be read from the secondary index entries, which avoids the second lookup on the primary key, reduces I/O, and improves query efficiency.
  • It can turn random I/O into sequential I/O and speed up queries. Because a covering index is stored in key order, an I/O-intensive range scan needs far less I/O than reading each row from disk at random. A covering index can therefore convert random disk reads into sequential reads of the index.
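
A small sketch of a covered and a non-covered query, using a hypothetical member table. With a covering index, EXPLAIN typically reports "Using index" in the Extra column.

```sql
-- Hypothetical table for illustration.
CREATE TABLE member (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(100)    NOT NULL,
  name  VARCHAR(50)     NOT NULL,
  bio   VARCHAR(500)    NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_email_name (email, name)
) ENGINE = InnoDB;

-- Covered: email and name are in the index, and an InnoDB secondary
-- index also carries the primary key (id).
EXPLAIN SELECT id, email, name FROM member WHERE email = 'a@example.com';

-- Not covered: bio is not in the index, so each matching row needs
-- a second lookup through the primary key.
EXPLAIN SELECT id, email, bio FROM member WHERE email = 'a@example.com';
```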

8. Index set specification

Try to avoid using foreign key constraints

  • Foreign key constraints are not recommended, but the join columns between tables must still be indexed
  • Foreign keys can guarantee referential integrity, but it is recommended to enforce it on the application side
  • Foreign keys affect write operations on both parent and child tables and reduce performance

Database SQL development specification

1. It is recommended to use prepared statements for database operations

Prepared statements can reuse execution plans, which reduces the time needed to compile SQL, and they prevent the SQL injection problems caused by dynamically built SQL.

Passing only parameters is more efficient than passing whole SQL statements.

The same statement is parsed once and used many times, which improves processing efficiency.
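
Applications normally use the prepared statement API of their database driver. The same idea can be sketched directly in SQL with PREPARE and EXECUTE; the customer table and the statement name find_customer are hypothetical.

```sql
-- Hypothetical table: customer(id, name, phone).
PREPARE find_customer FROM
  'SELECT name, phone FROM customer WHERE id = ?';

-- Only the parameter changes between executions.
SET @customer_id = 111;
EXECUTE find_customer USING @customer_id;

SET @customer_id = 112;
EXECUTE find_customer USING @customer_id;

DEALLOCATE PREPARE find_customer;
```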

2. Avoid implicit conversion of data types

Implicit conversion can prevent an index from being used

select name,phone from customer where id = '111';
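
Whether the query above loses its index depends on the type of the id column, which the text does not state. A common case, sketched here with a hypothetical string column, is comparing a string column with a number: MySQL then has to convert the column values rather than the constant, so the index on the column cannot be used for the lookup.

```sql
-- Hypothetical table: phone is a string column with an index.
CREATE TABLE contact (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name  VARCHAR(50)     NOT NULL,
  phone VARCHAR(20)     NOT NULL,
  PRIMARY KEY (id),
  KEY idx_phone (phone)
) ENGINE = InnoDB;

-- Numeric literal against a string column: the column is converted implicitly.
EXPLAIN SELECT name FROM contact WHERE phone = 13800000000;

-- String literal matches the column type, so idx_phone can be used.
EXPLAIN SELECT name FROM contact WHERE phone = '13800000000';
```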

3. Make full use of the existing indexes on the table

Avoid query conditions with a leading %, such as a like '%123%' (an index on the column can be used only when there is no leading % and just a trailing %)

A SQL statement can use a range condition on only one column of a composite index. For example, with a composite index on columns a, b, and c, if the query has a range condition on column a, the index parts on columns b and c will not be used.

When defining a composite index, if column a needs a range lookup, put column a on the right side of the composite index.

Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations, because NOT IN usually prevents an index from being used.
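
The NOT IN rewrite can be sketched as follows, assuming two hypothetical tables, customer(id, name) and blacklist(customer_id).

```sql
-- Hypothetical tables: customer(id, name) and blacklist(customer_id).

-- Original form.
SELECT c.id, c.name
FROM customer AS c
WHERE c.id NOT IN (SELECT b.customer_id FROM blacklist AS b);

-- NOT EXISTS form.
SELECT c.id, c.name
FROM customer AS c
WHERE NOT EXISTS (
  SELECT 1 FROM blacklist AS b WHERE b.customer_id = c.id
);

-- LEFT JOIN form: keep only customers with no matching blacklist row.
SELECT c.id, c.name
FROM customer AS c
LEFT JOIN blacklist AS b ON b.customer_id = c.id
WHERE b.customer_id IS NULL;
```

The three forms return the same rows only if blacklist.customer_id is declared NOT NULL; a NULL in the subquery makes NOT IN return no rows at all.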

4. When designing the database, we should consider the future expansion

5. Programs that connect to different databases must use different accounts. Cross-database queries are forbidden

  • Leaves room for database migration and sharding
  • Reduces business coupling
  • Avoids the security risk caused by excessive privileges

6. SELECT * is forbidden. Queries must use SELECT <column list>

reason:

  • SELECT * consumes more CPU, I/O, and network bandwidth
  • SELECT * cannot use a covering index
  • An explicit column list reduces the impact of table structure changes

7. INSERT statements without a column list are forbidden

For example:

insert into t values ('a','b','c');

Use instead:

insert into t(c1,c2,c3) values ('a','b','c');

8. Avoid subqueries; rewrite them as joins where possible

Usually a subquery can be converted into a join for optimization only when it is in an IN clause and is simple SQL (without UNION, GROUP BY, ORDER BY, or LIMIT clauses).

Reasons for the poor performance of subqueries:

The result set of a subquery cannot use an index. It is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers. The larger the result set the subquery returns, the greater the impact.

Because subqueries produce many temporary tables without indexes, they consume excessive CPU and I/O resources and produce many slow queries.
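
A minimal sketch of the rewrite for a simple IN subquery. The tables customer(id, name) and customer_order(id, customer_id, created_at) are assumptions made for the example.

```sql
-- Hypothetical tables: customer(id, name) and
-- customer_order(id, customer_id, created_at).

-- Subquery form.
SELECT c.id, c.name
FROM customer AS c
WHERE c.id IN (
  SELECT o.customer_id
  FROM customer_order AS o
  WHERE o.created_at >= '2021-01-01'
);

-- Join form. DISTINCT keeps one row per customer
-- when a customer has several matching orders.
SELECT DISTINCT c.id, c.name
FROM customer AS c
JOIN customer_order AS o ON o.customer_id = c.id
WHERE o.created_at >= '2021-01-01';
```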

9. Avoid joining too many tables

MySQL has a join buffer, whose size is set by the join_buffer_size parameter.

For each additional table joined in the same SQL statement, MySQL allocates one more join buffer, so the more tables a statement joins, the more memory it uses.

If a program makes heavy use of multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.

Joins can also create temporary tables, which hurts query efficiency. MySQL allows at most 61 tables in a join; it is recommended to join no more than 5.

10. Reduce the number of interactions with the database

Databases are better suited to batch operations. Merging multiple identical operations improves processing efficiency.
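
One simple case of merging identical operations is the multi-row INSERT, sketched here with a hypothetical tag table.

```sql
-- Hypothetical table: tag(name).

-- Three statements, three round trips to the server.
INSERT INTO tag (name) VALUES ('mysql');
INSERT INTO tag (name) VALUES ('innodb');
INSERT INTO tag (name) VALUES ('index');

-- The same rows in one statement and one round trip.
INSERT INTO tag (name) VALUES ('mysql'), ('innodb'), ('index');
```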

11. For OR conditions on the same column, use IN instead of OR

The number of values in an IN list should not exceed 500. IN can use indexes more effectively, whereas OR rarely uses an index in most cases.
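
For illustration, with a hypothetical ticket table that has an indexed status column:

```sql
-- Hypothetical table: ticket(id, status).

-- OR on the same column.
SELECT id FROM ticket WHERE status = 1 OR status = 2 OR status = 3;

-- Equivalent IN list.
SELECT id FROM ticket WHERE status IN (1, 2, 3);
```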

12. It is forbidden to use order by rand() for random sorting

ORDER BY RAND() loads all qualifying rows into memory, generates a random value for each row, and then sorts all of them by those values. If the qualifying data set is large, this consumes a lot of CPU, I/O, and memory.

It is recommended to generate the random value in the program and then fetch the matching data from the database.
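
One possible shape of this approach is sketched below. In practice the random value would be generated by the application; user variables stand in for it here, and the article table with an auto-increment id is an assumption.

```sql
-- Hypothetical table: article(id, title), id is an auto-increment primary key.
SELECT MAX(id) INTO @max_id FROM article;

-- Stand-in for a random number generated by the application (1 .. @max_id).
SET @pick = FLOOR(1 + RAND() * @max_id);

-- Primary key lookup of the first row at or after the random position.
SELECT id, title
FROM article
WHERE id >= @pick
ORDER BY id
LIMIT 1;
```

If the id sequence has gaps, rows that follow a gap are picked more often, so this is an approximation rather than a uniform sample.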

13. Do not apply functions or calculations to columns in the WHERE clause

When a function or calculation is applied to a column, the index on that column cannot be used

Not recommended:

where date(create_time)='20190101'

Recommended:

where create_time >= '20190101' and create_time < '20190102'

14. Use union all instead of union when there are no duplicate values

  • UNION puts all rows of both result sets into a temporary table and then removes duplicates
  • UNION ALL does not de-duplicate the result set

15. Split large, complex SQL into several smaller statements

  • A large SQL statement is one that is logically complex and needs a lot of CPU
  • In MySQL, one SQL statement can use only one CPU for computation
  • After splitting, the statements can be executed in parallel to improve processing efficiency

Code of conduct for database operation

1. Batch write operations (UPDATE, DELETE, INSERT) affecting more than 1 million rows should be split into multiple smaller batches

Large operations can cause severe master-slave delay

In a master-slave environment, large operations can cause serious replication delay. A large write usually runs for a long time, and it is executed on the slaves only after it has finished on the master, so the slaves lag far behind the master

With the binlog in row format, a large volume of logs is generated

Large writes generate a lot of log data, especially with the row-format binlog, which records the change to every row. The more rows modified at once, the more log is generated and the longer log transfer and replay take, which is another cause of master-slave delay

Avoid large transactions

A mass modification runs in a single transaction, which locks a large amount of data in the table and causes heavy blocking, with a major impact on MySQL performance.

In particular, long-lasting blocking can use up all available database connections, so that other applications in production cannot connect to the database. Batch write operations therefore need special care
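
A batched delete can be sketched as a stored procedure that removes a limited number of rows per statement. The table access_log, the cutoff date, the batch size of 5000, and the pause are all illustrative assumptions; DELIMITER is a command of the mysql command-line client.

```sql
-- Hypothetical table: access_log(id, created_at).
DELIMITER //
CREATE PROCEDURE purge_old_access_log()
BEGIN
  DECLARE affected INT DEFAULT 1;
  WHILE affected > 0 DO
    -- With autocommit enabled, each DELETE is its own small transaction.
    DELETE FROM access_log
    WHERE created_at < '2020-01-01'
    ORDER BY id
    LIMIT 5000;
    SET affected = ROW_COUNT();
    -- Short pause so that replicas can keep up.
    DO SLEEP(0.5);
  END WHILE;
END //
DELIMITER ;

CALL purge_old_access_log();
```

The same loop can equally be driven from application code; the point is that each batch commits separately and stays small.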

2. For large tables, use pt-online-schema-change to modify the table structure

  • Avoids the master-slave delay caused by modifying large tables
  • Avoids locking the table while its columns are modified

Changes to the structure of a large table must be made with care, because they can lock the table for a long time, which is especially unacceptable in production.

pt-online-schema-change first creates a new table with the same structure as the original and applies the structure change to the new table. It then copies the data from the original table to the new one and adds triggers to the original table so that rows written during the copy are also applied to the new table. Once all rows have been copied, the new table is renamed to the original name and the original table is dropped. The original DDL operation is thus broken down into many small batches.

3. Do not grant the SUPER privilege to accounts used by programs

  • When the maximum number of connections is reached, MySQL still allows one more connection from a user with the SUPER privilege
  • The SUPER privilege should be reserved for the DBA account that handles problems

4. Accounts that programs use to connect to the database must follow the principle of least privilege

  • The account used by a program may be used with one database only; cross-database access is not allowed
  • In principle, the account used by a program must not have the DROP privilege
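
A least-privilege account might look like the sketch below. The account name, host pattern, password placeholder, and schema name shop are hypothetical.

```sql
-- Hypothetical account, host pattern, and schema name.
CREATE USER 'shop_app'@'10.0.0.%' IDENTIFIED BY 'change-me';

-- Data privileges on a single schema only:
-- no DROP, no SUPER, no access to other schemas.
GRANT SELECT, INSERT, UPDATE, DELETE ON shop.* TO 'shop_app'@'10.0.0.%';

-- Review what the account can actually do.
SHOW GRANTS FOR 'shop_app'@'10.0.0.%';
```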