MySQL design and development specification

Time:2021-9-27

This specification aims to help or guide Rd, QA, OP and other technicians to make database design suitable for online business. Standardize the database change and processing flow, database table design, SQL writing, etc., so as to provide guarantee for the stable and healthy operation of the company’s business system

design code

All the following specifications will be marked according to the three levels of [high risk], [mandatory] and [recommendation], and the compliance priority will be from high to low

DBA has the right to forcibly call back and require modification for designs that do not meet the two levels of [high risk] and [mandatory]

Library name

1. [mandatory] the name of the library must be controlled within 32 characters. The relationship between the table names of related modules and the table names should reflect the join as much as possible, such as the user table and the user table_ Login table

2. Name format of [mandatory] Library: business system name_ For the first mock exam, use the same name as the subsystem name.

3. [mandatory] the naming format of general sub database name is the database configuration name_ Number. The number increases from 0, such as wenda_ 001 the name format of time based sub database is “database configuration name time”

4. [mandatory] when creating a database, the character set must be explicitly specified, and the character set can only be utf8 or utf8mb4. Example of creating SQL database:create database db1 default character set utf8;

Table structure

1. The [mandatory] table must have a primary key, and the ID is set to self incrementing primary key

2. [mandatory] tables are forbidden to use foreign keys. If integrity is to be guaranteed, it should be implemented by the program side. Foreign keys couple tables and affect update, delete and other performance, which may cause deadlock. In high concurrency environment, it is easy to cause database performance bottleneck

3. [mandatory] the names of tables and columns must be controlled within 32 characters. Table names can only use letters, numbers and underscores, all in lowercase. If the table name is too long, abbreviations can be used

4. [mandatory] when creating a table, the character set must be explicitly specified as utf8 or utf8mb4

5. [mandatory] when creating a table, the table storage engine type must be explicitly specified. If there are no special requirements, it must be InnoDB. When a storage engine other than InnoDB / MyISAM / memory needs to be used, it must pass the DBA audit before it can be used in the production environment. Because InnoDB table supports transaction, row lock, downtime recovery, mvcc and other important features of relational databases, it is the MySQL storage engine most used in the industry. This is not available in most other storage engines, so InnoDB is the first choice

6. [mandatory] create a table must have a comment, and both table level and field level must have a comment

7. [suggestion] when creating a table, about the primary key: (1) the primary key is required to be ID, type int or bigint (for future extensibility, the new table is required to be bigint) and auto_ Increment (2) the field identifying the entity of each row in the table should not be set as the primary key. It is recommended to set it as other fields, such as user_ id,order_ ID, etc., and establish a unique key index. If it is set as the primary key and the primary key value is randomly inserted, it will lead to page splitting and a large number of random I / O in InnoDB, and the performance will be degraded

8. [suggestion] core tables (such as user tables and money related tables) must have the creation time field of row data_ Time and last update time fields update_ Time, easy to check problems

9. All fields in the [suggestion] table must be not null default default attribute. The business can define the default value as needed. Because using null value will cause problems such as each row will occupy additional storage space, data migration is prone to errors, deviation of aggregate function calculation results, index invalidation and so on

10. [suggestion] it is recommended to vertically split the blob, text and other large fields in the table into other tables, and select only when you need to read these objects

11. [suggestion] anti normal form design: redundant fields that often need join query in other tables. Such as user_ Name attribute in user_ account,user_ login_ Log and other tables are redundant to reduce join queries

12. [mandatory] the intermediate table is used to retain the intermediate result set. The name must be TMP_ start. The backup table is used to back up or grab a snapshot of the source table. The name must be bak_ start. Intermediate tables and backup tables are cleaned regularly

13. [mandatory] online implementation of DDL changes must be reviewed by DBA and implemented by DBA in the low peak period of business

Column data type optimization

1. For the auto increment column (auto_increment attribute) in the [suggestions] table, it is recommended to use bigint type. Because the storage range of unsigned int is – 2147483648 ~ 2147483647 (about 2.1 billion), an error will be reported after overflow

2. [suggestion] it is recommended to use tinytint or smallint for the status, type and other fields with few selectivity in the business to save empty storage

3. [suggestion] the IP address field in the service is recommended to use int type, not char (15). Because int only occupies 4 bytes, it can be converted to each other with the following function, while char (15) occupies at least 15 bytes. Once the number of table data rows reaches 100 million, 1.1g more storage space should be used. SQL:select inet_ aton(‘192.168.2.12’); select inet_ ntoa(3232236044); PHP: ip2long(‘192.168.2.12’); long2ip(3530427185);

4. [suggestion] enum and set are not recommended. Because they waste space and the enumeration value is written dead, it is inconvenient to change. Tinyint or smallint is recommended

5. [suggestion] blob, text and other types are not recommended. They all waste hard disk and memory space. When loading table data, large fields will be read into memory, which wastes memory space and affects system performance. It is suggested to communicate with PM and Rd. do you really need such a large field

6. [suggestion] for the field storing money, it is recommended to use int, multiply the program end by 100 and divide by 100 for access. Or use the decimal type instead of double

7. [suggestion] text data shall be stored in varchar as far as possible. Because varchar is variable length storage, it saves more space than char. The MySQL server layer specifies that all text in a line can be stored up to 65535 bytes

8. [suggestion] try to select datetime as the time type. Although timestamp takes up less space, it has the problem that the time range is 1970-01-01 00:00:01 to 2038-01-01 00:00:00

Index design

1. [mandatory] the primary key of InnoDB table must be id int / bigint auto_ Increment, and the primary key value cannot be updated

2. [suggestion] the unique key starts with “uk_” or “uq_”, and the ordinary index starts with “idx_”, all in lowercase format, with the name or abbreviation of the field as the suffix

3. [mandatory] InnoDB and MyISAM storage engine tables. The index type must be BTREE; The memory table can select hash or BTREE type indexes as needed

4. [mandatory] the length of each index record in a single index cannot exceed 64KB

5. [suggestion] the number of indexes on a single table cannot exceed 5

6. [suggestion] when establishing an index, consider establishing a joint index, and put the field with the highest discrimination in the front. For example, the discrimination of column userid can be calculated by select count (distinct userid)

7. [suggestion] in the SQL of multi table join, ensure that there is an index on the connection column of the driven table, so that the execution efficiency of the join is the highest

8. [suggestion] when creating tables or adding indexes, ensure that there are no redundant indexes in the tables. For MySQL, if the key (a, b) already exists in the table, the key (a) is a redundant index and needs to be deleted

Sub database, sub table and sub table

1. [mandatory] the partition key of the partition table must have an index or the first column of the combined index

2. [mandatory] the number of partitions (including sub partitions) in a single partition table cannot exceed 1024

3. [mandatory] before going online, RD or DBA must specify the creation and cleaning strategy of partition table

4. [mandatory] SQL accessing partition table must contain partition key

5. [suggestion] the file size of a single partition shall not exceed 2G, and the total size shall not exceed 50g. It is recommended that the total number of partitions should not exceed 20

6. [mandatory] the alter table operation for the partition table must be performed during the low peak period

7. [mandatory] if the sub database strategy is adopted, the number of databases cannot exceed 1024

8. [mandatory] if the split table strategy is adopted, the number of tables cannot exceed 4096

9. [suggestion] it is recommended that a single sub table should not exceed 500W rows, so as to ensure better data query performance

10. [suggestion] the horizontal table should be divided into modules as much as possible, and enough buffers should be reserved to avoid re splitting and migration in the future. It is recommended that the log and report data be divided into tables by date

character set

1. [mandatory] all character sets of database, table and column must be consistent, which are utf8 or utf8mb4

2. [mandatory] the character set in the front-end program character set or environment variable must be consistent with the character set of the database and table, and unified as utf8

SQL writing

DML statement

1. The [forced] select statement must specify a specific field name and cannot be written as。 Because selectData that should not be read will also be read from mysql, causing network card pressure. And once the table field is updated, but the program side does not have time to update, the system will report an error

2. The [mandatory] insert statement specifies the specific field name. Do not write it as insert into T1 values (…). The reason is the same as above

3. [suggestion] insert into… Values (XX), (XX), (XX). The value of XX here should not exceed 500. Too many values will cause delay in master-slave synchronization, although they go online very quickly

4. [recommended] don’t use union in the select statement. It is recommended to use union all, and the number of union clauses is limited to 3. Because union all does not need to be de duplicated, it saves database resources and improves performance

5. [recommended] the list of in values is limited to 500. For example, select… Where userid in (… Within 500…), which is to reduce the underlying scanning and reduce the pressure on the database, so as to speed up the query

6. [suggestion] in the transaction, the quantity of batch updating data needs to be controlled, and necessary sleep should be carried out to achieve a small number of times

7. All tables involved in the [mandatory] transaction must be InnoDB tables. Otherwise, once it fails, it will not be rolled back, and it is easy to interrupt the synchronization of master-slave libraries

8. [forced] write and transaction are sent to the master library, and read-only SQL is sent to the slave library, that is, the program side realizes read-write separation

9. [mandatory] the DML statement must have a where condition and use the index to search

10. [mandatory] hint, such as SQL, is prohibited in the production environment_ no_ Cache, force index, ignore key, straight join, etc. Because hint is used to force SQL to execute according to an execution plan, but as the amount of data changes, we can’t guarantee that our original prediction is correct. We should try our best to let the MySQL optimizer choose its own execution plan

11. [mandatory] the field types around the equal sign in the where condition must be consistent, otherwise the index cannot be used

12. [suggestion] select | update | delete | replace must have a where clause, and the conditions of the where clause must be searched by index

13. [mandatory] full table scanning on large tables is strongly not recommended in the production database, but it can be used for static tables with less than 100 rows. The amount of query data should not exceed 25% of the number of table rows, otherwise the index will not be used

14. [mandatory] in the where clause, it is forbidden to use only the fully fuzzy like condition for searching. If you want to use like, please use the way of like ‘XXXX%’. There must be other equivalent or range query conditions, otherwise the index cannot be used

15. [suggestion] do not use functions or expressions for index columns, otherwise the index cannot be used. For example, where length (name) =’admin ‘or where user_ id+2=10023

16. [suggestion] reduce the use of or statements. You can optimize or statements to union, and then establish indexes on various where conditions. For example, where a = 1 or B = 2 is optimized as where a = 1… Union… Where b = 2, key (a), key (b)

17. [suggestion] for paged query, when the starting point of limit is high, you can filter by filter criteria first. For example, select a, B, C from T1 limit 10000,20; The optimization is: select a, B, C from T1 where id > 10000 limit 20;

Multi table connection

1. [mandatory] prohibit cross DB join statements. This can reduce the coupling between modules and lay a solid foundation for database splitting

2. [mandatory] prohibit the use of join in business update SQL statements, such as update T1 join T2

3. [suggestion] it is not recommended to use sub query. It is recommended to split the sub query SQL and combine the program for multiple queries, or use join instead of sub query

4. [suggestion] in online environment, multi table join should not exceed 3 tables

5. [suggestion] alias is recommended for multi table connection query, and alias should be used to reference fields in the select list, database. Table format, such as select a from db1.table1 alias1 where

6. [suggestion] in multi table join, try to select the table with smaller result set as the driving table to join other tables

affair

1. [suggestion] in the transaction, the number of rows operated by the insert | update | delete | replace statement should be controlled within 1000, and the number of parameters passed in the in list in the where clause should be controlled within 500

2. [suggestion] when batch operating data, it is necessary to control the transaction interval and perform necessary sleep. Generally, the recommended value is 1-2 seconds

3. [suggestion] for Auto_ The concurrency of the table insertion operation of the increment attribute field needs to be controlled within 200

4. [mandatory] the program design must consider the impact of “database transaction isolation level”, including dirty reading, non repeatable reading and phantom reading. It is recommended that the isolation level of online transactions be repeatable read

5. [suggestion] the transaction contains no more than 5 SQL (except payment business). Because too long transactions will lead to avalanche problems such as long lock data, excessive MySQL internal cache and connection consumption

6. [suggestion] the update statement in the transaction should be based on the primary key or unique key as far as possible, such as update… Where id = XX; Otherwise, clearance lock will be generated, and the internal locking range will be expanded, resulting in system performance degradation and deadlock

7. [suggestion] try to move some typical external calls out of the transaction, such as calling WebService and accessing file storage, so as to avoid too long transaction

8. [suggestion] for the select statement that is strictly sensitive to MySQL master-slave delay, please enable the transaction to force access to the master database

Sorting and grouping

1. [suggestion] reduce the use of order by, communicate with the business, do not sort without sorting, or put the sorting on the program side. The statements order by, group by and distinct consume CPU, and the CPU resources of the database are extremely valuable

2. [suggestion] order by, group by and distinct SQL try to use the index to directly retrieve the sorted data. For example, where a = 1 order by, you can use key (a, b)

3. [suggestion] contains the query statements of order by, group by and distinct. Please keep the result set filtered by the where condition within 1000 lines, otherwise the SQL will be very slow

SQL statements prohibited Online

1. [high risk] disable update delete T1… Where a = XX limit XX; This update statement with limit. If it is a binlog format other than row format, the master and slave will be inconsistent and the data will be disordered. It is recommended to add order by PK

2. [high risk] it is forbidden to use associated sub queries, such as update T1 set… Where name in (select name from user where…); Extremely inefficient

3. [force] disable procedure, function, trigger, views, event and foreign key constraints. Because they consume database resources and reduce the scalability of database instances. All recommendations are implemented on the program side

4. [suggestion] disable insert into… On duplicate key update… And replace into, which can easily lead to deadlock in high concurrency environment

5. [mandatory] prohibit associated table update statements, such as update T1, T2 where T1. Id = T2. Id