Common mistakes in MySQL design specifications

Time: 2021-9-4

Thanks to the Internet, we can use fragments of time, such as waiting for a bus or riding the subway, to learn anytime and anywhere. The Internet also makes it easy to share knowledge quickly and to discuss it with people who have the same interests and needs.

However, when sharing is this easy, the quality of what is shared varies, and readers can easily pick up wrong information. Many of these errors exist because technology evolves quickly and authors have no time to update what they have already published. So that later learners are not misled, let's look at several common mistakes in MySQL design specifications.

Primary key design

Wrong design specification: use an auto-increment ID as the primary key instead of a UUID, MD5, hash, or string value.

This rule appears in many articles. The advantages of an auto-increment primary key are that it is small, ordered, and simple to use.

Now let's look at the disadvantages of auto-increment primary keys:

  • Because the value is generated on the server, it must be protected by an auto-increment (AUTO-INC) lock. A large volume of concurrent inserts can turn this lock into a bottleneck, so auto-increment keys have concurrency performance problems;
  • An auto-increment key is unique only within the current instance and cannot guarantee global uniqueness, which makes it unsuitable for distributed architectures;
  • Exposed sequential values invite security problems. If a product ID is an auto-increment primary key, users can fetch other products simply by changing the ID value, and can even work out how many products the database holds;
  • Possible performance problems with MGR (MySQL Group Replication);

Because the auto-increment value is generated on the MySQL server, it must be protected by the AUTO-INC lock, and a large number of concurrent inserts can make that lock a performance bottleneck. In MySQL, the innodb_autoinc_lock_mode parameter controls how this lock is acquired and held. Tuning innodb_autoinc_lock_mode can maximize auto-increment throughput, but it brings other problems (for example, the interleaved mode 2 is not safe with statement-based binary logging). Therefore, in highly concurrent scenarios, a UUID or a business-defined primary key is the better choice.

We can call MySQL's UUID() function directly to get a UUID value:

mysql> SELECT UUID();
+--------------------------------------+
| UUID()                               |
+--------------------------------------+
| 23ebaa88-ce89-11eb-b431-0242ac110002 |
+--------------------------------------+
1 row in set (0.00 sec)

Note that UUID() returns a time-based (version 1) UUID, which stores its timestamp with the low-order time bits first and the high-order time bits last. As a result, the first four bytes of the UUID change rapidly and non-monotonically over time. The values are not truly random, but inserting them into a clustered index behaves much like inserting random values: it causes scattered (random) I/O and creates a performance bottleneck. This is the biggest disadvantage of UUIDs compared with auto-increment IDs.
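To make this field order concrete, the Python sketch below uses the standard uuid module to split the sample value returned by UUID() above into its version-1 time fields. It only illustrates the byte layout and does not talk to MySQL.

```python
import uuid

# The sample value returned by MySQL's UUID() earlier in this article.
sample = uuid.UUID("23ebaa88-ce89-11eb-b431-0242ac110002")

print(sample.version)                   # 1 -> time-based UUID
print(f"{sample.time_low:08x}")         # 23ebaa88: LOW 32 bits of the timestamp, stored first
print(f"{sample.time_mid:04x}")         # ce89: middle 16 bits of the timestamp
print(f"{sample.time_hi_version:04x}")  # 11eb: version nibble + HIGH 12 bits, stored third

# Reassembled in high-to-low order, the 60-bit timestamp (100 ns ticks since 1582-10-15)
# reads time_hi, time_mid, time_low: the reverse of the storage order above.
ticks = sample.time
print(f"{ticks:015x}")                  # 1ebce8923ebaa88
```

Because the fastest-changing field comes first, consecutive UUIDs do not sort in generation order, which is exactly what hurts B+tree inserts.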

To solve this problem, MySQL 8.0 introduced the UUID_TO_BIN() function, which transforms a UUID string as follows:

  • When the second argument (swap_flag) is TRUE, it moves the high-order time bits to the front, which fixes the out-of-order insertion problem;
  • It removes the “-” separators, which carry no information;
  • It converts the hexadecimal string into a binary value, shrinking storage from 36 bytes to 16 bytes.

Next, we convert the earlier UUID string 23ebaa88-ce89-11eb-b431-0242ac110002 with UUID_TO_BIN, passing TRUE as the swap flag, and get the following binary value:

mysql> SELECT UUID_TO_BIN('23ebaa88-ce89-11eb-b431-0242ac110002', TRUE) AS UUID_BIN;
+------------------------------------+
| UUID_BIN                           |
+------------------------------------+
| 0x11EBCE8923EBAA88B4310242AC110002 |
+------------------------------------+
1 row in set (0.01 sec)
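The same byte rearrangement can be reproduced outside the database. The Python sketch below mirrors what UUID_TO_BIN(uuid, swap_flag) is documented to do. It is an illustration, not MySQL's own implementation, and the name uuid_to_bin is ours.

```python
import uuid


def uuid_to_bin(uuid_str: str, swap_flag: bool = False) -> bytes:
    """Sketch of MySQL 8.0's UUID_TO_BIN(uuid, swap_flag) in application code."""
    raw = uuid.UUID(uuid_str).bytes  # 16 bytes; the '-' separators are dropped
    if not swap_flag:
        return raw
    # raw layout: time_low (4 bytes) | time_mid (2) | time_hi_and_version (2) | rest (8)
    time_low, time_mid, time_hi, rest = raw[0:4], raw[4:6], raw[6:8], raw[8:]
    # Put the slowest-changing time bits first so values sort in generation order.
    return time_hi + time_mid + time_low + rest


print(uuid_to_bin("23ebaa88-ce89-11eb-b431-0242ac110002", True).hex().upper())
# 11EBCE8923EBAA88B4310242AC110002  (matches the MySQL output above)
```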

MySQL 8.0 also provides BIN_TO_UUID(), which converts a binary value back into a UUID string. Pass the same swap flag that was used with UUID_TO_BIN to restore the original order.

Versions before MySQL 8.0 lack UUID_TO_BIN/BIN_TO_UUID, but the same result can be achieved with custom functions. In the application layer, you can write equivalent functions in your own programming language.
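As an example of such an application-layer function, here is a minimal Python sketch of the reverse conversion. bin_to_uuid is a hypothetical helper name. The usage line assumes the stored value was produced with the swap flag set, as in the UUID_TO_BIN example.

```python
import uuid


def bin_to_uuid(binary: bytes, swap_flag: bool = False) -> str:
    """Application-layer sketch of MySQL 8.0's BIN_TO_UUID(binary, swap_flag)."""
    if len(binary) != 16:
        raise ValueError("expected a 16-byte binary UUID")
    if swap_flag:
        # Undo the swap: stored order is time_hi (2) | time_mid (2) | time_low (4) | rest (8)
        time_hi, time_mid, time_low, rest = binary[0:2], binary[2:4], binary[4:8], binary[8:]
        binary = time_low + time_mid + time_hi + rest
    return str(uuid.UUID(bytes=binary))  # canonical lowercase form with '-' separators


stored = bytes.fromhex("11EBCE8923EBAA88B4310242AC110002")  # value from the UUID_TO_BIN example
print(bin_to_uuid(stored, True))  # 23ebaa88-ce89-11eb-b431-0242ac110002
```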

Of course, many readers also worry about UUID insert performance and the storage space UUIDs take, so I ran insert performance tests. The results are shown in the table below:

[Table: insert performance test results (image not reproduced)]

The test results show that the ordered UUID produced by MySQL 8.0's UUID_TO_BIN had the best insert performance, even better than the auto-increment ID. In addition, UUID_TO_BIN produces a 16-byte value, only 8 bytes more than a BIGINT auto-increment ID. In the end, the test table used only about 3 GB more storage than with the auto-increment ID.
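One part of the key-size cost is easy to underestimate: InnoDB stores the primary key in the clustered index and also copies it into every secondary index entry. The hypothetical helper below gives a rough lower bound on the extra bytes. It ignores page fill factor, row headers, and non-leaf pages, and the table size in the usage line is made up for illustration.

```python
def extra_key_bytes(rows: int, secondary_indexes: int,
                    uuid_key_bytes: int = 16, bigint_key_bytes: int = 8) -> int:
    """Rough lower bound on the extra bytes when a BINARY(16) key replaces a BIGINT key.

    InnoDB keeps the primary key in the clustered index and repeats it in every
    secondary index entry; page fill factor, row headers and non-leaf pages are ignored.
    """
    per_copy = uuid_key_bytes - bigint_key_bytes  # 8 extra bytes per stored copy of the key
    return rows * per_copy * (1 + secondary_indexes)


# Hypothetical table: 100 million rows with 2 secondary indexes.
extra = extra_key_bytes(100_000_000, 2)
print(f"{extra / 1024**3:.2f} GiB")  # 2.24 GiB
```

The takeaway is that the overhead scales with the number of secondary indexes. Keeping the key compact (BINARY(16) rather than CHAR(36)) therefore matters more on heavily indexed tables.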

Moreover, because a UUID is globally unique, its benefits far outweigh those of an auto-increment ID. You may be used to auto-increment primary keys, but in concurrent scenarios a globally unique value such as a UUID is the recommended primary key.

Of course, UUIDs have limits too. In distributed scenarios, some extra information may need to be embedded in the primary key so that later queries through secondary indexes stay efficient. In that case, a business-defined primary key is recommended. When concurrency and data volume are moderate, an ordered UUID is recommended. Either way, don't assume that a UUID can't be a primary key.
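One widely used way to build business-defined keys for distributed systems is a Snowflake-style 64-bit ID (a scheme originally published by Twitter). A millisecond timestamp in the high bits keeps keys roughly ordered, and a worker or shard ID in the middle keeps them unique across nodes. The sketch below is a hypothetical example, not necessarily the scheme the author has in mind. The 41/10/12 bit split, the 2020-01-01 epoch, and all names are assumptions.

```python
import threading
import time


class SnowflakeLikeIdGenerator:
    """Hypothetical 64-bit ID: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()  # one generator may be shared by several threads

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to reuse IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.SEQUENCE_MASK
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            # Timestamp in the high bits keeps IDs roughly time-ordered (friendly to B+tree inserts).
            return (((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self.sequence)

    @classmethod
    def worker_of(cls, id_value: int) -> int:
        """Recover the worker id embedded in an ID."""
        return (id_value >> cls.SEQUENCE_BITS) & cls.MAX_WORKER


gen = SnowflakeLikeIdGenerator(worker_id=7)
print(gen.next_id())
```

The result fits in a signed BIGINT (8 bytes), so it keeps the compactness of an auto-increment key while removing the dependency on a single server's AUTO-INC lock. The trade-off is reliance on reasonably synchronized clocks.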

Financial field design

Wrong design specification: financial amounts must use the DECIMAL type.

FLOAT and DOUBLE are approximate floating-point types, while DECIMAL is an exact fixed-point type. For this reason, fields such as user balances and product prices are usually designed as DECIMAL, which is accurate to the cent (fen).

However, design standards for large-scale Internet services often recommend converting amounts to an integer type instead of using DECIMAL. In other words, amounts are stored in the smallest currency unit (fen) rather than in yuan. For example, 1 yuan is stored in the database as the integer 100.

BIGINT has the following advantages:

  • DECIMAL is stored in a packed binary encoding of decimal digits, and computing with it is less efficient than with BIGINT
  • BIGINT is a fixed-length 8-byte field with high storage efficiency, whereas the storage size of DECIMAL depends on its declared precision and scale; in data design, fixed-length storage performs better
  • BIGINT can store amounts in fen, or in even finer sub-units if needed, and its range is more than sufficient
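A minimal Python sketch of the application side of this approach follows. It assumes amounts arrive as decimal strings and that half-up rounding is acceptable; both are assumptions, so apply whatever rounding rule your business requires. The decimal module keeps parsing exact, and after conversion all arithmetic is plain integer arithmetic on fen. The function names are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal

FEN_PER_YUAN = 100  # 1 yuan = 100 fen


def yuan_to_fen(amount: str) -> int:
    """Convert a yuan amount given as a string (e.g. user input) to integer fen."""
    # Decimal parses the string exactly; never build it from a float.
    fen = (Decimal(amount) * FEN_PER_YUAN).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(fen)


def fen_to_yuan(fen: int) -> str:
    """Format integer fen for display, e.g. 1999 -> '19.99'."""
    sign = "-" if fen < 0 else ""
    yuan, rem = divmod(abs(fen), FEN_PER_YUAN)
    return f"{sign}{yuan}.{rem:02d}"


print(0.1 + 0.2 == 0.3)                         # False: binary floating point is inexact
print(yuan_to_fen("0.1") + yuan_to_fen("0.2"))  # 30: integer arithmetic on fen is exact
print(fen_to_yuan(yuan_to_fen("19.99")))        # 19.99
```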

Use of ENUM fields

Wrong design specification: avoid ENUM types

In past projects, fields such as user gender, whether a product is listed, or whether a comment is hidden were often simply designed as TINYINT, with a column comment explaining what 0 and 1 mean.

The problems with this design are obvious:

  • Unclear meaning: the table may have been designed by a colleague. Unless you remember it well, you have to read the column comment every time, and sometimes you even have to check the database while coding to confirm what a value means
  • Dirty data: although the application layer can restrict inserted values in code, the values can still be changed through SQL statements and GUI tools

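For fields with a fixed set of values, the ENUM string type combined with strict SQL mode (sql_mode) is recommended. In strict mode, an invalid value is rejected instead of being silently stored as an empty string.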

From MySQL 8.0.16 onward, CHECK constraints are enforced, so you can use a CHECK constraint directly instead of the ENUM type.

In addition, ENUM values are usually defined as single characters such as “Y” and “N”, which take little space. However, if the set of options is not fixed and may grow as the business develops, ENUM is not recommended.

Limit on the number of indexes

Wrong design specification: limit the number of indexes on each table; a table must not have more than 5 indexes

Apart from InnoDB's hard limit of 64 secondary indexes per table, MySQL sets no cap on the number of indexes in a single table. If business queries genuinely need an index, create it; each index does add write and storage cost, but don't blindly follow a fixed numeric limit.

Use of subqueries

Wrong design specification: avoid subqueries

In fact, this rule was correct for older versions of MySQL. Because subquery optimization in earlier versions was limited, many OLTP applications required online services to avoid subqueries wherever possible.

In MySQL 8.0, however, subquery optimization has improved greatly, so you can use subqueries with confidence in newer versions.

A subquery is often easier to understand than a JOIN. For example, suppose we want to count the users who did not publish any article in 2020:

SELECT COUNT(*)
FROM user
WHERE id NOT IN (
    SELECT user_id
    FROM blog
    WHERE publish_time >= '2020-01-01' AND publish_time < '2021-01-01'
);

The logic of the subquery is clear: the inner query finds every user who published an article in 2020, and NOT IN keeps the users who are not in that list.

With a LEFT JOIN, the same query looks like this:

SELECT COUNT(*)
FROM user LEFT JOIN blog
  ON user.id = blog.user_id
 AND blog.publish_time >= '2020-01-01' AND blog.publish_time < '2021-01-01'
WHERE blog.user_id IS NULL;

Although the LEFT JOIN meets the same requirement, it is harder to read.
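Both forms compute an anti-join: "users with no matching 2020 row in blog". The Python sketch below models both queries over small in-memory data (a list of user IDs and hypothetical (user_id, publish_date) pairs) to show that they return the same count. The equivalence assumes blog.user_id is never NULL. In SQL, if the subquery returns a NULL, `id NOT IN (...)` is never TRUE, so the NOT IN version counts 0 while the LEFT JOIN version does not. This sketch does not model NULLs.

```python
from datetime import date

START, END = date(2020, 1, 1), date(2021, 1, 1)  # half-open range [START, END)


def count_not_in(user_ids, blogs):
    """Mirrors the NOT IN subquery: users absent from the set of 2020 authors."""
    authors_2020 = {uid for uid, published in blogs if START <= published < END}
    return sum(1 for uid in user_ids if uid not in authors_2020)


def count_left_join_is_null(user_ids, blogs):
    """Mirrors LEFT JOIN ... WHERE blog.user_id IS NULL: users with no matching row."""
    count = 0
    for uid in user_ids:
        matched = any(b_uid == uid and START <= published < END for b_uid, published in blogs)
        if not matched:  # the LEFT JOIN would pad this user's row with NULLs
            count += 1
    return count


users = [1, 2, 3, 4]
blogs = [(1, date(2020, 3, 1)), (2, date(2019, 12, 31)),
         (3, date(2020, 12, 31)), (3, date(2020, 6, 1))]
print(count_not_in(users, blogs), count_left_join_is_null(users, blogs))  # 2 2
```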

Running EXPLAIN on both statements shows that their execution plans are the same:

[Figure: EXPLAIN output for the two queries (image not reproduced)]

The figure shows that both the subquery and the LEFT JOIN are ultimately converted into the same hash join plan, so the two statements take the same time to execute. In other words, in MySQL 8.0 the optimizer automatically rewrites this kind of NOT IN subquery into an efficient join (an antijoin), which can improve performance significantly.

Summary

After reading the content above, you should have a fresh understanding of these MySQL rules. The common mistakes can be summarized as follows:

  • UUIDs can also be used as primary keys. Ordered UUIDs (UUID_TO_BIN with the swap flag) performed better than auto-increment primary keys in the test above, and the extra space they occupy is negligible
  • Besides DECIMAL, consider storing amounts as BIGINT in the smallest currency unit (fen)
  • For fields with a fixed set of values, use ENUM (with strict SQL mode) instead of 0/1/2 codes; from MySQL 8.0.16 onward, a CHECK constraint is another option
  • There is no rule that a table may have at most 5 indexes; add and remove indexes according to business needs
  • MySQL 8.0 optimizes subqueries well, so they can be used safely.
