Before we start, Cai Cai wants to stress that whether a data table should be split depends on many factors. Has the volume of business data really reached the order of magnitude where splitting is unavoidable? Is there another way to solve the current problem? More than once I have seen leaders split tables blindly without weighing the whole situation, and as a result everyone worked overtime for several weeks. Aren't you afraid of losing your hair? There are also architects who split tables at the very start of a small business. Everyone works overtime to keep up with them, and after launch it turns out that the volume of business data is tiny while the code is still constrained by the table splitting strategy. In some scenarios, the problems that table splitting introduces are very costly.
Table splitting mainly addresses storage and performance problems. Once the amount of data in a single table reaches a certain level, MySQL performance drops sharply. Compared with SQL Server and Oracle, MySQL is still weaker in some respects, but table splitting is applicable to almost all relational databases.
Do not split database tables blindly
Table splitting strategies
Table splitting is similar to database splitting, but the splitting rules differ. The rules below apply to splitting a single table.
Horizontal segmentation is the most commonly used method in many businesses. In essence, it spreads the rows of one table across multiple tables according to a rule, most commonly the ID range or the hash value of the business primary key. The row count at which a table should be split depends on the data stored in it. For example, a table with only a few int columns can hold more rows than a table with several text columns before it runs into trouble. Let's say the limit is 10 million rows. As a system leader or architect, you should start paying attention when a table reaches tens of millions of rows, because it is a hidden performance bottleneck for the system.
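To make the two routing rules concrete, here is a minimal Python sketch. The table name prefix `user_`, the range size of 100000 and the count of 10 tables are assumptions chosen for illustration, and for integer IDs the modulo simply stands in for a hash function.

```python
RANGE_SIZE = 100_000  # assumed number of IDs per range-based sub-table
HASH_TABLES = 10      # assumed number of hash-based sub-tables


def range_table(user_id: int) -> str:
    """Range rule: IDs 1..100000 map to user_1, 100001..200000 to user_2, and so on."""
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    return f"user_{(user_id - 1) // RANGE_SIZE + 1}"


def hash_table(user_id: int) -> str:
    """Hash rule: the remainder of user_id / HASH_TABLES is the sub-table number."""
    return f"user_{user_id % HASH_TABLES}"
```

With the range rule, new tables appear as IDs grow and no existing row has to move. With the hash rule, changing `HASH_TABLES` changes the target table for many existing IDs, which is where the migration cost comes from.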
Compared with horizontally splitting a table, when optimizing an existing business I prefer table partitioning, with different partitions allocated to different physical disks according to the rules. This way the SQL statements in the business code hardly need to change. One table in one of our company's SQL Server databases has grown to billions of rows since it was partitioned, and query and insert speed still meets business needs (optimizing a system still requires effort at the business level as well).
Tables can also be split vertically, according to business. For example, if a database stores user information, it can be divided by business into basic information and extended information. If that benefits the business, the data can be split completely into a basic information table and an extended information table. It can also be split by other rules, such as putting frequently accessed fields in one table and rarely accessed fields in another. The specific rule depends on the problem being solved at the time. Vertical splitting may introduce some complexity. For example, a query that used to return a user's basic and extended information in one go now needs either a join or two separate queries.
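The cost of a vertical split can be seen in a small sketch using Python's built-in sqlite3 module. The table names `user_basic` and `user_ext` and their columns are hypothetical, made up only for this example.

```python
import sqlite3

# In-memory database with a user table that has been split vertically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_basic (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE user_ext   (user_id INTEGER PRIMARY KEY, bio TEXT, address TEXT);
    INSERT INTO user_basic VALUES (1, 'alice', 'alice@example.com');
    INSERT INTO user_ext   VALUES (1, 'likes databases', '1 Example Street');
""")


def load_user_with_join(user_id):
    """Option 1: one round trip, but the database has to join two tables."""
    return conn.execute(
        "SELECT b.user_id, b.name, b.email, e.bio, e.address "
        "FROM user_basic b LEFT JOIN user_ext e ON e.user_id = b.user_id "
        "WHERE b.user_id = ?",
        (user_id,),
    ).fetchone()


def load_user_two_queries(user_id):
    """Option 2: two simple queries, merged in business code."""
    basic = conn.execute(
        "SELECT user_id, name, email FROM user_basic WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if basic is None:
        return None
    ext = conn.execute(
        "SELECT bio, address FROM user_ext WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    # A user without an extended row still gets a full-width result.
    return basic + (ext if ext is not None else (None, None))
```

Both functions return the same row. Which one is cheaper depends on how often the extended information is actually needed.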
- After a table is split vertically, a query that used to hit one table may become a join across tables, which costs some performance.
- Horizontal segmentation needs a rule. Two rules are common: range segmentation and hash segmentation. Range segmentation splits by the range of a field. For example, if the user table is split by user ID, users with IDs 1 to 100000 go to user table 1, and users with IDs 100001 to 200000 go to user table 2. The advantage is that it can be extended indefinitely without data migration. The disadvantage is that data is distributed unevenly between new and old tables, and the range size is hard to choose: too small a range produces too many tables, and too large a range leaves the original problem unsolved. The other strategy routes rows to different tables by the hash value of a column. Taking user ID again, if we plan 10 tables at the start, the routing algorithm can simply use user_id % 10 as the number of the table a row belongs to. The user with ID 985 goes to sub-table 5, and the user with ID 10086 goes to sub-table 6. The advantage is that data is evenly distributed across the tables, but later expansion involves migrating part of the data.
- After a table is split, the database cannot perform an ORDER BY across the sub-tables on its own. The sorting has to be done in business code or by database middleware.
- When the business needs to search, the SQL has to query all of the sub-tables and combine the results. The same applies to statistical needs such as COUNT.
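As a rough illustration of the last two points, the sketch below handles COUNT and ORDER BY in business code over hash-split tables, using Python's built-in sqlite3 module. The three `user_N` tables, their columns and the sample rows are assumptions for the example, and real middleware does considerably more than this.

```python
import heapq
import itertools
import sqlite3

TABLE_COUNT = 3  # assumed number of sub-tables, kept small for the example

conn = sqlite3.connect(":memory:")
for n in range(TABLE_COUNT):
    conn.execute(f"CREATE TABLE user_{n} (user_id INTEGER PRIMARY KEY, name TEXT)")
for user_id in range(1, 11):
    # Route each row with the modulo rule: user_id % TABLE_COUNT.
    conn.execute(
        f"INSERT INTO user_{user_id % TABLE_COUNT} VALUES (?, ?)",
        (user_id, f"user{user_id}"),
    )


def count_users():
    """COUNT across sub-tables: count each table, then add the counts up."""
    return sum(
        conn.execute(f"SELECT COUNT(*) FROM user_{n}").fetchone()[0]
        for n in range(TABLE_COUNT)
    )


def top_users_by_id(limit):
    """ORDER BY user_id DESC LIMIT n across sub-tables.

    Each table returns its own top rows already sorted, and the
    application merges the sorted lists and keeps the first `limit` rows.
    """
    per_table = [
        conn.execute(
            f"SELECT user_id, name FROM user_{n} ORDER BY user_id DESC LIMIT ?",
            (limit,),
        ).fetchall()
        for n in range(TABLE_COUNT)
    ]
    merged = heapq.merge(*per_table, key=lambda row: row[0], reverse=True)
    return list(itertools.islice(merged, limit))
```

Every sub-table must be asked for `limit` rows, because in the worst case all of the top rows live in the same table. The work therefore grows with the number of sub-tables.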
Have you ever split tables in your business?