How to perform data segmentation for MySQL sub tables and sub databases

Time:2021-12-2

Relational database itself is easy to become a system bottleneck, and the storage capacity, connection number and processing capacity of a single machine are limited. When the data volume of a single table reaches 1000W or 100g, due to the large number of query dimensions, the performance still degrades seriously even when adding slave databases and optimizing indexes. At this time, it is necessary to consider segmentation. The purpose of segmentation is to reduce the burden of the database and shorten the query time.

The core content of distributed database is nothing more than data sharding and the positioning and integration of data after sharding. Data segmentation is to store data in multiple databases to reduce the amount of data in a single database. By expanding the number of hosts, it can alleviate the performance problem of a single database, so as to improve the operation performance of the database.

Data segmentation can be divided into two ways according to its segmentation type: vertical (vertical) segmentation and horizontal (horizontal) segmentation.

1. Vertical (longitudinal) segmentation

There are two common types of vertical segmentation: vertical sub database and vertical sub table.

1.1 vertical sub warehouse

It is to store different tables with low correlation in different databases according to business coupling. The method is similar to splitting a large system into multiple small systems

The classification is divided independently. Similar to the approach of “microservice governance”, each microservice uses a separate database. As shown in the figure:

Store the data tables of different modules in different databases. Query without correlation between modules

If so, it needs to be solved through data redundancy or layer secondary processing. This business method and data structure are the clearest. However, if cross database related queries cannot be eliminated, this path will be declared different

1.2 vertical sub table

It is based on the “column” in the database. If a table has many fields, you can create an extended table to split the fields that are not often used or with large field length into the extended table. When there are many fields (for example, a large table has more than 100 fields), it is easier to develop and maintain by “splitting a small table from a large table”, and it can also avoid cross page problems. The bottom layer of MySQL is stored through data pages. If a record occupies too much space, it will lead to cross page, resulting in additional performance overhead. In addition, the database loads data into memory in behavioral units, so that the field length in the table is shorter and the access frequency is higher, the memory can load more data, the hit rate is higher, and the disk IO is reduced, thus improving the database performance.

Advantages of vertical segmentation:

  • Solve the coupling at the business system level and make the business clear
  • Similar to the governance of microservices, it can also manage, maintain, monitor and expand the data of different businesses at different levels
  • In high concurrency scenarios, vertical segmentation can increase the bottleneck of IO, database connections and stand-alone hardware resources to a certain extent

Disadvantages:

  • Some tables cannot be joined and can only be solved through interface aggregation, which increases the complexity of development
  • Complex distributed transaction processing
  • There is still a problem that the amount of data in a single table is too large (horizontal segmentation is required)

2. Horizontal (horizontal) segmentation

When it is difficult for an application to perform fine-grained vertical segmentation, or the number of rows of data after segmentation is huge, and there is a single database read-write and storage performance bottleneck, horizontal segmentation is required.

Horizontal segmentation is divided into intra database sub table and sub database sub table. According to the internal logical relationship of the data in the table, the same table is dispersed into multiple databases or multiple tables according to different conditions. Each table contains only part of the data, so as to reduce the amount of data in a single table and achieve the effect of distribution. As shown in the figure:

Compared with vertical segmentation, which classifies tables, this method stores data in different databases (or different tables) according to a certain rule of each field in the table, that is, data is segmented according to several rows.

Splitting tables in the database only solves the problem of too much data in a single table, but it does not distribute tables to libraries on different machines. Therefore, it is not very helpful to reduce the pressure on MySQL database. We still compete for CPU, memory and network io of the same physical machine. It is best to solve it through splitting tables in the database.

Advantages of horizontal segmentation:

  • There is no performance bottleneck of large amount of single database data and high concurrency, so as to improve system stability and load capacity
  • The transformation of the application end is small, and there is no need to split the business module

Disadvantages:

  • Transaction consistency across shards is difficult to guarantee
  • The performance of cross database join Association query is poor
  • The data is difficult to expand and maintain for many times

After horizontal segmentation, the same table will appear in multiple databases / tables, and the contents of each database / table are different. Several typical data fragmentation rules are:

2.1 according to the numerical range

Segment according to time interval or ID interval. For example, the data of different months or even days are scattered into different databases by date; Divide the records with userid 1 ~ 9999 into the first database, 10000 ~ 20000 into the second database, and so on. In a sense, the “hot and cold data separation” used in some systems migrates some less used historical data to other libraries, and only provides hot data query in business functions, which is a similar practice.

The advantages are:

  • Single table size controllable
  • Naturally, it is convenient for horizontal expansion. In the later stage, if you want to expand the capacity of the whole fragmented cluster, you only need to add nodes without migrating the data of other fragments
  • When using the fragment field for range search, continuous fragments can quickly locate fragments for quick query, effectively avoiding the problem of cross fragment query.

Disadvantages:

  • Hot data becomes a performance bottleneck. There may be data hotspots in continuous sharding, such as sharding by time field. Some shards store data in the latest time period and may be read and written frequently, while some shard stored historical data are rarely queried

2.2 take the mold according to the value

Generally, the segmentation method of hash modulus mod is adopted. For example, the customer table is segmented into four libraries according to the cusno field. The remainder of 0 is placed in the first library, the remainder of 1 is placed in the second library, and so on. In this way, the data of the same user will be scattered into the same database. If the query criteria have a cusno field, you can clearly locate the corresponding database for query.

advantage:

  • Data fragmentation is relatively uniform, and it is not easy to have hot spots and bottlenecks of concurrent access

Disadvantages:

  • In the later stage, the old data needs to be migrated when the partitioned cluster is expanded (this problem can be better avoided by using the consistent hash algorithm)
  • It is easy to face the complex problem of cross fragment query. For example, in the above example, if the frequently used query criteria do not contain cusno, the database cannot be located. Therefore, it is necessary to initiate queries to four databases at the same time, merge the data in memory, and take the minimum set back to the application. On the contrary, the sub database becomes a drag.

The above is the details of how to perform data segmentation in MySQL sub table and sub database. For more information about data segmentation in MySQL sub table and sub database, please pay attention to other relevant articles of developeppaer!