Database and table splitting for high-concurrency databases

Time:2021-8-11

What is database and table splitting?

Take an e-commerce system as an example. The following figure shows the table structure of its seller module:
[Figure: table structure of the seller module of the e-commerce system]
The store information and geographic region information related to a product can be obtained with the following SQL:

`SELECT p.*, r.[geographic region name], s.[store name], s.[reputation]
FROM [product information] p
LEFT JOIN [geographic region] r ON p.[origin] = r.[geographic region code]
LEFT JOIN [store information] s ON p.id = s.[store]
WHERE p.id = ?`

As the company's business grows rapidly, the amount of data in the database increases sharply and access slows down, so optimization becomes urgent. Where does the problem lie? A relational database easily becomes the bottleneck of a system, because the storage capacity, number of connections, and processing power of a single machine are limited. Once a single table reaches about 10 million rows or 100 GB of data, queries across many dimensions still degrade seriously, even after adding replica (slave) databases and optimizing indexes.

Option 1:

Improve data processing capacity by upgrading the server hardware, for example by adding storage capacity or CPU. This approach is expensive, and if the bottleneck lies in MySQL itself, better hardware brings only limited improvement.

Option 2:

Spread the data across different databases so that each database holds less data, which relieves the performance problem of a single database and improves overall database performance. As shown in the figure below, the e-commerce database is split into several independent databases, and large tables are split into several smaller tables.
[Figure: the e-commerce database split into several independent databases, with large tables split into smaller tables]
Database and table splitting addresses the performance degradation caused by excessive data volume. The original single database is divided into several databases, and a large table is divided into several tables, so that each database and each table holds less data and database performance improves.

Vertical table splitting

Database and table splitting has two parts: splitting databases and splitting tables. In production it usually takes four forms: vertical database splitting, horizontal database splitting, vertical table splitting, and horizontal table splitting.
Let's start with vertical table splitting.
Product details are generally not displayed in the product list, as shown in the following figure:
[Figure: product list page, which does not show product details]
When browsing the product list, users view the detailed description of a product only when they are interested in it. The product description field is therefore accessed infrequently, takes up a lot of storage, and makes reading a single row slow, while other fields such as the product name, product picture, and product price are accessed frequently.

Because the two kinds of data have different characteristics, we can consider splitting the product information table as follows:

The rarely accessed product description is stored in one table, and the frequently accessed basic product information is stored in another.
[Figure: product information table split into a basic information table and a product description table]
The product list can then use the following SQL:

`SELECT p.*, r.[geographic region name], s.[store name], s.[reputation]
FROM [product information] p
LEFT JOIN [geographic region] r ON p.[origin] = r.[geographic region code]
LEFT JOIN [store information] s ON p.id = s.[store]
WHERE...ORDER BY...LIMIT...`

When the product description is needed, it can be obtained with the following SQL:

`SELECT *
FROM [product description]
WHERE [product ID] = ?`

Definition of vertical table splitting: divide one table into several tables by column, so that each table stores a subset of the columns.
The improvements it brings are:

1. It avoids IO contention and reduces the probability of table locking: users viewing product details and users browsing the product list do not affect each other.

2. It makes the most of the efficiency of hot data: fast operations on product information are not dragged down by slow operations on product descriptions.
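
The split described above can be sketched in Python, with the standard `sqlite3` module standing in for the real database. The table names `product_info` and `product_description`, their columns, and the sample row are assumptions made for illustration only; they are not the schema of the original system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot, frequently read columns stay in the main table.
    CREATE TABLE product_info (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price REAL NOT NULL
    );
    -- The large, rarely read description moves to its own table.
    CREATE TABLE product_description (
        product_id INTEGER PRIMARY KEY REFERENCES product_info(id),
        description TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO product_info VALUES (1, 'Kettle', 29.9)")
conn.execute("INSERT INTO product_description VALUES (1, 'A long description ...')")


def list_products(limit):
    """Product list page: touches only the narrow table."""
    return conn.execute(
        "SELECT id, name, price FROM product_info ORDER BY id LIMIT ?", (limit,)
    ).fetchall()


def get_description(product_id):
    """Detail page: fetches the large field only when it is needed."""
    row = conn.execute(
        "SELECT description FROM product_description WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0] if row else None
```

The list query never reads the description column, so browsing the list and viewing details work on separate tables.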

Why is the IO efficiency of large fields low? First, the data itself is large, so it takes longer to read. Second, large fields span pages. The page is the storage unit of the database, and many lookup and positioning operations work page by page. The more rows a single page holds, the better the overall performance of the database; a large field takes up a lot of space, so fewer rows fit in one page and IO efficiency drops. Third, the database loads data into memory row by row. If the rows of a table are shorter and accessed more often, memory can hold more rows, the hit rate is higher, and disk IO is reduced, which improves database performance.
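
A rough calculation shows the effect of row size on the number of rows per page. The 16 KB page size and both row sizes below are assumed figures chosen for illustration. The estimate ignores per-page and per-row overhead, and real storage engines may store very large values outside the page.

```python
PAGE_SIZE = 16 * 1024  # assumed page size in bytes


def rows_per_page(row_size, page_size=PAGE_SIZE):
    """Rough upper bound on rows per page, ignoring page and row overhead."""
    return page_size // row_size


narrow_row = 200        # assumed: name, picture URL, price
wide_row = 200 + 4000   # assumed: same fields plus a 4000-byte description

print(rows_per_page(narrow_row))  # 81
print(rows_per_page(wide_row))    # 3
```

Under these assumptions, reading the same number of rows touches roughly 27 times as many pages when the description is stored inline.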

Generally, the data items of a business entity are accessed with different frequencies, and some of them may be BLOB or TEXT values that take up a lot of storage, such as the product description in the example above. When a table holds a large amount of data, you can therefore split it by column and place hot and cold fields in different databases, which can in turn be placed on different storage devices to avoid IO contention. The performance gain from vertical splitting comes mainly from more efficient operations on hot data and from reduced disk contention.

We usually split vertically according to the following principles:

  1. Put rarely used fields in a separate table;
  2. Split out large fields such as TEXT and BLOB and put them in an auxiliary table;
  3. Keep columns that are frequently queried together in the same table;

Vertical database splitting

Vertical table splitting improves performance to some extent, but it still does not meet the requirements, and disk space is also running out. The data is still confined to one server: splitting tables vertically within a database only solves the problem of a single table holding too much data, and does not distribute the tables across servers. Every table therefore still competes for the CPU, memory, network IO, and disk of the same machine.

We can divide the original seller_DB (seller database) into product_DB (product database) and store_DB (store database), and place the two databases on different servers, as shown in the following figure:
[Figure: seller_DB split into product_DB and store_DB on different servers]
Because product information and product descriptions are tightly coupled in the business, they are stored together in product_DB (product database); store information is relatively independent, so it is stored separately in store_DB (store database).

Vertical database splitting classifies tables by business and distributes them to different databases, each of which can be placed on a different server. Its core idea is a dedicated database for each business.

The improvements it brings are:

  • It removes coupling at the business level and makes business boundaries clear
  • It allows the data of different businesses to be managed, maintained, monitored, and scaled separately
  • Under high concurrency, it raises the available IO capacity and number of database connections to some extent and eases the hardware bottleneck of a single machine

Vertical database splitting classifies tables by business and distributes them across different databases, which can be deployed on different servers so that several servers share the load. It still does not solve the problem of a single table holding too much data.
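
A minimal sketch of vertical database splitting follows, assuming two in-memory SQLite databases in place of the two servers. The `TABLE_TO_DB` mapping, the table and column names, and the `store_id` column are hypothetical. Once the tables live in different databases, a single SQL join across them is no longer available, so the sketch issues two queries and merges the results in the application.

```python
import sqlite3

# Two in-memory SQLite databases stand in for product_DB and store_DB,
# which in production would live on different servers.
databases = {
    "product_DB": sqlite3.connect(":memory:"),
    "store_DB": sqlite3.connect(":memory:"),
}

# Hypothetical mapping from table name to the database that owns it.
TABLE_TO_DB = {
    "product_info": "product_DB",
    "product_description": "product_DB",
    "store_info": "store_DB",
}


def db_for(table):
    """Return the connection of the database that owns the given table."""
    return databases[TABLE_TO_DB[table]]


db_for("product_info").execute(
    "CREATE TABLE product_info (id INTEGER PRIMARY KEY, name TEXT, store_id INTEGER)")
db_for("store_info").execute(
    "CREATE TABLE store_info (id INTEGER PRIMARY KEY, name TEXT)")
db_for("product_info").execute("INSERT INTO product_info VALUES (1, 'Kettle', 7)")
db_for("store_info").execute("INSERT INTO store_info VALUES (7, 'Home Goods')")


def product_with_store(product_id):
    """Replace the former SQL join with two queries merged in the application."""
    product = db_for("product_info").execute(
        "SELECT id, name, store_id FROM product_info WHERE id = ?", (product_id,)
    ).fetchone()
    if product is None:
        return None
    store = db_for("store_info").execute(
        "SELECT name FROM store_info WHERE id = ?", (product[2],)
    ).fetchone()
    return {"id": product[0], "name": product[1], "store": store[0] if store else None}
```

The extra round trip in `product_with_store` is one example of the cross-database cost that this kind of split introduces.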

Horizontal database splitting

Vertical database splitting solves the database performance problem to some extent, but as the business grows, the data stored in the single product_DB (product database) exceeds the original estimate. A rough estimate gives 80,000 stores with an average of 150 products of different specifications each, which is 12 million products; allowing for growth, the number must be estimated at more than 15 million. product_DB (product database) is also accessed very frequently, and a single server can no longer support it. How can it be optimized at this point?

Split the database again? From a business perspective, however, it can no longer be split vertically.

Instead, try splitting horizontally by store: put the product information of stores with odd IDs and of stores with even IDs into two separate databases.

[Figure: product information split by store ID into two product databases]
In other words, to operate on a piece of data, first determine the store ID it belongs to. If the store ID is even, the operation is mapped to product_DB1 (product database 1); if the store ID is odd, it is mapped to product_DB2 (product database 2). The database name for an operation is therefore given by the expression product_DB[store ID % 2 + 1].
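
The routing expression can be written as a small function. This is only a sketch of the rule stated above; the function name `database_for_store` is hypothetical, and the database name is written as `product_DB` followed by the index.

```python
def database_for_store(store_id):
    """Return the database name for a store, using store ID % 2 + 1."""
    return f"product_DB{store_id % 2 + 1}"
```

For example, `database_for_store(4)` gives `product_DB1` and `database_for_store(7)` gives `product_DB2`.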

Horizontal database splitting splits the data of one table into different databases according to a rule, and each database can be placed on a different server.

It differs from vertical database splitting, which moves different tables into different databases: horizontal splitting divides the data rows and does not affect the table structure.

The improvements it brings are:

  • It relieves the performance bottleneck of a single database under large data volumes and high concurrency.
  • It improves the stability and availability of the system.

Stability shows in fewer IO conflicts and less locking. Availability means that when one database has a problem, the rest of the data remains available.

When an application is hard to split vertically at a finer granularity, or the number of rows is still huge after splitting and a single database hits read, write, or storage bottlenecks, horizontal splitting becomes necessary. Horizontal splitting can often resolve the storage capacity and performance bottlenecks of a single database. However, because one table is spread over different databases, every data operation needs additional routing, which greatly increases system complexity.
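
The extra routing work can be illustrated with a sketch that uses two in-memory SQLite databases as the two product databases. The table layout, helper names, and sample rows are assumptions. A query that carries the store ID goes to exactly one database, while a query without it has to ask every database and merge the results.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two product databases.
shards = [sqlite3.connect(":memory:") for _ in range(2)]
for shard in shards:
    shard.execute(
        "CREATE TABLE product_info (id INTEGER PRIMARY KEY, store_id INTEGER, name TEXT)")


def shard_for(store_id):
    # Same rule as the text: even store IDs go to database 1, odd to database 2.
    return shards[store_id % 2]


def add_product(product_id, store_id, name):
    shard_for(store_id).execute(
        "INSERT INTO product_info VALUES (?, ?, ?)", (product_id, store_id, name))


def products_of_store(store_id):
    """The query carries the store ID, so exactly one database is touched."""
    return shard_for(store_id).execute(
        "SELECT id, name FROM product_info WHERE store_id = ? ORDER BY id",
        (store_id,)).fetchall()


def products_named(name):
    """No store ID in the query: ask every database and merge the results."""
    rows = []
    for shard in shards:
        rows += shard.execute(
            "SELECT id, name FROM product_info WHERE name = ?", (name,)).fetchall()
    return sorted(rows)


add_product(1, 10, "Kettle")
add_product(2, 11, "Kettle")
add_product(3, 10, "Mug")
```

In this sketch the product IDs are assigned by hand so that they stay unique across both databases; a real system would need its own scheme for that.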

Horizontal table splitting

Following the idea of horizontal database splitting, the tables in product_DB_X (product database) can also be split horizontally to solve the problem of a single table holding too much data, as shown in the following figure:
[Figure: tables in product_DB_X split horizontally into two sets of tables]
This is similar to horizontal database splitting, except that the target of the operation is a table. Product information and product description are each divided into two sets of tables. If the product ID is even, the operation is mapped to the product information 1 table; if the product ID is odd, it is mapped to the product information 2 table. The table name for an operation is given by the expression product information[product ID % 2 + 1].
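
A minimal sketch of this table-level routing, using one in-memory SQLite database. The table names `product_info_1` and `product_info_2`, the columns, and the helper names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for n in (1, 2):
    conn.execute(f"CREATE TABLE product_info_{n} (id INTEGER PRIMARY KEY, name TEXT)")


def table_for(product_id):
    """Even product IDs map to product_info_1, odd ones to product_info_2."""
    return f"product_info_{product_id % 2 + 1}"


def save_product(product_id, name):
    # The table name is computed from an integer, never taken from user input.
    conn.execute(
        f"INSERT INTO {table_for(product_id)} VALUES (?, ?)", (product_id, name))


def load_product(product_id):
    return conn.execute(
        f"SELECT id, name FROM {table_for(product_id)} WHERE id = ?", (product_id,)
    ).fetchone()


save_product(4, "Kettle")
save_product(5, "Mug")
```

Both tables live in the same database, so this reduces the size of each table but does not spread the load across servers.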

Horizontal table splitting splits the data of one table into several tables within the same database according to a rule.

The improvements it brings are:

  • It eases the performance problems caused by a single table holding too much data
  • It avoids IO contention and reduces the probability of table locking

Splitting a table horizontally within a database solves the problem of a single table holding too much data. Each of the resulting smaller tables contains only part of the data, so the data in any single table shrinks and retrieval performance improves.

Summary

Vertical table splitting: the fields of a wide table can be divided into several tables according to access frequency and whether they are large fields. This makes the business clearer and also improves performance somewhat. After splitting, try to avoid join queries from a business perspective; otherwise the performance cost outweighs the gain.

Vertical database splitting: tables can be classified by how tightly they are coupled in the business and stored in different databases. These databases can be distributed across different servers so that several servers share the access load, which greatly improves performance and makes the overall architecture clearer from a business point of view. Each business database can tune its own optimization scheme. However, it has to deal with all the complexity that cross-database access brings.

Horizontal database splitting: the data of a table can be divided by row into several different databases, each holding only part of the table's data. These databases can be distributed across different servers so that several servers share the access load, which greatly improves performance. It has to deal not only with all the complexity that cross-database access brings, but also with data routing (introduced later).

Horizontal table splitting: the data of a table can be divided by row into several tables in the same database, each holding only part of the table's data. This improves performance slightly and serves only as a supplement to horizontal database splitting.

In general, the vertical database and table splitting scheme should be decided at the system design stage according to how tightly the businesses are coupled. When the data volume and access pressure are not particularly large, consider caching, read-write separation, indexing, and similar techniques first. If the data volume is huge and keeps growing, then consider horizontal database and horizontal table splitting.
