Introduction Guide to MySQL sub database, sub table and partition

Time:2021-9-15

preface

Relational database is easy to become a system bottleneck. The storage capacity, connection number and processing capacity of a single machine are limited. When the amount of data and concurrency are increased, the database must be segmented.

The means of data sharding is to divide databases and tables. There are two aspects of sub database and sub table, either optical sub database without sub table or optical sub table without sub database.

The core content of distributed database is nothing more than data segmentation, and the positioning and integration of data after segmentation.

Why do you want to separate databases and tables

Sub table

When the amount of data in a single table is too large, it will seriously affect the performance of SQL execution. Generally, when a single table reaches millions, the performance will be relatively poor. At this time, the score table is.

Table splitting is to put the data of one table into multiple tables, and then query one table when querying. For example, divide the table according to the item ID: put a fixed number of item data in one table, so that the amount of data in each table can be controlled within a controllable range.

sub-treasury

According to experience, when a database supports concurrency up to 2000, it needs to be expanded, and the concurrency value of a healthy single database should be kept at about 1000. Then you can split the data of one database into multiple databases, and access one database when accessing.

This is the so-called sub database and sub table. Why do you want to sub database and sub table?

  • Improve concurrency support
  • Reduce disk utilization
  • Improve SQL execution performance

How to divide databases and tables

Look directly at the picture:

For vertical splitting, it is recommended to do a good job in table design at the beginning of system design to avoid vertical splitting.

Horizontal splitting can be divided according to range or according to a field hash. According to the range, the advantage is that the expansion is simple. Just prepare a new table or library. However, it is prone to hot issues, which should be viewed in combination with business scenarios. According to hash, the advantage is that the request pressure of each database or table can be evenly distributed. The disadvantage is that it is troublesome to expand the capacity. The previous data needs to be rehashed, and there is a data migration process.

Problems caused by sub database and sub table

Sub database and sub table can effectively alleviate the pressure of network IO, hardware resources and connection number caused by single machine and single database. But it also brings some problems.

  • Transaction consistency problem
    It is solved by distributed transactions or ensuring final consistency.
  • Cross node Association query join problem
    Global table, field redundancy, data assembly, er fragmentation
  • Cross node paging, sorting and aggregation function problems
    First, query in different partition nodes, and finally summarize or merge the results
  • Global primary key duplication avoidance problem
    Various distributed ID generation algorithms
  • Data migration and capacity expansion
    In case of range fragmentation, you only need to add nodes to expand the capacity.
    If it is a hash, the general practice is to read out the historical data first, and then write the data to each partition node according to the specified partition rules.

data migration

Data migration introduces two schemes.

One of the lowest solutions is to shut down the system for a period of time, run through the tool to realize the written data import, and write the data of a single table to the sub database and sub table.

The second scheme sounds more reliable. The double write migration scheme. In the online system, in all the places where data was written before, the operation of adding, deleting and changing, in addition to adding, deleting and changing the old database, plus adding, deleting and changing the new database, is the so-called double write. Then, after the system is deployed, run the data import tool in scheme 1, read the old library and write the new library. Write according to GMT_ The modified field determines the last modification time of the data. It will not be written unless it is read out that the new database does not exist or is newer than the new database data. Simply put, it is not allowed to overwrite new data with old data.

After one round of writing, there may still be inconsistencies. Then the program will automatically perform a new round of verification and compare each data of each table in the new and old databases. Then, if there are different, read the data from the old database and write it again. Repeat the cycle until the data is completely consistent.

middleware

Common middleware for database and table splitting are:

  • Cobar: it is developed and open source by Alibaba B2B team and belongs to the proxy layer scheme, which is between the application server and the database server. The application accesses the Cobar cluster through the jdbc driver. Cobar decomposes the SQL according to the SQL and sub database rules, and then distributes it to different database instances of the MySQL Cluster for execution. Operations such as read / write separation, stored procedure, cross library join and paging are not supported. It hasn’t been updated in recent years, and few people use it.
  • Tddl: it is developed by Taobao team and belongs to the client layer scheme. It supports basic crud syntax and read-write separation, but does not support join, multi table query and other syntax. At present, only a few are used, because they still rely on Taobao’s diamond configuration management system.
  • Atlas: 360 is open source and belongs to the proxy layer scheme. It has not been maintained for several years, and few companies use it now.
  • Sharding JDBC: Dangdang is open source and belongs to the client layer scheme. At present, it has been renamed shardingsphere. SQL syntax also supports many without too many restrictions. It supports database and table splitting, read-write separation, distributed ID generation, and flexible transactions (best effort delivery transactions and TCC transactions). Moreover, there are many companies and active communities.
  • MYCAT: Based on Cobar transformation, it belongs to the proxy layer scheme. The supported functions are perfect. It is younger than sharding JDBC.

To sum up, sharding JDBC and MYCAT can be considered.

Sharding JDBC, a client layer solution, has the advantages of no deployment, low operation and maintenance cost, no secondary forwarding of the agent layer and high performance. The disadvantage is coupling.
The disadvantage of MYCAT’s proxy layer scheme is that it needs to deploy and operate a set of middleware. The operation and maintenance cost is high, but the advantage is that it is transparent to the project.

MySQL partition (not recommended)

Partition is introduced here to prevent confusion with the concepts of segmentation, database and table.
MySQL supports partition from version 5.1. Partitioning means that a database decomposes a table into several smaller and easier to manage parts according to certain rules. As for the application accessing the database, there is only one table or index logically, but in fact, the table may be composed of multiple physical partitions, that is, it is transparent to the application.
MySQL partition introduces the concept of partition key and adopts divide and conquer method, which is conducive to managing very large tables. The partition key is used to aggregate data according to an interval value, a specific value list or a hash function, so that the data can be distributed in different partitions according to rules. There are six types of partitions available in MySQL 5.7:

  • Range partition: allocate data to different partitions based on a given continuous interval range.
  • List partition: similar to range partition, the difference is that list partition is based on the list of listed values, and range partition is based on a given continuous interval range.
  • Columns partition: similar to range and list, the difference is that the partition key can be multi column or non integer.
  • Hash partition: assign data modulus to different partitions based on the given number of partitions.
  • Key partition: similar to hash partition, but using the hash function provided by mysql.
  • Sub partition: also known as composite partition or composite partition, that is, make another partition under the main partition to divide the data again.

Here is an example of list partition:


CREATE TABLE orders_list (
  id INT AUTO_INCREMENT,
  customer_surname VARCHAR(30),
  store_id INT,
  salesperson_id INT,
  order_date DATE,
  note VARCHAR(500),
  INDEX idx (id)
) ENGINE = INNODB
  PARTITION BY LIST(store_id) (
  PARTITION p1
  VALUES IN (1, 3, 4, 17)
  INDEX DIRECTORY = '/var/orders/district1'
  DATA DIRECTORY = '/var/orders/district1',
  PARTITION p2
  VALUES IN (2, 12, 14)
  INDEX DIRECTORY = '/var/orders/district2'
  DATA DIRECTORY = '/var/orders/district2',
  PARTITION p3
  VALUES IN (6, 8, 20)
  INDEX DIRECTORY = '/var/orders/district3'
  DATA DIRECTORY = '/var/orders/district3',
  PARTITION p4
  VALUES IN (5, 7, 9, 11, 16)
  INDEX DIRECTORY = '/var/orders/district4'
  DATA DIRECTORY = '/var/orders/district4',
  PARTITION p5
  VALUES IN (10, 13, 15, 18)
  INDEX DIRECTORY = '/var/orders/district5'
  DATA DIRECTORY = '/var/orders/district5'
);

Advantages of Zoning:

  • Expand storage capacity.
  • Optimize queries. When the where clause contains partition conditions, you can scan only the necessary partitions to improve the query efficiency; Colleagues can process queries involving aggregate functions such as sum () and count () in parallel on each partition.
  • For data partitions that have expired or do not need to be saved, you can quickly delete data by deleting partitions.
  • Distribute query data across multiple disks to obtain greater query throughput.

summary

This is the end of this article about MySQL sub database, sub table and partition. For more information about MySQL sub database, sub table and partition, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!

Recommended Today

Supervisor

Supervisor [note] Supervisor – H view supervisor command help Supervisorctl – H view supervisorctl command help Supervisorctl help view the action command of supervisorctl Supervisorctl help any action to view the use of this action 1. Introduction Supervisor is a process control system. Generally speaking, it can monitor your process. If the process exits abnormally, […]