A Quick Introduction to Database and Table Sharding (Required Reading)

Time:2020-10-26

Quite a few readers who are new to Java have left messages asking for a systematic introduction to database and table sharding, but I never got around to writing one. My company's project is now using sharding-jdbc to convert an existing MySQL architecture to sharded databases and tables, so I am taking the opportunity to write a series of hands-on sharding articles, which also serves as a summary of my own study of the architecture.

I have read a number of articles on database and table sharding online. Much of the material is repetitive, the knowledge is fragmented, and detailed hands-on cases are missing. To dig deeper, I bought paid courses on a few platforms. After several lessons I found them fine for people with some experience, but still quite hard for beginners.

To make things clear for beginners, I may spend more space on some points, so please bear with me if I seem wordy. Once the series is finished, I will compile it into a freely available PDF; every reader it helps counts. If you find mistakes or imprecise statements in the article, corrections and discussion are welcome.

Before getting into hands-on practice, here is a short review of the basic concepts of database and table sharding.

What is database and table sharding

Strictly speaking, database sharding and table sharding are two separate concepts, but the two operations are usually carried out together, so we habitually refer to them jointly as database and table sharding.

The purpose of sharding is to stop database performance from degrading as the amount of data in a database or table grows. Following certain rules, a database holding a large amount of data is split into several separate databases, and a table holding a large amount of data is split into several tables, so that each individual database and table performs well (responds quickly) and overall database performance improves.

How to split databases and tables

The core idea of sharding is to partition the data, and then to locate data quickly and merge query results once it has been partitioned. Both databases and tables can be split along two dimensions: vertical and horizontal.

Next, we will use an order-related business as an example to see how to split databases and tables vertically and horizontally.

Vertical partitioning

Vertical partitioning comes in two forms: vertical database splitting and vertical table splitting.

1. Vertical database splitting

Vertical database splitting is relatively easy to understand. The core idea is a dedicated database for each business.

Tables are classified by business, and the tables for orders, payments, coupons, points, and so on are placed in their own databases. Developers may not connect directly to another business's database. If they need data from another business, the team that owns it provides an API. This is the initial form of microservices.

Vertical database splitting depends heavily on how the business is divided, but the boundaries between businesses are not always clear. For example, when splitting out order data you have to consider how orders relate to other businesses; it is not as simple as putting the order-related tables in one database.

Vertical database splitting improves database performance to some extent, but it does not solve the performance problems caused by a single table holding too much data, so it has to be combined with horizontal partitioning.

2. Vertical table splitting

Vertical table splitting works on the columns (fields) of a table: it splits a wide table into several narrower tables.

For example, take an order table named order. Frequently accessed fields such as the order amount and order number stay in the main table, while large fields such as blob columns and infrequently accessed fields are split out into a separate extension table, work_extend. Each table then stores only some of the fields of the original table, and the split tables can be distributed across different databases.

The database loads data into memory in units of rows (in practice, pages of rows). After the split, the core table holds mostly frequently accessed fields, and those fields are short, so more rows fit in memory. This raises the cache hit rate of queries and reduces disk I/O, which improves database performance.

Advantages of vertical partitioning

  • Business data is decoupled: each business's data can be maintained, monitored, and scaled independently.
  • Under high concurrency, it relieves some of the pressure on the database.

Disadvantages of vertical partitioning

  • Development becomes more complex. Because the businesses are isolated, many tables can no longer be accessed directly, and data has to be aggregated through interfaces.
  • Distributed transactions become harder to manage.
  • Individual tables can still hold too much data. That problem is not fundamentally solved and requires horizontal partitioning as well.

Horizontal partitioning

As mentioned above, vertical partitioning still leaves the problem of too much data in a single database or table. Once the application cannot be split vertically at any finer granularity, yet a single database still hits read/write or storage bottlenecks, it is time to add horizontal partitioning, which can improve database performance considerably.

1. Horizontal database splitting

Horizontal database splitting distributes the rows of the same table across different databases according to certain rules. Each database can sit on a different server, which allows horizontal scaling. It is a common way to improve database performance.

This approach usually resolves the storage and performance bottlenecks of a single database, but because the same table is spread across different databases, data access requires additional routing, so the system becomes more complex.

For example, suppose three order databases each contain an identical order table. When accessing an order, we take the order number modulo 3 (the number of database instances) to decide which database the order belongs to.
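
The routing step in this example can be sketched in a few lines of Java. This is only an illustration: the class name OrderDbRouter and the method dbIndexFor are hypothetical, and it assumes the order number is numeric and the databases are numbered from 0.

```java
/** Hypothetical router: picks one of N order databases from the order number. */
final class OrderDbRouter {
    private final int dbCount;

    OrderDbRouter(int dbCount) {
        if (dbCount <= 0) {
            throw new IllegalArgumentException("dbCount must be positive");
        }
        this.dbCount = dbCount;
    }

    /** Returns the zero-based index of the database that owns this order. */
    int dbIndexFor(long orderNo) {
        // floorMod keeps the result in [0, dbCount) even for a negative input
        return (int) Math.floorMod(orderNo, (long) dbCount);
    }
}
```

With three databases, new OrderDbRouter(3).dbIndexFor(10L) returns 1, so order 10 would be read from and written to the second database.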

2. Horizontal table splitting

Horizontal table splitting happens within a single database: according to certain rules, a table with a large amount of data is divided into several tables with exactly the same structure, each storing part of the original table's data.

For example, an order table holding 9 million rows can be split horizontally into three tables, order_1, order_2, and order_3, each holding 3 million rows, and so on.
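
As a rough sketch of how a row might be mapped to one of these three tables, the Java below derives the table name from the order number. The text does not say which rule is used for this split, so the modulo rule here is an assumption, and OrderTableRouter and tableFor are hypothetical names.

```java
/** Hypothetical helper: maps an order number to order_1, order_2 or order_3. */
final class OrderTableRouter {
    private static final int TABLE_COUNT = 3;

    static String tableFor(long orderNo) {
        // Assumed rule: remainder 0..2 becomes the 1-based table suffix 1..3
        long suffix = Math.floorMod(orderNo, (long) TABLE_COUNT) + 1;
        return "order_" + suffix;
    }
}
```

Under this assumed rule, OrderTableRouter.tableFor(4L) yields order_2.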

Although horizontal table splitting breaks up the table, all the resulting tables still live in the same database instance, so it only solves the problem of a single table holding too much data. The tables are not spread across machines and still compete for the CPU, memory, and network I/O of the same physical machine. To improve performance further, the split tables have to be distributed across different databases.

Advantages of horizontal partitioning

  • It solves the problem of a single database holding too much data under high concurrency, and improves the stability and load capacity of the system.
  • The changes required in the business system are relatively small.

Disadvantages of horizontal partitioning

  • Transactional consistency across shards is hard to guarantee.
  • Cross-database join queries perform poorly.
  • Scaling out is difficult and the maintenance burden is heavy (splitting into thousands of tables is a nightmare).

What are the rules?

We have mentioned "certain rules" several times. The rule is in fact a routing algorithm that decides which database and table a given row is stored in.

The two common ones are the modulo algorithm and the range algorithm.

1. Modulo algorithm

Taking a field modulo n (that is, hash(field) mod n, where n is the number of database instances or tables) is the most common way to shard.

Take the order table as an example. First number the databases from 0 to n-1, then take the order number field work_no of the order table modulo n to get a remainder i. Rows with i=0 go to the first database, i=1 to the second, i=2 to the third, and so on.

This way, all data for the same order is stored in the same database and table, and a query that uses work_no as its condition applies the same rule to locate the data quickly.

Advantages:

  • Data is spread fairly evenly across shards, so requests are unlikely to pile up on a single database.

Disadvantages:

  • When a machine goes down, requests that should have gone to its database cannot be handled, and the failed instance is removed from the cluster. The algorithm then becomes hash(userId) mod (n-1), and a user's data may no longer map to the database where it was stored.
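
The remapping problem can be made concrete with a small count. The Java sketch below is illustrative only: ModuloShardDemo and its methods are hypothetical names, and it assumes numeric ids with the identity function as the hash. It counts how many ids land on a different shard when the shard count changes.

```java
/** Hypothetical demo of how modulo routing reacts to a change in shard count. */
final class ModuloShardDemo {
    /** id mod shardCount, always in [0, shardCount). The id itself serves as the hash. */
    static int shardFor(long id, int shardCount) {
        return (int) Math.floorMod(id, (long) shardCount);
    }

    /** Counts the ids in 0..total-1 whose shard differs between the two shard counts. */
    static int movedKeys(int total, int oldCount, int newCount) {
        int moved = 0;
        for (long id = 0; id < total; id++) {
            if (shardFor(id, oldCount) != shardFor(id, newCount)) {
                moved++;
            }
        }
        return moved;
    }
}
```

For ids 0 to 5, going from 3 shards to 2 moves ids 2, 3, 4, and 5, so ModuloShardDemo.movedKeys(6, 3, 2) returns 4: two thirds of the keys no longer map to the shard where they were stored.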

2. Range algorithm

Data is split by time range or ID range. For example, to split the user table we can decide that the User table in each database stores only 10000 rows: the first database stores userId 1 to 10000, the second 10001 to 20000, the third 20001 to 30000, and so on. Splitting by time range works the same way.
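
A minimal Java sketch of range routing by ID, assuming each database holds exactly 10000 consecutive userId values starting at 1 and that databases are numbered from 0. RangeShardRouter and dbIndexFor are hypothetical names.

```java
/** Hypothetical range router: 10000 consecutive userIds per database. */
final class RangeShardRouter {
    private static final long ROWS_PER_DB = 10_000L;

    /** Returns the zero-based database index for a userId that starts at 1. */
    static int dbIndexFor(long userId) {
        if (userId < 1) {
            throw new IllegalArgumentException("userId must be >= 1");
        }
        // Subtract 1 so that ids 1..10000 fall into index 0
        return (int) ((userId - 1) / ROWS_PER_DB);
    }
}
```

Adding a new database only extends the range of valid indexes; existing ids keep their index, which is why no data has to be migrated when scaling out.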

Advantages:

  • The amount of data in a single table is controllable.
  • Horizontal scaling is simple: just add nodes, with no need to migrate data from other shards.
  • The database holding the data for a query can be located quickly.

Disadvantages:

  • Contiguous ranges can create hotspots. When sharding by a time field, for example, orders may surge during a certain period, so that shard is read and written heavily while the historical data in other shards is rarely queried.

Difficulties of database and table sharding

1. Distributed transactions

Because tables are distributed across different databases, cross-database transactions are unavoidable. They can be handled with two-phase commit (2PC) or three-phase commit (3PC), but these perform poorly and require a lot of development work. The usual approach is eventual consistency: if the system does not need real-time consistency, it is enough to reach a consistent state within an acceptable time window, using transaction compensation.

Here I use Seata, Alibaba's distributed transaction framework, for distributed transaction management. Later articles will cover it with a concrete case.

2. Paging, sorting, and cross-database join queries

Paging, sorting, and join queries are used all the time in development, but these seemingly ordinary operations become a headache after sharding. Data has to be queried from tables scattered across different databases, then all the results have to be merged and sorted before being returned to the user.
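
To show why this is awkward, here is a naive merge step sketched in Java. It is not how any particular middleware implements paging. ShardPageMerger is a hypothetical name, rows are reduced to a single sortable Long key, and each shard is assumed to have already returned its own first (offset + limit) rows in ascending order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: builds one globally sorted page from per-shard results. */
final class ShardPageMerger {
    /**
     * Each inner list is assumed to hold one shard's first (offset + limit) rows.
     * Fetching fewer rows per shard could miss rows that belong on the global page.
     */
    static List<Long> page(List<List<Long>> perShardRows, int offset, int limit) {
        List<Long> all = new ArrayList<>();
        for (List<Long> rows : perShardRows) {
            all.addAll(rows);
        }
        Collections.sort(all); // global ordering across all shards
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }
}
```

The cost is visible in the contract: every shard has to return offset + limit rows, because the whole global page could come from a single shard, so deep pages become expensive.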

3. Distributed primary key

After sharding, the database's auto-increment primary key loses most of its value, because an auto-increment key on a single database instance cannot provide a primary key that is globally unique across databases. A system that can generate globally unique IDs becomes necessary, and such an ID is called a distributed ID.
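
The article does not name a specific ID scheme at this point. One widely used family of approaches packs a timestamp, a worker number, and a per-millisecond sequence into a 64-bit value (often called Snowflake-style). The Java below is a simplified sketch of that idea, not a production implementation: the class name, the bit widths, and the custom epoch are all assumptions chosen for illustration.

```java
/** Hypothetical Snowflake-style generator: timestamp | workerId | sequence. */
final class SimpleIdGenerator {
    private static final long EPOCH = 1577836800000L; // 2020-01-01T00:00:00Z, arbitrary choice
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SimpleIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond: spin until the next one
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // busy wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Each application instance would be given a distinct workerId, so two instances never produce the same value even within the same millisecond. How worker numbers are assigned is left out of this sketch.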

4. Read/write splitting

Most mainstream relational databases provide a high-availability master-slave architecture. What we need to implement is read/write splitting combined with sharding: both the read databases and the write databases have to be sharded. Later articles will include concrete cases.
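
The routing decision behind read/write splitting can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: ReadWriteRouter is a hypothetical class, data sources are represented by plain names, and a statement counts as a read only if it starts with SELECT.

```java
/** Hypothetical router: reads go to replicas in round-robin order, writes to the primary. */
final class ReadWriteRouter {
    private final String primary;
    private final String[] replicas;
    private int next = 0;

    ReadWriteRouter(String primary, String... replicas) {
        this.primary = primary;
        this.replicas = replicas.clone();
    }

    synchronized String dataSourceFor(String sql) {
        // Case-insensitive prefix check; locking reads such as SELECT ... FOR UPDATE
        // would need to go to the primary and are not handled in this sketch
        boolean isRead = sql.trim().regionMatches(true, 0, "select", 0, 6);
        if (!isRead || replicas.length == 0) {
            return primary;
        }
        String chosen = replicas[next];
        next = (next + 1) % replicas.length;
        return chosen;
    }
}
```

In a sharded setup this choice is made per shard: the sharding rule first selects the shard, and the read/write rule then selects the master or one of the slaves within it.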

5. Data masking

Data masking means transforming sensitive information according to masking rules so that private data is reliably protected. Typical examples are ID numbers, mobile phone numbers, card numbers, passwords, and other personal information.
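
As one small example of a masking rule, the Java below hides the middle digits of a mobile phone number. The rule itself (keep the first 3 and last 4 characters) and the names DataMasker and maskPhone are assumptions made for illustration.

```java
/** Hypothetical masking helper for mobile phone numbers. */
final class DataMasker {
    /** Keeps the first 3 and last 4 characters and replaces the rest with '*'. */
    static String maskPhone(String phone) {
        if (phone == null || phone.length() <= 7) {
            return phone; // too short to mask meaningfully under this rule
        }
        StringBuilder sb = new StringBuilder(phone.length());
        sb.append(phone, 0, 3);
        for (int i = 3; i < phone.length() - 4; i++) {
            sb.append('*');
        }
        sb.append(phone, phone.length() - 4, phone.length());
        return sb.toString();
    }
}
```

For an 11-digit number such as 13812345678, the result is 138****5678.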

Sharding tools

As I always say, try not to reinvent the wheel, because the wheel we build ourselves may not be very round. The industry has many mature sharding middleware products. Choose one according to your own business needs and put your energy into the business implementation.

  • sharding-jdbc (Dangdang)
  • TSharding (Mogujie)
  • Atlas (Qihoo 360)
  • Cobar (Alibaba)
  • MyCAT (based on Cobar)
  • Oceanus (58.com)
  • Vitess (Google)

Why sharding-jdbc

sharding-jdbc is a lightweight Java framework delivered as a jar package. It is a client-side product that needs no additional deployment, and can be thought of as an enhanced JDBC driver. By contrast, server-side products such as Mycat have to be deployed as a separate service, which is somewhat more complicated. Besides, I want to put more energy into the business implementation and avoid extra operations work.

  • sharding-jdbc is highly compatible: it works with any JDBC-based ORM framework, such as JPA, Hibernate, Mybatis, and Spring JDBC Template, or with JDBC used directly.
  • It is compatible with third-party database connection pools such as DBCP, C3P0, BoneCP, Druid, and HikariCP, and it supports almost all relational databases.

It is a fairly powerful tool, and it is minimally intrusive: it requires almost no changes at the code level and no changes to SQL statements. You only need to configure the databases and tables to be sharded.

Summary

This article briefly reviewed the basics of database and table sharding. The following articles will walk through each sharding-jdbc feature as used in a real project.

This article is original content from Alibaba Cloud and may not be reproduced without permission.