PHP + MySQL 10 million level data processing case (1) (sub table)


Cut the crap and go straight to the scene
A financial company has 500 W investment users, and 50 W investments are recharged every day. So the company has nearly 100 million recharge records every year. How do we deal with the data of this recharge order table? It’s difficult to put it in a table. If I ask you to count the records that meet a certain demand one day and search 100 million pieces of data, you will be tired of MySQL! Today we will talk about how to deal with this situation.

The idea of distributed table in MySQL
Sub table is not a random sub table, it must be combined with the actual situation of the project, such as where is the bottleneck of our project, only tens of thousands or millions of data with sub table, that is, anti-aircraft artillery hit mosquitoes, do not blindly sub table! Must reach a certain order of magnitude, and affect the access speed of our users, performance degradation in order to consider to do sub table processing!
Let’s draw a simple picture. It can’t be any simpler. I like drawing
PHP + MySQL 10 million level data processing case (1) (sub table)
The idea of dividing tables is so simple. With the help of middleware, you can insert the tables corresponding to different provinces according to the orders of different provinces. Of course, in practice, you have to combine your own business to find and make this middleware. Don’t copy other people’s logical ideas!
For example, 12306 always broke down at that time. So many railway lines across the country scrambled for tickets at 8:00 in the morning, and hundreds of millions of people scrambled for tickets at the same time. No matter how powerful the technology was, it was hard to resist. Of course, Ali was very powerful. At that time, a lot of people gave them advice and provided various solutions. They said that Taobao tmall Jingdong did this, I don’t know the actual business of 12306. Many people force me blindly. Later, 12306 also used the form of distribution. It released the ticket grabbing of each railway line at different peak, alleviating the huge impact brought by large traffic. So I must combine my own actual business to do all kinds of solutions, just like the middleware here! No more nonsense!

MySQL distributed sub table actual combat (insert total table score and then take module to distribute to sub table)

//Pseudo code
//Suppose that the user data storage is set as 50W data per day
//We create a user main table + two user sub tables user sub table user0 user sub table user1
//First, all user data is entered into the user main table
$sql = insert into users(a,b,c) values($a,$b,$c);
$res = $model->query($sql);
    //Get the ID of the last SQL inserted in the user main table
    $insert_id = $model->getLastInsId();
    //If you have two types of user sub tables, you will get% 2. If you have 200 records, you will get% 200
    $d = $insert_id%2;
    //If any number is modular to 2, then it is not 0 or 1. If any number is modular to 200, then it is between 0 and 200
    //Insert it into different user sub tables
    $_sql = insert into user{$d} (a,b,c) values($a,$b,$c);
    $ru = $model->query($_sql);

In fact, we use the form of module fetching to make a middleware function. According to the module value, we insert data into different user0 and user1 to achieve the effect of distribution
In the whole process, the main table must insert data first, and then insert data into different sub tables according to the module;
The user in the user sub table needs to be emphasized_ ID primary key must not be self incrementing. It must be inserted according to the ID in the user main table. It must be consistent with the primary key ID in the main table!
It should be emphasized that the user main table and the user0 user1 sub table must be directly copied with the same field properties, but the data is distributed to different tables, which is also called horizontal sub table!

When making the horizontal sub table of MySQL, we should remember the hidden rules. By default, when adding new data, we insert it into the user main table, and then insert it into different sub tables according to the module. However, deleting, modifying, and viewing all have nothing to do with the main table. The main table can only write data into it when adding new data, and then distribute it to different sub tables according to the module ID! Of course, the main table can also do some other statistics.

MySQL distributed sub table actual combat (modify view delete sub table data)
As we have said above, when modifying or deleting a query, we directly operate on the sub table instead of the main table

//Pseudo code
$id = $_POST['id'];
//Module the primary key ID to determine which sub table to operate
$d = $id%2;
$sql = update user{$d} set username='{$username}',age='{$age}' where id = $id;

Modify or delete is also based on the primary key ID to determine which sub table to modify or delete

But if we delete or modify it, the data in the main table is inconsistent with that in the sub table. How can we ensure that the data in the main table is consistent with that in the slave table?

Many people will want to operate the main table immediately after the operation of the sub table. The operation is too bad, and it’s OK! But don’t you feel wasted? If you have learned the coroutine, you can start a coroutine to operate the main table, or an asynchronous task to operate the main table. Both of them are executed asynchronously, which does not affect the execution of the program. This is a very good practice. The premise is that you have to understand what is the coroutine and the task asynchronous task in the spool. As mentioned in the previous blog, you can go to my blog.
Another old-fashioned way is to use the form of queue. No, no, no, no, it’s not called queue. The list in redis is called list at best. Redis5 adds a new type, stream type, which is the real queue. It’s equivalent to the beggar’s version of Kafka. Use the list type in redis to change the SQL statement to modify the main table, The condition ID is just the main table of the operation table, and then it is put into the list to lpush, and then it is enough to get a timed task rpop to update the main table! This method is very stupid, but it also works!

At the beginning, we said that we must combine with the actual business requirements. What we mentioned above is that we distribute the data to different sub tables according to the primary key ID. therefore, whether we query, modify or delete, we must combine the primary key ID to determine the sub table to be operated. There are certain limitations, but our actual business is designed in this way, So it is still the nagging words, we must combine with their actual business needs;

Sub table is not the best solution, and it is not widely used in practical applications. It makes sense to exist. It mainly depends on your business needs!

This work adoptsCC agreementReprint must indicate the author and the link of this article

Hu Jun

Recommended Today

Swift advanced 08: closure & capture principle

closure closurecanCapture and storageOf any constants and variables defined in their contextquote, this is the so-calledClose and wrap those constants and variablesTherefore, it is called“closure”Swift can handle everything for youCaptured memory managementOperation of. Three forms of closure [global function is a special closure]: a global function is a closure that has a name but does […]