The content discussed today is the separation of hot and cold. Perhaps the concept is not strange and familiar with its use scenarios, but we still need to think carefully about the content of lock. There are still many “pits” in our actual development.
Business scenario I
I have experienced architecture optimization related to the supply chain. At that time, there was an order function on the platform. There were tens of millions of data in the main table. With the associated table, the data reached hundreds of millions.
Such a huge amount of data makes the platform’s query of orders particularly slow. It takes 20 or 30 seconds to query once, and more clicks will lead to downtime. For example, when the salesman queries many times, the CPU of the database will soar immediately, and the server thread will not fall down.
At that time, we tried to optimize the table structure, business code, index, SQL statements and other methods to improve the response speed, but these methods treated the symptoms rather than the root cause, and the query speed was still very slow.
Considering that we have other high priority requirements to deal with, we feed back to the business party: “you don’t need this function in the future. Put up with it for the time being.” But after a period of time, the business side really couldn’t stand it. They directly talked hard with us, but we gave in.
Finally, we decided to adopt a cost-effective solution to solve this problem simply and conveniently. When processing data, we divide the database into two libraries: cold storage and hot storage. Infrequently used data is stored in cold storage and commonly used data is stored in heat storage.
After processing in this way, because the salesperson basically queries the recently used data, the amount of commonly used data is greatly reduced, there will be no downtime, and the database response speed is greatly improved.
In fact, the above method is “hot and cold separation”.
1、 What is hot and cold separation
The data stored in the cold database and the data stored in the hot database are divided into two databases, i.e. the data stored in the cold database and the data stored in the hot database.
2、 Under what circumstances does hot and cold separation be used?
Assuming that the business requirements are as follows, the solution of cold heat separation can be considered:
- After the data reaches the final status, there are only read and no write requirements, such as the order completion status;
- Users can query the old and new data separately. For example, some e-commerce websites only allow you to query orders within 3 months by default. If you want to query orders 3 months ago, you need to visit another separate page.
3、 Realization of cold heat separation
In the actual operation, the overall realization idea of cold and heat separation is as follows:
1. How to judge whether a data is cold data or hot data?
2. How to trigger hot and cold data separation?
3. How to separate hot and cold data?
4. How to use hot and cold data?
Next, we make a detailed analysis of the above four problems.
（1） How to judge whether a data is cold data or hot data?
Generally speaking, when judging whether a data is cold data or hot data, we mainly use the combination of one or more fields in the main table as the identification. Among them, this field can be a time dimension, such as the “order time”. We can treat the order data three months ago as cold data and those within three months as hot data.
Of course, this field can also be a status dimension. For example, it can be distinguished according to the “order status” field. Closed orders are treated as cold data and open orders are treated as hot data.
We can also distinguish by combining fields. For example, we identify orders with an order time > 3 months and a status of “closed” as cold data, and others as hot data.
In actual work, which field to use in the end depends on your actual business.
As for the logic of judging cold and hot data, here are two points to note:
- If a data is identified as cold data, the business code will no longer write to it;
- There is no need to read cold / heat data at the same time.
（2） How to trigger hot and cold data separation?
After understanding the judgment logic of hot and cold data, we should start to consider how to trigger the separation of hot and cold data. Generally speaking, there are three kinds of trigger logic for cold and hot data separation.
1. Modify the business code directly, and trigger the hot and cold separation every time you modify the data (for example, this logic is triggered every time the order status is updated);
2. If you don’t want to modify the original business code, you can trigger it by listening to the database change log binlog (database trigger can also be used);
3. The method of triggering the data by the scheduled task or the scheduled task of the database;
For the above three trigger logic, which one is better? After reading the analysis in the following table, you will have the answer in your heart.
|Modify business code of write operation||Listen to database change log||Regularly scan database|
|advantage||1. The code is flexible and controllable. 2. Ensure real-time performance||1. Decouple from business code. 2. Low delay can be achieved.||1. Decouple from business code. 2. You can overwrite scenes that distinguish hot and cold data according to time.|
|shortcoming||1. Cold and hot cannot be distinguished according to time. When the data becomes cold data, no operation may be performed during the period. 2. You need to modify the code for all data write operations.||1. Cold and hot cannot be distinguished by time. When the data becomes cold data, no operation is performed during the period. 2. The problem of data concurrent operation needs to be considered, that is, the business code and hot and cold change code operate the same data at the same time.||1. Can not achieve real-time|
According to the comparison of table contents, we can get the recommended scenarios of each starting logic.
- Modify business code of write operation: it is recommended to use when the business code is relatively simple and does not distinguish between hot and cold data according to time.
- Listen to database change log: it is recommended to use when the business code is complex and cannot be changed at will, and the hot and cold data are not distinguished according to time.
- Regularly scan database: it is recommended to use when distinguishing hot and cold data according to time.
（3） How to separate hot and cold data?
The basic logic for separating hot and cold data is as follows:
1. Judge whether the data is cold or hot;
2. Insert the data to be separated into the cold data;
3. Then delete the separated data from the hot database.
This logic seems simple, but when actually making a scheme, we have to take the following three points into account, which is not simple.
(1) Consistency: how to ensure the consistency of data when multiple databases have been modified at the same time
The consistency requirements mentioned here refer to how to ensure that the data is consistent after an error in any step. The solution is to ensure that each step can be retried and the operation is idempotent. The specific logic is divided into four steps.
In the hot database, add a flag to the data to be moved: Flag = 1. (1 represents cold data, 0 represents heat data)
Find out all the data to be moved (flag = 1): this step is to ensure that some previous threads fail for some reasons, and some data to be moved is not moved.
Save a copy of data in the cold database, but add a judgment in the saving logic to ensure idempotency (it needs to be surrounded by transactions here). Generally speaking, if the data we save already exists in the cold database, we should also ensure that this logic can continue.
Delete the corresponding data from the hot database.
(2) Large amount of data: assuming that the amount of data is large and cannot be processed at one time, what should we do? Is batch processing required?
The first two of the three cold and hot separation trigger logics mentioned above basically do not have the problem of large data volume, because only the data changed at that moment needs to be operated each time, but if the logic of timed scanning is adopted, the problem of data volume needs to be considered.
The implementation logic is also very simple. We can add batch logic where we move data. For ease of understanding, let’s look at an example.
Suppose we can move 50 pieces of data at a time:
A. mark the data to be moved in the thermal database: Flag = 1;
B. find out the first 50 data to be moved (flag = 1);
C. keep a copy of data in the cold database;
D. delete the corresponding data from the thermal database;
E. cycle execution B.
(3) Concurrency: assuming that the amount of data is large enough to be distributed to multiple places for parallel processing, what should we do?
In the scenario of regularly transporting hot and cold data (such as every day), suppose the amount of data processed every day is so large that it is too late for single thread batch processing, what should we do? At this time, we can open multiple threads for concurrent processing. (Although multithreading is faster in most cases, I have encountered this situation: when the single thread batch size reaches a certain value, the efficiency is particularly high, which is faster than any batch size of multithreading. Therefore, it should be noted that if the speed of multithreading is not fast, we will consider controlling single thread.)
When multiple threads carry hot and cold data at the same time, we need to consider the following implementation logic.
Step 1: how do I start multithreading?
Because we use timer trigger logic, the most cost-effective way of this trigger logic is to set multiple timers and make the interval between each timer shorter, and then start a thread to carry data every time.
Another suitable way is to create a thread pool by yourself, and then trigger the following operations regularly: first calculate the number of hot data to be moved, and then calculate the number of threads to be started at the same time. If it is greater than the number of thread pools, take the number of threads in the thread pool. Assuming that this number is n, and finally cycle n times to start the threads in the thread pool to carry cold and hot data.
Step 2: a thread announces that a data is operating, and other threads do not move (lock).
For this logic, we need to consider three features.
Obtain the atomicity of the lock:When a thread finds that a data to be processed is not locked, and then locks it, the two operations must be atomic, that is, they either succeed or fail together. The actual operation is to first add two fields lockthread and locktime in the table, then find out the unlocked or lock timeout data to be migrated through an SQL statement, then update lockthread = current thread and locktime = current time, and finally realize atomicity by using MySQL’s update lock mechanism.
Obtaining locks must be consistent with starting processing:When the current thread starts processing this data, you need to check again whether the data of the next operation is locked successfully by the current thread. The actual operation is to query the data of lockthread = current thread again, and then process the queried data.
The release lock must be consistent with the completion of processing:After the current thread processes the data, it must ensure that the lock is released.
Step 3: after a thread completes normal processing, the data is not in the hot storage, but directly goes to the cold storage. This is normal logic, but there is nothing special to pay attention to.
Step 4: what if a thread fails to exit and the lock is not released (lock timeout)?
Lock cannot be released:What if the thread that locked the data exits abnormally and has no time to release the lock, so that other threads cannot process the data? The solution is to set a timeout for the lock. If the lock is not released after timeout, other threads can process the data normally.
When setting the timeout, we should also consider what to do if the processing thread does not exit and the timeout is caused by processing data? The solution is to set the timeout time as much as possible to exceed the reasonable time for processing data, and the code for processing hot and cold data must be idempotent.
Finally, we have to consider an extreme case: what if the current thread is still processing data, the lock of the data being processed times out, and another thread locks the data being processed? We only need to judge fault tolerance at each step, because the code for handling hot and cold data is relatively simple. Through this operation, the processing of the current thread will not destroy the consistency of data.
（4） How to use cold data
On the query interface of function design, there is usually an option for us to choose whether to query cold data or hot data. If it is not provided on the interface, we can distinguish it directly in the business code. (Note: when judging whether it is cold data or hot data, we must ensure that users are not allowed to read cold and hot data at the same time.)
How to migrate historical data?
Generally speaking, as long as the architecture scheme related to the persistence layer, we need to consider the migration of historical data, that is, how to make the historical data of the old architecture applicable to the new architecture?
Because the previous separation logic just covers this problem when considering the failed retry scenario, the solution to this problem is also very simple. We only need to mark all historical data: after flag = 1, the program will migrate automatically.
Deficiency of cold heat separation solution
It has to be said that the hot and cold separation solution can indeed solve the problems of slow write operation and slow hot data, but there are still many deficiencies.
不足一： 用户查询冷数据速度依旧很慢，如果查询冷数据的用户比例很低，比如只有 1%，那么这个方案就没问题。
看到这里了，点个赞，转个发不行吗？ 嘿嘿！ 关注个人公众号“服务端技术精选”，还有更多好玩的等你哟！