How to ensure double-write consistency between the cache and the database

Time:2022-1-15

Introduction

Why write this article?

Caches are widely used in projects because of the high concurrency and high performance they bring. Reading from the cache is uncontroversial: everyone follows the flow shown in the figure below.

[Figure: typical read-from-cache flow]

When it comes to updating, however, there is real controversy: after updating the database, should we update the cache or delete it? Or should we delete the cache first and then update the database? There is still no comprehensive write-up analyzing these schemes, so this post attempts one, at the risk of drawing fire.

Article structure

This article has three parts: 1. explain the cache update strategies; 2. analyze the shortcomings of each strategy; 3. propose improvements for those shortcomings.

Main text

Let's start with a clarification. In theory, setting an expiration time on the cache is enough to guarantee eventual consistency: even if a given write leaves dirty data in the cache, once the entry expires, subsequent reads will fetch the new value from the database and backfill the cache. The strategies discussed below therefore do not rely on cache expiration. We consider three update policies:

  • 1. Update the database first, then update the cache
  • 2. Delete the cache first, then update the database
  • 3. Update the database first, then delete the cache

Why not update the cache first and then the database? Because if the cache update succeeds but the database update fails, the data becomes inconsistent.

Why is scheme 2 acceptable, then? If the cache is deleted successfully but the database update fails, no inconsistency results: the cache was merely emptied, and the next read will fetch the data from the database and rewrite it to the cache.

(1) Update the database before updating the cache

This scheme is widely opposed, for two reasons.

Reason 1 (thread safety). Suppose requests A and B both perform an update. The following interleaving can occur:

  • (1) Thread A updates the database
  • (2) Thread B updates the database
  • (3) Thread B updates the cache
  • (4) Thread A updates the cache

Request A's cache update should have landed before request B's, but B's landed first. The database now holds B's value while the cache holds A's stale value: dirty data. This alone rules the scheme out.

Normally, updating the cache is much faster than writing to the database, so for steps 2 and 3 to finish before step 4, thread A's cache update must have been delayed, for example blocked by the network, letting thread B run first. The probability is small, but not zero.
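The interleaving above can be reproduced deterministically. In this sketch two HashMaps stand in for the database and Redis (purely illustrative; there are no real threads, just the problematic ordering replayed step by step):

```java
import java.util.HashMap;
import java.util.Map;

// Replays the interleaving from the scenario above: A writes the DB first,
// but B's cache update lands before A's, leaving stale data in the cache.
public class StaleCacheDemo {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    public static String run() {
        db.put("key", "A");     // (1) thread A updates the database
        db.put("key", "B");     // (2) thread B updates the database
        cache.put("key", "B");  // (3) thread B updates the cache
        cache.put("key", "A");  // (4) thread A's delayed cache update lands last
        // The database holds B's value, but the cache holds A's stale value.
        return db.get("key") + "/" + cache.get("key");
    }
}
```

Running `run()` yields `"B/A"`: the database and the cache disagree, exactly the dirty-data outcome described above.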

Reason 2 (business considerations). Two points:

  • (1) If your workload writes to the database often but reads rarely, this scheme updates the cache frequently even though the data is hardly ever read, wasting performance.
  • (2) If the cached value is not the raw database value but the result of a series of expensive computations, recomputing it after every database write is wasteful. Deleting the cache is clearly more appropriate.

The real controversy is between the remaining two options: delete the cache first and then update the database, or update the database first and then delete the cache.

Solution

The thread-safety problem in reason 1 arises because the data is not locked. We can wrap the "update database, then update cache" operation in a transaction, so that an exclusive lock is taken on the row being updated and released only when the transaction ends. Note, however, that adding a transaction reduces system throughput.
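The idea can be sketched as serializing the write path per key. Here a ReentrantLock stands in for the exclusive row lock a database transaction would hold, and maps stand in for the database and Redis; this is a simplified illustration of the locking idea, not a real transaction:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Serialize "update DB, then update cache" so two writers cannot interleave.
// A ReentrantLock simulates the exclusive lock a DB transaction would take.
public class LockedWrite {
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final ReentrantLock lock = new ReentrantLock(); // per-key in practice

    public static void write(String key, String value) {
        lock.lock(); // "begin transaction": other writers wait here
        try {
            db.put(key, value);    // update the database
            cache.put(key, value); // update the cache before releasing the lock
        } finally {
            lock.unlock();         // "commit": the next writer may proceed
        }
    }
}
```

Because the cache update happens before the lock is released, the interleaving from reason 1 (B's cache write overtaking A's) cannot occur, at the cost of writers queuing behind each other.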

(2) Delete the cache before updating the database

This scheme can cause inconsistency when an update (request A) and a query (request B) arrive concurrently:

  • (1) Request A deletes the cache
  • (2) Request B queries the cache and finds nothing
  • (3) Request B queries the database and gets the old value
  • (4) Request B writes the old value to the cache
  • (5) Request A writes the new value to the database

This leaves the cache and database inconsistent. Moreover, if no expiration time is set on the cache, the data stays dirty forever.

Normally, writing to the database takes longer than reading from it, and steps 2 and 4 are fast cache operations, so it is quite plausible for steps 2 through 4 to complete before step 5. In other words, the scenario above is easy to hit.

So, how to solve it? Use the delayed double delete strategy. The pseudocode is as follows:

public void write(String key, Object data) throws InterruptedException {
        redis.delKey(key);    // first delete
        db.updateData(data);  // update the database
        Thread.sleep(1000);   // wait for in-flight reads to finish
        redis.delKey(key);    // second delete clears any stale backfill
    }

In words:

  • (1) Delete the cache first
  • (2) Then update the database (these two steps are the same as before)
  • (3) Sleep for one second, then delete the cache again. This clears any dirty data written to the cache during that second.

So how is that one second determined? How long should the sleep be?

Evaluate how long your project's read path takes, then set the write path's sleep to that duration plus a few hundred milliseconds. This ensures the read request has finished, so the write request's second delete removes any dirty cache entry the read left behind.

What if you use MySQL’s read-write separation architecture?

In that case, inconsistency can still occur. Again, request A updates while request B queries:

  • (1) Request A deletes the cache
  • (2) Request A writes the new value to the database (master)
  • (3) Request B queries the cache and finds nothing
  • (4) Request B queries the database (a replica); master-slave synchronization has not completed, so it gets the old value
  • (5) Request B writes the old value to the cache
  • (6) Master-slave synchronization completes and the replica receives the new value

This is how the inconsistency arises.

The solution is still the delayed double delete strategy, but the sleep time becomes the master-slave replication lag plus a few hundred milliseconds.

This synchronous delete strategy hurts throughput. What can be done about that?

Make the second delete asynchronous: start a separate thread to perform it. The write request then returns without sleeping, which restores throughput.
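An asynchronous second delete can be sketched with a scheduler: the write path returns immediately, and a background thread performs the second delete after the delay. Here a ConcurrentHashMap stands in for Redis (an illustrative sketch, not a production client):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Delayed double delete with an asynchronous second delete:
// the caller does not sleep; a scheduler deletes again after the delay.
public class AsyncDoubleDelete {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public static void write(String key, String value, long delayMillis) {
        cache.remove(key);   // (1) first delete
        db.put(key, value);  // (2) update the database
        // (3) second delete runs later on another thread; the caller returns now
        scheduler.schedule(() -> cache.remove(key),
                delayMillis, TimeUnit.MILLISECONDS);
    }

    // Helper for controlled shutdown: wait for pending deletes to finish.
    public static void awaitQuiescence() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Even if a concurrent read backfills a stale value between the two deletes, the delayed second delete clears it without blocking the writer.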

What if the second delete fails?

A very good question. If the second cache delete fails, the cache and database stay inconsistent and the dirty data persists. How to solve it?
Put the second delete operation into a queue, for two reasons: first, the queue makes the delete asynchronous; second, the queue's retry mechanism ensures the delete eventually succeeds.
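The queue-based retry can be sketched as follows. A LinkedBlockingQueue stands in for a real message queue, and `tryDelete` stands in for the Redis delete call (both are assumptions for illustration; in production the consumer loop would run on its own thread):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Failed deletes go onto a queue; a consumer retries them until they succeed.
public class RetryingDelete {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final BlockingQueue<String> retryQueue = new LinkedBlockingQueue<>();

    // Stand-in for the Redis delete; a real call could fail and return false.
    static boolean tryDelete(String key) {
        cache.remove(key);
        return true;
    }

    public static void delete(String key) {
        if (!tryDelete(key)) {
            retryQueue.offer(key); // enqueue the key for the consumer to retry
        }
    }

    // Consumer: drain the queue, re-attempting each delete until it succeeds.
    public static void drainRetries() {
        String key;
        while ((key = retryQueue.poll()) != null) {
            if (!tryDelete(key)) {
                retryQueue.offer(key); // still failing: requeue for another pass
            }
        }
    }
}
```

The queue decouples the failure from the write path: the writer never blocks on a flaky cache, and the retry loop guarantees the delete eventually lands.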

(3) Update the database before deleting the cache

First, some background. There is a well-known cache update pattern called the "Cache-Aside pattern". It prescribes:

  1. Miss: the application first tries the cache; on a miss, it reads from the database and, on success, puts the value into the cache.
  2. Hit: the application reads the data from the cache and returns it.
  3. Update: the application saves the data to the database first, then invalidates the cache once the save succeeds.
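The three cases above can be sketched in a few lines. Maps stand in for Redis and the database (a minimal illustration of the pattern, not a production implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal Cache-Aside sketch: miss (load from DB and backfill), hit (serve
// from cache), and update (write DB first, then invalidate the cache).
public class CacheAside {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> db = new ConcurrentHashMap<>();

    public static String read(String key) {
        String value = cache.get(key);
        if (value == null) {           // miss: fall through to the database
            value = db.get(key);
            if (value != null) {
                cache.put(key, value); // backfill the cache for later hits
            }
        }
        return value;                  // hit: served straight from the cache
    }

    public static void update(String key, String value) {
        db.put(key, value);            // save to the database first
        cache.remove(key);             // then invalidate the cache entry
    }
}
```

Note that `update` deletes rather than rewrites the cache entry: the next read repopulates it, which is exactly strategy (3).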

In addition, Facebook described the same strategy, update the database first and then delete the cache, in its paper "Scaling Memcache at Facebook".

Is there no concurrency problem in this case?

Unfortunately not. Concurrency problems remain, in the following two cases.

1. The database uses a read-write separation architecture
Data inconsistency can arise here too. Again, request A updates while request B queries:

(1) Request A writes the new value to the database (master)
(2) Request A deletes the cache
(3) Request B queries the cache and finds nothing
(4) Request B queries the database (a replica); master-slave synchronization has not completed, so it gets the old value
(5) Request B writes the old value to the cache
(6) Master-slave synchronization completes and the replica receives the new value

This is how the inconsistency arises.

Solution: as above, use the asynchronous delayed delete strategy: after request A deletes the cache, perform another cache delete after a short delay.

2. An extreme case
Suppose request A performs a query while request B performs an update. The following can happen:

(1) The cache has just expired

(2) Request A queries the database and gets the old value

(3) Request B writes the new value to the database

(4) Request B deletes the cache

(5) Request A writes the old value it read into the cache

If this sequence occurs, the cache does indeed end up with dirty data.

However, what is the probability of this happening?

It requires that the database write in step (3) and the cache delete in step (4) both complete faster than the cache write in step (5).

But writing to the database takes longer than reading from it, and step (5) is a fast cache operation. The only realistic way for steps 3 and 4 to finish first is for request A's cache write to be blocked, say by a network problem. This really is an extreme case.

How to solve the above concurrency problem?

First, setting an expiration time on the cache is one mitigation. Second, adopt the asynchronous delayed deletion strategy described under strategy (2), making sure the delete happens after the read request has finished.

What if delayed deletion fails

Both cache update strategy (2) and strategy (3) share this failure mode, as described above, and an asynchronous queue can handle it. In detail:

How to solve it? Provide a guaranteed retry mechanism. Here are two schemes.

Scheme 1: as shown in the figure below

[Figure: retrying failed cache deletes via a message queue inside the business application]

The process is as follows

  • (1) Update the database
  • (2) The cache delete fails for some reason
  • (3) Send the key to be deleted to a message queue
  • (4) Consume the message and retrieve the key to be deleted
  • (5) Retry the delete operation until it succeeds

The drawback of this scheme is that it intrudes heavily on the business code. Hence scheme 2: run a subscriber program that tails the database's binlog to learn which data was modified, and a separate program that receives this information from the subscriber and deletes the cache.

Scheme 2:
[Figure: retrying failed cache deletes driven by a binlog subscriber and a message queue]

The process is as follows:

  • (1) Update the database
  • (2) The database writes the change to its binlog
  • (3) A subscriber extracts the affected data and key from the binlog
  • (4) A separate, non-business program receives this information
  • (5) It tries to delete the cache, and the delete fails
  • (6) It sends the information to a message queue
  • (7) The key is re-consumed from the message queue and the delete is retried until it succeeds.

Remarks: for MySQL, there is ready-made middleware for subscribing to the binlog, called canal. For Oracle, I do not know of an off-the-shelf equivalent. As for the retry mechanism, I used a message queue here, but if your consistency requirements are not strict, you can simply start another thread inside the application and retry periodically. Adapt these ideas flexibly; they are only meant as a starting point.

Summary

Comparing the first scheme (update the cache after updating the database) with the third (delete the cache after updating the database), which should you use?
The question comes down to whether the updated data is accessed frequently, i.e. whether it is hot data.
If the data is hot, the third scheme (delete the cache after updating the database) means every subsequent query misses the cache and goes to the database, which can trigger a Redis cache avalanche. For hot data, therefore, use the first scheme (update the cache after updating the database); it suits high-concurrency scenarios.
If the data is not hot, use the third scheme (delete the cache after updating the database), which avoids wasting performance on repeated cache writes.
If your workload writes to the database often and reads rarely, the first scheme (update the cache after updating the database) would update the cache frequently before the data is ever read, wasting performance.

The first scheme (update the cache after updating the database) suits high-concurrency scenarios with hot data.
The third scheme (delete the cache after updating the database) suits ordinary, lower-concurrency scenarios.

Adding an expiration time to the cache is the simplest way to address data consistency, and it guarantees eventual consistency. The key is choosing the expiration time: set it too long and dirty data lingers; set it too short and the cache is rewritten too frequently, wasting performance.

Reference:
https://zhuanlan.zhihu.com/p/59167071 — this article is a summary and revision of that post.