When an interviewer asks how to keep cache and database double writes consistent, what exactly are they asking?

Time:2021-2-23

The interview begins

Hello, young man. I see MySQL and Redis on your resume, so let's focus on those two today. Redis and MySQL both play important roles in back-end development, and in practice they go together like a body and its shadow. To improve performance and response time, Redis usually holds the hot data while MySQL holds all of the data and guarantees persistence. So the data in Redis is a subset of the data in MySQL.

So here is the question: when the persistent data in MySQL changes, how does Redis find out? In other words, how do you ensure double-write consistency between the cache and the database?


Hello, interviewer. The scheme we adopted in development is as follows: first update the database, then delete the corresponding cache entry. The next request misses the cache, reads the data from MySQL, and writes it back to Redis.

So why delete the cache instead of updating it?
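The scheme described above can be sketched as follows. This is a minimal illustration, not a production implementation: the dictionaries `db` and `cache` are hypothetical in-memory stand-ins for MySQL and Redis, and the function names `read` and `write` are made up for the example.

```python
# Hypothetical in-memory stand-ins: `db` plays MySQL, `cache` plays Redis.
db = {"user:1": "Alice"}
cache = {}

def read(key):
    """Return the value for key, filling the cache on a miss."""
    if key in cache:
        return cache[key]          # cache hit
    value = db.get(key)            # cache miss: fall back to the database
    if value is not None:
        cache[key] = value         # write the fresh value back to the cache
    return value

def write(key, value):
    """Update the database first, then delete the cached copy."""
    db[key] = value
    cache.pop(key, None)           # the next read repopulates the cache

print(read("user:1"))   # miss, loads "Alice" from db and caches it
write("user:1", "Bob")  # db updated, cached "Alice" deleted
print(read("user:1"))   # miss again, loads "Bob"
```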


If you update the cache, the following can happen. Request A occurs earlier than request B, so A's cache update should also land before B's. However, because of network delays or other reasons, B updates the cache before A does. A's older value then overwrites B's newer one, which leaves dirty data in the cache.
Secondly, if your business writes to the database often but reads the data rarely, updating the cache on every write is a poor fit: the cache gets updated frequently even though the data is never read, which wastes performance.
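The out-of-order cache update can be replayed step by step. The sketch below is a deterministic replay of one possible interleaving rather than real concurrency; `db`, `cache`, and the key `stock` are hypothetical stand-ins.

```python
# Hypothetical in-memory stand-ins for MySQL and Redis.
db = {}
cache = {}

# Request A writes 1, then request B writes 2; the database sees them in order.
db["stock"] = 1      # A updates the database
db["stock"] = 2      # B updates the database

# The cache updates arrive in the opposite order (for example, A is delayed).
cache["stock"] = 2   # B updates the cache first
cache["stock"] = 1   # A's late update overwrites the newer value

print(db["stock"], cache["stock"])  # prints "2 1": the cache is now dirty
```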

What’s wrong with deleting the cache first and then updating the database?


Suppose request A is a write and request B is a query. Request A deletes the cache; request B finds the cache empty, queries the database, gets the old value, and writes that old value into the cache; only then does request A write the new value to the database.
This leads to inconsistent data. Moreover, if no expiration time is set on the cache entry, the cached data stays dirty indefinitely.
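The same interleaving, replayed in order as a small sketch. The dictionaries and the key `price` are hypothetical stand-ins, and the sequence is written out by hand instead of using real threads.

```python
# Hypothetical in-memory stand-ins for MySQL and Redis.
db = {"price": 100}
cache = {"price": 100}

cache.pop("price", None)   # A: delete the cache first
stale = db["price"]        # B: cache miss, reads the old value from the database
cache["price"] = stale     # B: writes the old value back into the cache
db["price"] = 120          # A: finally writes the new value to the database

print(db["price"], cache["price"])  # prints "120 100": database and cache disagree
```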

OK, the schemes just mentioned do have problems under concurrency. So if you update the database first and then delete the cache, are there no concurrency problems?


The answer is: not necessarily. Concurrency problems are still possible. In the figure's numbering, if the database update in step 3 takes less time than the database read in step 2, then step 4 (deleting the cache) can run before step 5 (writing the value read in step 2 into the cache), and the cache ends up holding dirty data. In general, though, database reads are much faster than writes (this shows in MySQL's concurrent read and write capacity: under the same hardware configuration, concurrent read throughput is several times the concurrent write throughput), so this ordering is unlikely.
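A hand-ordered replay of that unlikely interleaving is sketched below. The step comments follow the numbering used in the text; step 1 (the cache entry being absent when the reader arrives) is an assumption, since the original figure is not available. `db`, `cache`, and `price` are hypothetical stand-ins.

```python
# Hypothetical in-memory stand-ins for MySQL and Redis.
db = {"price": 100}
cache = {}                 # step 1 (assumed): the cached entry is missing

stale = db["price"]        # step 2: the reader gets the old value from the database
db["price"] = 120          # step 3: the writer updates the database
cache.pop("price", None)   # step 4: the writer deletes the cache (nothing to delete)
cache["price"] = stale     # step 5: the reader writes the old value into the cache

print(db["price"], cache["price"])  # prints "120 100": the cache is dirty
```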

Therefore, if you want basic double-write consistency between cache and database and, as in most cases, you don't want to over-design or add much workload, update the database first and then delete the cache!

Updating the database first and then deleting the cache: besides the problem you just mentioned, are there any others?


If MySQL uses a read-write separation architecture, there is one more. Request A updates the data in the master database and deletes the cache before master-slave synchronization has completed. Request B then misses the cache, reads the old value from the slave database, and writes it to the cache, which again causes data inconsistency.
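The replication-lag case can be replayed the same way. In this sketch, `master` and `replica` are hypothetical dictionaries standing in for the master and slave databases, and replication is modeled as a single late copy.

```python
# Hypothetical in-memory stand-ins: master and slave databases plus the cache.
master = {"name": "old"}
replica = {"name": "old"}
cache = {"name": "old"}

master["name"] = "new"            # A: update the master
cache.pop("name", None)           # A: delete the cache
value = replica["name"]           # B: cache miss, reads from the not-yet-synced slave
cache["name"] = value             # B: caches the old value
replica["name"] = master["name"]  # replication catches up, but too late

print(master["name"], replica["name"], cache["name"])  # prints "new new old"
```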

You just said that updating the database before deleting the cache can also cause data inconsistency. How do you solve that?


Use delayed double deletion. To keep request B from leaving an old value in the cache, request A can sleep for a while after updating the database (for example 100 ms or 200 ms, depending on the actual business scenario) and then delete the cache. This basically ensures the cache will not hold dirty data. The same principle applies to the master-slave architecture: request A does not delete the cache immediately after updating the master; the delay gives master-slave synchronization time to complete, and only then is the cached data deleted.
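A minimal sketch of delayed double deletion follows, under some assumptions: `db` and `cache` are hypothetical in-memory stand-ins, the function name is made up, and the sketch keeps the immediate delete from the basic scheme and adds the delayed one, which is where the "double" comes from. The `sleep` parameter is injectable only so the demo can simulate a racing reader deterministically.

```python
import time

# Hypothetical in-memory stand-ins for MySQL and Redis.
db = {"price": 100}
cache = {"price": 100}

def write_with_delayed_double_delete(key, value, delay_seconds=0.1, sleep=time.sleep):
    db[key] = value
    cache.pop(key, None)   # first delete, right after the database update
    sleep(delay_seconds)   # give in-flight readers (or replication) time to finish
    cache.pop(key, None)   # second delete clears any old value cached meanwhile

def racing_reader(_seconds):
    # Demo stand-in for the sleep: a concurrent reader caches the old value
    # during the delay window.
    cache["price"] = 100

write_with_delayed_double_delete("price", 120, sleep=racing_reader)
print(db["price"], cache.get("price"))  # prints "120 None": the stale entry is gone
```

The delay is a tuning knob: it should be longer than a typical read-and-repopulate cycle (or the replication lag), which is why the text ties it to the actual business scenario.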

But making request A sleep for a while may hurt the interface's response time (RT) and reduce system throughput. How do you solve that?


The more elegant solution is to do it asynchronously. That is, create a thread pool and, when handling request A, hand the work to a separate thread that sleeps for a while and then deletes the cache. Of course, you can also push the cache key onto a message queue and delete it asynchronously through MQ. However, adding a message queue layer just to delete the cache asynchronously can make the system design more complex and bring other problems.

Deleting the cache has come up several times. What do you do if the deletion fails?
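The thread-pool variant might look like the sketch below, which uses Python's standard `concurrent.futures.ThreadPoolExecutor`. The stand-ins `db` and `cache`, the helper names, and the 0.2 second delay are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for MySQL and Redis.
db = {"price": 100}
cache = {"price": 100}
executor = ThreadPoolExecutor(max_workers=2)

def delayed_delete(key, delay_seconds):
    time.sleep(delay_seconds)   # the sleep happens on a worker thread
    cache.pop(key, None)

def write(key, value, delay_seconds=0.2):
    db[key] = value
    cache.pop(key, None)
    # Hand the delayed delete to the pool so the request returns immediately.
    return executor.submit(delayed_delete, key, delay_seconds)

future = write("price", 120)
cache["price"] = 100       # a racing reader caches the old value after the write returns
future.result()            # demo only: wait for the background delete to finish
print(db["price"], cache.get("price"))
executor.shutdown()
```

The request thread no longer pays for the sleep; the cost is that the stale value can be served until the background delete runs.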


Add a retry mechanism to make sure the cache is deleted successfully.
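One way such a retry mechanism could be sketched is shown here. The helper `delete_with_retry`, the choice of `ConnectionError` as the failure signal, and the simulated `flaky_delete` are all assumptions for illustration; a real cache client raises its own exception types.

```python
import time

def delete_with_retry(delete_fn, key, attempts=3, backoff_seconds=0.0):
    """Call delete_fn(key) until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            delete_fn(key)
            return True
        except ConnectionError:
            if attempt < attempts:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return False   # the caller should log the key or queue it for later cleanup

# Demo: a hypothetical cache whose delete fails twice before succeeding.
cache = {"price": 100}
failures = {"left": 2}

def flaky_delete(key):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("cache unreachable")
    cache.pop(key, None)

print(delete_with_retry(flaky_delete, "price"))  # prints "True" on the third attempt
```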

What if I have to make the database and cache data consistent?


There is no way to achieve absolute consistency; that is determined by the CAP theorem. Caching suits scenarios that do not require strong consistency, so it falls under AP in CAP.

The CAP theorem is a classic result in distributed systems; the letters stand for consistency, availability, and partition tolerance.

According to BASE theory (basically available, soft state, eventually consistent), the cache and the database can only achieve eventual consistency.
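The expiration time mentioned earlier is one simple way to get eventual consistency: even if a dirty value slips into the cache, it disappears once its time to live runs out. The sketch below assumes a hypothetical in-memory cache that stores `(value, expires_at)` pairs; the `now` parameter is injectable only so the demo does not need to wait.

```python
import time

# Hypothetical stand-ins: `db` plays MySQL; `cache` maps key -> (value, expires_at).
db = {"price": 120}
cache = {}

def read(key, ttl_seconds=60, now=time.monotonic):
    entry = cache.get(key)
    if entry is not None and entry[1] > now():
        return entry[0]                        # entry still valid: serve it
    value = db[key]                            # missing or expired: reload
    cache[key] = (value, now() + ttl_seconds)
    return value

cache["price"] = (100, 1000.0)                 # a dirty value that expires at t=1000
print(read("price", now=lambda: 999.0))        # prints 100: still within its lifetime
print(read("price", now=lambda: 1001.0))       # prints 120: expired, reloaded from db
```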

The interview is over

It's getting late, so let's call it a day. I can see you have a deep grasp of this area, young man. Our company needs talent like you. Why don't we sign the offer right now?
At this point you reach for the offer with one hand while waving the other: No, no, a certain company in Shenzhen is also waiting for my reply and has been pressing me for days.
The interviewer hears this and calls out: where is the payroll team? Raise the offer!

Summary

Using a cache well is not simple, especially when the cache and the database are required to stay strongly consistent; keeping database data and cache data consistent is a deep subject.
From the early hardware caches to operating system caches, caching has always been a discipline of its own. The industry has discussed this problem for a long time without reaching a definitive conclusion, because in the end it is a matter of trade-offs.
I’m young Xia Lufei. I love technology and sharing.