A poor cache architecture can easily paralyze the system


Soul-searching questions

  • Caching greatly improves system performance, but can it also greatly increase the odds of a system-wide outage?
  • How do you keep the cache from being penetrated?
  • Can a cache avalanche be avoided entirely?

In previous articles we covered the advantages of caching and the problem of data consistency. In a high-concurrency system, a cache is the first tool every architect reaches for to absorb heavy traffic. A good cache system, however, raises problems beyond consistency with the database, and these add complexity to the overall design; how they are solved directly affects the system's stability. The most common concern is the cache hit rate. In a high-concurrency system, the hit rate for core functionality should generally stay above 90%, and ideally higher. If it falls below that, the whole system is at risk of being overwhelmed by peak traffic at any moment, and that is the time to optimize how the cache is used.

I hear you can’t cache yet?

Cache data consistency, revisited for the thousandth time

In the traditional cache-plus-DB flow, an incoming request first checks the cache; on a miss it queries the database. If the system serves 10,000 requests per second with a 60% cache hit rate, then 4,000 requests per second fall through to the database. For a relational database such as MySQL in a one-master, three-slave setup, 4,000 queries per second is already a heavy load; add master-slave replication lag and other factors, and your MySQL is on the verge of going down.
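The fall-through arithmetic above is easy to sanity-check. A minimal sketch (the function name and figures are illustrative, taken from the example in the text):

```python
def db_qps(total_qps: int, hit_rate_percent: int) -> int:
    """Requests per second that miss the cache and fall through to the database."""
    return total_qps * (100 - hit_rate_percent) // 100

print(db_qps(10_000, 60))  # 4000 -> heavy for a one-master, three-slave MySQL setup
print(db_qps(10_000, 90))  # 1000 -> why core functions aim for a 90%+ hit rate
```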

The ultimate goal of caching is to maximize system throughput while keeping request latency low

Which cache problems can bring the system down?

Cache penetration

Cache penetration occurs when a request arrives, the corresponding data is not found in the cache (a cache miss), and the business system has to load the data from the database (which, in this context, can be considered the back-end system).


Depending on the scenario, cache penetration has two causes.

The requested data exists neither in the cache nor in the database

When the data exists neither in the cache nor in the database, a typical cache design sends every such request to the database once and then returns "not found". In this scenario the cache contributes almost nothing. In a normal business system this situation is relatively rare, and even when it occurs occasionally it puts no fundamental pressure on the database.

The dangerous cases are the abnormal ones, such as an infinite query loop in the system or a hacker attack, with the latter being the worst. An attacker can deliberately forge a large number of requests for nonexistent data and bring the database down. The classic scenario: if the system's user IDs are auto-incrementing integers, it is trivial for an attacker to forge user IDs and simulate a massive volume of requests.

The requested data is missing from the cache but exists in the database

This scenario is usually just normal business behavior, because cache capacity is limited. Redis, the most commonly used cache, is bounded by the server's memory, so not all business data can fit in the cache. Following the 80/20 rule of Internet data, we prioritize putting the most frequently accessed hot data into the cache; the cache then absorbs the bulk of the traffic, while the remaining cold data, even if it occasionally penetrates to the database, puts no fatal pressure on it.

In other words, some cache penetration is inevitable in every system; what we must avoid is a large volume of penetrating requests. How do we solve cache penetration? Essentially, solving it means solving the problem of intercepting requests. The common solutions are the following:

Write back a null value

When the requested data does not exist in the database, the cache system can write a null value for the corresponding key, so that the next identical request does not penetrate to the database but returns the cached null directly. This scheme is the simplest and crudest, but note the following points:

  • Writing a large number of null values to the cache also consumes memory, though in theory not much; it depends entirely on the number of such keys. Moreover, depending on the cache eviction policy, these null entries may evict cache entries for normal data.
  • The expiration time of a null value should be short. For example, if normal data expires after 2 hours, 10 minutes is a reasonable expiration for a null value. This frees server memory quickly, and it also lets the cached null expire soon after the business creates the corresponding real data, bringing the cache back in line with the database as fast as possible.
// Get user information
public static UserInfo GetUserInfo(int userId)
{
    // Read user information from the cache
    var userInfo = GetUserInfoFromCache(userId);
    if (userInfo == null)
    {
        // Cache miss: fall through to the database
        userInfo = GetUserInfoFromDB(userId);
        if (userInfo == null)
        {
            // Write back a null value with a short expiration time (10 minutes)
            CacheSystem.Set(userId, null, 10);
        }
    }
    return userInfo;
}
Bloom filter

Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap; a query for data that definitely does not exist is intercepted by the bitmap, sparing the underlying storage system the query pressure.

Bloom filters have several major advantages:

  • They use very little memory
  • When a Bloom filter reports that a piece of data does not exist, that answer is 100% correct (there are no false negatives); only "might exist" answers can be wrong

For details, refer to the previous article, or look up Bloom filters yourself:

Counting UV at scale, elegantly and fast

Because a Bloom filter is based on hashing, lookups run in O(1) time, which suits high-concurrency scenarios well. Note, however, that the system must add each piece of data to the Bloom filter at the moment the data is created, and Bloom filters do not support deletion, because multiple entries may share the same bit positions.
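As a concrete illustration, here is a minimal single-machine Bloom filter sketch in Python. The class, its sizing parameters, and the sample user IDs are all illustrative, not the implementation from the referenced article:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash positions over an m-bit array.
    Deletion is unsupported because several keys may share the same bits."""

    def __init__(self, m_bits=1 << 20, k_hashes=7):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, key):
        # Derive k positions from two digests (the classic double-hashing trick).
        h1 = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
        h2 = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False is definitive (no false negatives); True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter()
for user_id in ("1001", "1002", "1003"):  # added at data-creation time
    bf.add(user_id)

print(bf.might_contain("1001"))    # True: the key was added
print(bf.might_contain("999999"))  # False (bar a tiny false-positive chance): reject early
```

A forged request for a nonexistent user ID is rejected by the bitmap before it ever reaches the database.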


Cache avalanche

A cache avalanche occurs when a large number of cache entries expire at the same time, sending a huge volume of queries to the database at once; the database is overloaded and the system goes down.

This differs from cache penetration: penetration means the data is not in the cache at all, so queries flood the database, while an avalanche happens when the data is in the cache but a large number of entries expire simultaneously. In essence they are the same problem: both dump a flood of requests onto the database.

Both penetration and avalanche face the situation where multiple threads request the same data at the same time, query the database at the same time, and write back to the cache at the same time. For example, when several threads request the user with ID 1 just as that cache entry expires, they will all query the database and then all write back to the cache. Worst of all, another thread may update the database while those write-backs are in flight, leaving the cache and database inconsistent. This problem was emphasized in a previous article and deserves real attention.

Multiple threads requesting the same data concurrently is one cause of avalanches. The fix is to serialize those requests so that only one thread queries the database, most commonly with a lock (a distributed lock in a multi-node deployment). Today the most common distributed lock is implemented with redis, but a redis-based distributed lock has pitfalls of its own; see the previous article. (If you use the actor model, you can serialize requests more elegantly, without locks.)

Redis may not be so easy to do distributed locking
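A single-process sketch of serializing the rebuild: Python's threading.Lock stands in for the distributed lock, and the cache dictionary and loader function are illustrative stand-ins, not a real cache client:

```python
import threading

cache = {}                       # illustrative stand-in for the cache system
db_calls = 0                     # counts how many requests actually hit the "database"
rebuild_lock = threading.Lock()  # one lock for brevity; real systems lock per key,
                                 # and multi-node deployments need a distributed lock

def load_user_from_db(user_id):
    # Hypothetical loader standing in for the real database query.
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    value = cache.get(user_id)
    if value is not None:
        return value
    with rebuild_lock:                # only one thread rebuilds; the rest wait here
        value = cache.get(user_id)    # double-check: another thread may have filled it
        if value is None:
            value = load_user_from_db(user_id)
            cache[user_id] = value
    return value

threads = [threading.Thread(target=get_user, args=(1,)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_calls)  # 1 -> fifty concurrent misses produced a single database query
```

The double-check inside the lock is what keeps the waiting threads from repeating the database query once the first thread has filled the cache.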

The root cause of an avalanche is many cache keys expiring at the same time. The following approaches address that scenario.

Set different expiration times

Giving each cache key a different expiration time is the simplest way to prevent a cache avalanche. The idea is to add a random value on top of the system's configured expiration time for each key, or simply to use a random expiration outright. This spreads out batch expirations and flattens the peak in the number of keys expiring at any given moment.

public static int SetUserInfo(int userId)
{
    // Read user information from the database
    var userInfo = GetUserInfoFromDB(userId);
    if (userInfo != null)
    {
        // Write back to the cache with a randomized expiration time
        var cacheExpire = new Random().Next(1, 100);
        CacheSystem.Set(userId, userInfo, cacheExpire);
        return cacheExpire;
    }
    return 0;
}
Update from a dedicated background thread

In this scheme, cache entries can be set to never expire, and cache updates are performed not by business threads but by a dedicated thread. When a cached key needs updating, the business publishes a message to MQ; the cache-updating thread listens on MQ and refreshes the corresponding entry in real time. This approach must also handle cache eviction: when a cached key is evicted, a message can likewise be sent to MQ so that the update thread writes the key back again.
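A single-process sketch of this idea, with Python's queue.Queue standing in for the MQ and a hypothetical load_from_db function standing in for the real database read (both are assumptions for illustration):

```python
import queue
import threading

cache = {}                    # entries never expire; only the updater writes them
update_queue = queue.Queue()  # stand-in for the MQ topic of changed/evicted keys

def load_from_db(key):
    # Hypothetical loader standing in for the real database read.
    return f"value-of-{key}"

def cache_updater():
    """Dedicated update thread: business threads never refresh the cache themselves."""
    while True:
        key = update_queue.get()
        cache[key] = load_from_db(key)  # write the fresh value back
        update_queue.task_done()

threading.Thread(target=cache_updater, daemon=True).start()

# Business side: data changed (or a key was evicted) -> just publish the key.
update_queue.put("user:1")
update_queue.put("user:2")
update_queue.join()  # demo only: wait for the worker to drain the queue
print(cache)         # {'user:1': 'value-of-user:1', 'user:2': 'value-of-user:2'}
```

Because only the updater thread writes the cache, business threads never stampede the database when a hot key changes.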

Availability and scalability of cache

Like a database, a cache system must be designed for high availability and scalability. Although the cache itself is already fast, some exceptionally hot data can still hit the bottleneck of a single machine. For example: when a celebrity scandal breaks, that data is cached on one cache server node, and a massive number of requests converge on that node; past a certain point the machine goes down too. Similar to a database's master-slave architecture, a cache system can replicate cache copies to other servers so that application requests are spread across multiple cache servers, mitigating the single-point problem of hot data.

As with master-slave databases, multiple cache replicas face data-consistency problems, replication lag, and the question of keeping the same key's expiration time consistent across master and slave servers.

For scalability, the cache system can also apply the principle of sharding, using a consistent hash algorithm to route different requests to different cache server nodes. This provides horizontal scaling in the same way it does for application servers.
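The routing step can be sketched with a minimal consistent-hash ring. The class, virtual-node count, and node names below are illustrative assumptions:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: each node gets several virtual points, so keys
    spread evenly and adding or removing a node only remaps the keys near it."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        # The first virtual point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:42"))  # routes to one node, and stably so across calls
```

The virtual nodes are what keep the key distribution even; with only one point per physical node, a small cluster would shard very unevenly.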

A final word

As we have seen, whether for application servers, databases, or caches, high-availability design follows similar principles; master one and you can extend it to any scenario. If this article helped you, share it with your friends. Finally, feel free to leave a comment describing the solutions you use in daily development for caching, high availability, scalability, and preventing penetration and avalanches. Let's make progress together!
