If the cache architecture isn't good enough, the system can easily crash

Date: 2020-11-23

Soul-searching questions

  • Caching can greatly improve system performance, but can it also greatly increase the probability of a system outage?
  • How to prevent the cache system from being penetrated?
  • Can cache avalanche be completely avoided?

In previous articles we covered the advantages of caching and the problem of data consistency. In a high-concurrency system, caching has become almost every architect’s first choice for absorbing heavy traffic. But a good cache system must deal with more than database consistency; other problems add extra complexity to the overall design, and how they are solved directly affects the system’s stability. The most common concern is the cache hit rate: in a high-concurrency system, the hit rate of core functions should stay above 90%, or even higher. If it falls below that, peak traffic can bring the whole system down at any moment, and it is time to optimize how the cache is used.

I hear you don’t know how to cache?

Cache data consistency, discussed a thousand times

With the traditional cache-plus-DB flow, when a request arrives you first check whether the data is in the cache and, if not, query the database. If the system receives 10,000 requests per second and the cache hit rate is 60%, then 4,000 requests per second penetrate to the database. For a relational database such as MySQL, 4,000 requests per second is already a heavy load for a one-master, three-slave architecture; add in master-slave synchronization delay and other factors, and your MySQL is on the edge of going down.

The ultimate goal of caching is to maximize system throughput while keeping request latency low.

What are the causes of cache system crash?

Cache penetration

Cache penetration means that when a request arrives, the corresponding data is not found in the cache (a cache miss), so the business system has to load it from the database (more generally, the back-end storage system).


The causes of cache penetration fall into two scenarios:

The requested data exists in neither the cache nor the database

When the data exists in neither the cache nor the database, a conventional cache design sends every such request to the database, which then returns “not found”. In this scenario the cache contributes almost nothing. In a normal business system the probability of this case is small, and even when it occurs occasionally it puts no fundamental pressure on the database.

What is really dangerous are the abnormal cases, such as an infinite query loop inside the system, or a hacker attack; the latter is the worse. An attacker can deliberately forge a large number of requests for non-existent data and bring the database down. The classic scenario: if the system’s user IDs are auto-incrementing integers, it is trivial for an attacker to fabricate user IDs and simulate massive request volume.

The requested data is not in the cache but exists in the database

This scenario is usually just normal business demand, because the capacity of a cache system is limited. Redis, the most commonly used cache, for example, is bounded by the server’s memory, so it is impossible to put all business data into the cache. Following the 80/20 rule of Internet data access, we put the most frequently accessed hot data into the cache and use it to absorb the main traffic; the remaining cold data may still penetrate to the database, but it will not put fatal pressure on it.

In other words, some cache penetration is inevitable in every system; what we must avoid is penetration by a large volume of requests. Solving cache penetration is essentially solving the problem of how to intercept those requests. There are several common approaches:

Write back a null value

When the requested data does not exist in the database, the cache system can write a null value for the corresponding key, so that the next identical request does not penetrate to the database but returns the null value directly from the cache. This scheme is the simplest and crudest, but note the following:

  • When a large number of null values are written to the cache they also occupy memory, though in theory not much; it depends on the number of such keys. Also, depending on the cache eviction policy, they may push out cache entries for normal data.
  • The expiration time of a null value should be short: if normal data expires after, say, 2 hours, 10 minutes is reasonable for a null value. First, this frees the server’s memory sooner; second, if the business later produces real data for that key, the cached null expires quickly and the cache becomes consistent with the database sooner.
//Get user information
public static UserInfo GetUserInfo(int userId)
{
    //Read user information from the cache
    var userInfo = GetUserInfoFromCache(userId);
    if (userInfo == null)
    {
        //Cache miss: fall back to the database
        userInfo = GetUserInfoFromDB(userId);
        if (userInfo == null)
        {
            //The data does not exist at all: write back a null value with a
            //short expiration (10 minutes) so that repeated requests for the
            //same key no longer penetrate to the database
            CacheSystem.Set(userId, null, 10);
        }
        else
        {
            //Normal data gets the regular, longer expiration (e.g. 120 minutes)
            CacheSystem.Set(userId, userInfo, 120);
        }
    }

    return userInfo;
}
Bloom filter

Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap; a lookup for data that definitely does not exist is then intercepted by this bitmap, which shields the underlying storage system from the query pressure.

A Bloom filter has two big advantages:

  • It occupies very little memory
  • When it reports that a piece of data does not exist, it is 100% correct (no false negatives)

For details, refer to the previous article, or look up Bloom filters yourself:

Elegant and fast statistics of 10 million levels of UV

Because the Bloom filter is based on hashing, its lookup time complexity is O(1), which makes it well suited to high-concurrency scenarios. However, it requires the system to write into the Bloom filter at the same time as it generates data, and a standard Bloom filter does not support deletion, because multiple entries may share the same bit positions.
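As a minimal sketch of the idea (written in Python rather than the C# used above; the class and method names `BloomFilter`, `add`, and `might_contain` are my own, and the double-hashing scheme is one common way to derive k positions):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1 << 20, k: int = 7):
        self.m = m
        self.k = k
        self.bits = bytearray(m // 8 + 1)

    def _positions(self, key: str):
        # Derive k bit positions from two halves of a SHA-256 digest
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not present"; True means "probably present"
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:1001")
print(bf.might_contain("user:1001"))  # True
print(bf.might_contain("user:9999"))  # almost certainly False
```

Note that `might_contain` can return a false positive (it may say “probably present” for data that was never added), but never a false negative, which is exactly the guarantee the bullet points above rely on.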


Cache avalanche

A cache avalanche occurs when a large amount of cached data expires at the same time, so that a huge volume of queries lands on the database, overloads it, and crashes the system.

This differs from cache penetration: penetration means the data is not in the cache at all, producing a flood of database queries, whereas an avalanche happens when the data is in the cache but a large portion of it expires at the same time. In essence they are the same: both send a flood of requests to the database.

Both penetration and avalanche also face the consistency problem of multiple threads requesting the same data at the same time, querying the database simultaneously, and writing back to the cache simultaneously. For example, if several threads request the user with ID 1 at the exact moment its cache entry expires, they will all query the database and all write back to the cache. Worst of all, if another thread updates the database during this write-back window, the cache and database end up inconsistent. This problem was emphasized in the previous article; pay close attention to it.

Multiple threads regenerating the same data is one cause of an avalanche. The solution is to serialize those requests so that only one thread actually queries the database, most commonly via a locking mechanism (a distributed lock in a multi-server setup). The most common distributed lock today is implemented with Redis, but a Redis-based lock has pitfalls of its own; see the previous article. (If you use the actor model, request serialization can be achieved more elegantly, lock-free.)

Redis may not be so easy to do distributed locking
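The serialization idea can be sketched as follows. This is a deliberately simplified in-process version in Python, with dictionaries standing in for the cache and database and a single `threading.Lock` standing in for the distributed lock; a real multi-server deployment would use a Redis lock as discussed above:

```python
import threading

cache = {}                            # in-memory stand-in for the cache
db = {"user:1": {"name": "Alice"}}    # in-memory stand-in for the database
rebuild_lock = threading.Lock()       # one lock guarding cache rebuilds
db_queries = 0                        # counts how often the database is hit

def get(key):
    global db_queries
    value = cache.get(key)
    if value is not None:
        return value
    # Cache miss: only one thread at a time may query the database and
    # write back; the others wait, then find the value already cached.
    with rebuild_lock:
        value = cache.get(key)        # double-check after acquiring the lock
        if value is None:
            db_queries += 1
            value = db.get(key)
            cache[key] = value
    return value

threads = [threading.Thread(target=get, args=("user:1",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)  # 1: only a single thread reached the database
```

The double-check inside the lock is what turns ten concurrent misses into a single database query; without it, every waiting thread would still query the database once it acquired the lock.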

Many cache keys expiring at the same time is the main cause of an avalanche. For this scenario, the following solutions apply:

Set different expiration times

Setting a different expiration time for each cached key is the simplest way to prevent an avalanche. The overall idea is to add a random value to the expiration time the system would otherwise assign each key (or simply randomize the value outright), which spreads out batch expirations and flattens the peaks of expiring keys per unit of time.

public static int SetUserInfo(int userId)
{
    //Read user information from the database
    var userInfo = GetUserInfoFromDB(userId);
    if (userInfo != null)
    {
        //Write back to the cache with a base expiration (60 minutes) plus a
        //random jitter, so keys written in the same batch expire at different times
        var cacheExpire = 60 + new Random().Next(1, 30);
        CacheSystem.Set(userId, userInfo, cacheExpire);
        return cacheExpire;
    }

    return 0;
}
Background single thread update

In this scheme the cache entries are set to never expire, and cache updates are performed not by the business threads but by a dedicated thread. When a cached key needs updating, the business publishes a message to MQ; the cache-update thread listens on MQ and responds in real time, updating the corresponding data in the cache. This approach must still consider eviction: when a cached key is evicted, a message can likewise be sent to MQ so that the update thread writes the key back again.
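A minimal in-process sketch of this pattern in Python, with a `queue.Queue` standing in for the MQ and dictionaries standing in for the cache and database (all names are illustrative):

```python
import queue
import threading

cache = {}                    # in-memory stand-in for the cache
db = {"user:1": "v1"}         # in-memory stand-in for the database
update_queue = queue.Queue()  # stands in for the MQ in the text

def updater():
    # Dedicated thread: the only writer to the cache
    while True:
        key = update_queue.get()
        if key is None:       # shutdown signal
            break
        cache[key] = db.get(key)  # reload the key from the database
        update_queue.task_done()

worker = threading.Thread(target=updater, daemon=True)
worker.start()

# Business code never writes the cache directly; it publishes a message.
db["user:1"] = "v2"
update_queue.put("user:1")
update_queue.join()           # wait until the updater has processed the message
print(cache["user:1"])  # v2
```

Because a single thread performs every cache write, the concurrent write-back races described in the avalanche section cannot occur; the trade-off is a short window where the cache serves the old value until the message is processed.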

Availability and scalability of cache

Like a database, a cache system must be designed for high availability and scalability. Although the cache itself performs well, particularly hot data under high concurrency can still hit the bottleneck of a single machine. For example: if a celebrity scandal breaks, that piece of data is cached on one cache server node, and a massive number of requests arrive at that node; past a certain point the machine goes down. Similar to a database’s master-slave architecture, the cache system can replicate cache copies to other servers so that application requests are spread across multiple cache servers, alleviating the single-point problem caused by hot data.

Like master-slave databases, multiple cache copies also face data consistency and synchronization delay, plus the problem of keeping the expiration time of the same key aligned across master and slave servers.

As for scalability, the cache system can likewise apply the principle of sharding, using a consistent hashing algorithm to route different requests to different cache server nodes and so meet the need for horizontal scaling, on the same principle as horizontally scaling the application tier.
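A minimal Python sketch of consistent-hash routing with virtual nodes (the class name, node names, and parameters are made up for illustration):

```python
import bisect
import hashlib

class ConsistentHash:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                self.ring.append((h, node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def route(self, key):
        # Walk clockwise on the ring to the first virtual node at or
        # after the key's hash, wrapping around at the end
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHash(["cache-a", "cache-b", "cache-c"])
node = ring.route("user:1001")
print(node in {"cache-a", "cache-b", "cache-c"})  # True
```

The virtual nodes spread each physical server around the ring, so when a node is added or removed only the keys adjacent to its positions are remapped, rather than nearly all keys as with naive modulo hashing.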

Final words

As you can see, whether it is the high-availability architecture of the application servers, of the database, or of the cache, the reasoning is similar; master one of them and it extends easily to any scenario. If this article helped you, please share it with your friends. Finally, you are welcome to write down your own solutions for cache high availability, scalability, and prevention of penetration and avalanche. Let’s make progress together!

