What about cache penetration?

Time: 2020-3-25

In today's Internet architectures, almost every project introduces a caching system such as redis or memcached to protect the downstream database and improve the system's concurrency. No matter which caching system you use, you may run into the problem of cache penetration.

Cache penetration refers to the situation where the requested data is not found in the cache system, so the request has to be sent on to the database.

Of course, some cache penetration is inevitable, and a small amount of it does not harm the system. It is inevitable for the following reasons:

  • The capacity of the cache system is limited, so it cannot hold all of the system's data; querying data that is not cached results in cache penetration.
  • Following the 80/20 rule, we usually cache only the roughly 20% of the data that is hot and frequently accessed.

Under normal circumstances, cache penetration is harmless. But if your system is attacked and a large volume of requests penetrates the cache, it becomes a problem: once the penetrating traffic exceeds what the back-end servers can handle, the service may crash, which is unacceptable.

Because such large-scale cache penetration is possible, we need to address the problem at its root. There are generally two solutions: caching null values and using a Bloom filter.

Caching null values

If our system is under attack, the queried keys are most likely forged and very probably do not exist in our system. No matter how many times such a key is queried, it will never be in the cache, so every one of those requests penetrates to the database.

In this case, we can store a null value in the cache for that key to prevent penetration. Because a null value is not real business data and still occupies cache space, we give it a short expiration time so that it expires and is evicted quickly. Here is the pseudocode:

Object nullValue = new Object(); // sentinel that stands for "no such record"
try {
  Object valueFromDB = getFromDB(uid); // query data from the database
  if (valueFromDB == null) {
    // nothing found in the database: cache the sentinel with a short timeout
    cache.set(uid, nullValue, 10);
  } else {
    // real data: cache it with a longer timeout
    cache.set(uid, valueFromDB, 1000);
  }
} catch (Exception e) {
  // the query failed: also cache the sentinel briefly to shield the database
  cache.set(uid, nullValue, 10);
}

Although this method solves the cache penetration problem, it has a disadvantage: a large number of null values end up stored in the cache system, which wastes cache storage. If the cache fills up, some already-cached user data will be evicted, which lowers the cache hit rate.
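To make the idea concrete, here is a minimal runnable sketch of null-value caching. Everything in it is an assumption made for illustration: the class name NullCachingLookup, the two in-memory maps that stand in for the cache and the database, and the TTL values. It is not thread-safe and is not meant as a production implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of null-value caching; every name here is illustrative. */
class NullCachingLookup {
    // Marker stored in the cache when the database has no record for a key.
    private static final Object NULL_VALUE = new Object();
    private static final long NULL_TTL_MS = 10_000;    // short TTL for the marker (arbitrary)
    private static final long VALUE_TTL_MS = 600_000;  // longer TTL for real data (arbitrary)

    private static final class Entry {
        final Object value;
        final long expiresAt;
        Entry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();  // stand-in for redis/memcached
    private final Map<String, Object> database;                // stand-in for the real database
    int dbQueries = 0;                                          // counts database round trips

    NullCachingLookup(Map<String, Object> database) {
        this.database = database;
    }

    Optional<Object> get(String uid) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(uid);
        if (e != null && e.expiresAt > now) {
            // Cache hit: either real data or the "does not exist" marker.
            return e.value == NULL_VALUE ? Optional.empty() : Optional.of(e.value);
        }
        dbQueries++;
        Object fromDb = database.get(uid);
        if (fromDb == null) {
            // Remember the miss for a short time so repeats stay in the cache.
            cache.put(uid, new Entry(NULL_VALUE, now + NULL_TTL_MS));
            return Optional.empty();
        }
        cache.put(uid, new Entry(fromDb, now + VALUE_TTL_MS));
        return Optional.of(fromDb);
    }

    public static void main(String[] args) {
        Map<String, Object> db = new HashMap<>();
        db.put("1001", "alice");
        NullCachingLookup lookup = new NullCachingLookup(db);
        for (int i = 0; i < 1000; i++) {
            lookup.get("forged-id");  // repeated query for a key that does not exist
        }
        System.out.println("database queries: " + lookup.dbQueries);
    }
}
```

In this sketch, repeated queries for the same forged key are answered from the cached marker until it expires, so only the first one reaches the database. An attacker who sends a different forged key each time still causes one database query and one cached marker per key, which is the wasted space described above.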

Using the Bloom filter

In 1970, Burton Bloom proposed the Bloom filter, an algorithm for determining whether an element is in a set. Underneath, a Bloom filter is a very large bit array whose bits are all initially 0. An element is mapped to several positions in the bit array by multiple hash functions, and the bits at those positions are changed from 0 to 1. Of course, we don't need to implement the Bloom filter ourselves: Google's Guava library provides one, and interested readers can study it.
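The bit-array mechanism can be sketched in a few lines. The class below is a toy written for illustration, not Guava's implementation: the name SimpleBloomFilter, the two simple polynomial hashes, and the way the k positions are derived from them (double hashing) are all assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy Bloom filter for illustration only; not the Guava implementation. */
class SimpleBloomFilter {
    private final BitSet bits;     // the bit array, all 0 at the start
    private final int size;        // number of bits in the array
    private final int hashCount;   // number of hash functions (k)

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = 0;
        int h2 = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h1 = 31 * h1 + b;
            h2 = 131 * h2 + b;
        }
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets the k bits that the key maps to. */
    void put(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** Returns false only if the key was definitely never added. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 3);
        filter.put("1001");
        System.out.println(filter.mightContain("1001"));  // always true for an added key
        System.out.println(filter.mightContain("9999"));  // usually false, but may be a false positive
    }
}
```

A real project would more likely use a tested library with stronger hash functions; the sketch only shows why an added key is always found and why an unseen key can occasionally match by coincidence.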

A Bloom filter can misjudge, because hash collisions are unavoidable: an element that is not in the database may be reported as present by the Bloom filter (a false positive). However, an element that the Bloom filter reports as absent is definitely not in the database.
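How often such misjudgments happen can be estimated with the standard approximation below. The symbols are introduced here and do not appear in the original text: m is the number of bits in the array, n is the number of elements added, and k is the number of hash functions, which are assumed to be independent and uniformly distributed.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\,\ln 2
```

For example, with 10 bits per element (m/n = 10) and k = 7, the estimate is (1 - e^{-0.7})^7, or about 0.8%. A larger bit array lowers the false positive rate at the cost of more memory.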

This property of the Bloom filter can be used to solve the cache penetration problem. When the service starts, first load the query keys of the existing data, such as each record's ID, into the Bloom filter. And when new data is added, besides writing it to the database, you also need to add its ID to the Bloom filter.

When querying a record, we first check whether the queried ID exists in the Bloom filter. If it does not, we return a null value directly without querying the cache or the database. If it does, we go on to query the cache and then the database. This solves the cache penetration problem.
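The query flow described above might look like the following sketch. It is self-contained, so it carries its own small bit-array filter; the class name BloomGuardedLookup, the in-memory maps that stand in for the cache and the database, and the filter size and hash count are all assumptions chosen for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of a lookup guarded by a Bloom filter; all names are illustrative. */
class BloomGuardedLookup {
    private static final int BITS = 1 << 16;  // filter size, chosen arbitrarily
    private static final int HASHES = 3;      // bit positions per ID

    private final BitSet filter = new BitSet(BITS);
    private final Map<String, Object> cache = new HashMap<>();     // stand-in cache
    private final Map<String, Object> database = new HashMap<>();  // stand-in database
    int dbQueries = 0;                                             // counts database round trips

    private int position(String id, int i) {
        int h1 = id.hashCode();
        int h2 = (h1 >>> 16) | 1;  // odd second hash, so the positions differ
        return Math.floorMod(h1 + i * h2, BITS);
    }

    private boolean mightContain(String id) {
        for (int i = 0; i < HASHES; i++) {
            if (!filter.get(position(id, i))) {
                return false;
            }
        }
        return true;
    }

    /** New data is written to the database and its ID is added to the filter. */
    void save(String id, Object value) {
        database.put(id, value);
        for (int i = 0; i < HASHES; i++) {
            filter.set(position(id, i));
        }
    }

    Optional<Object> find(String id) {
        if (!mightContain(id)) {
            return Optional.empty();  // definitely absent: skip cache and database
        }
        Object cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);
        }
        dbQueries++;
        Object fromDb = database.get(id);  // can still be null on a false positive
        if (fromDb != null) {
            cache.put(id, fromDb);
        }
        return Optional.ofNullable(fromDb);
    }

    public static void main(String[] args) {
        BloomGuardedLookup lookup = new BloomGuardedLookup();
        lookup.save("1001", "alice");
        System.out.println(lookup.find("1001").isPresent());       // true
        System.out.println(lookup.find("forged-id").isPresent());  // false
        System.out.println("database queries: " + lookup.dbQueries);
    }
}
```

In this sketch a forged ID is normally rejected by the filter before the cache or database is touched. On a false positive it falls through to the database once and simply finds nothing there, so the result is still correct.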

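Of course, the Bloom filter has its drawbacks. Besides the misjudgments mentioned above, there is another one: it does not support deletion. A bit set to 1 may be shared by several elements, so clearing it to remove one element could make the filter wrongly report other elements as absent.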

Caching null values and using a Bloom filter can each solve the cache penetration problem to a certain extent, and each has its own advantages. Which one to use depends on the specific scenario.

That is all for today. I hope it is helpful for your study or work. If you think the article is good, please like and share it. Thank you.

Last

Many experts on the Internet have already written about cache penetration; if this article overlaps with theirs, please bear with me. Writing original content and code is not easy, so I hope you will support me. If you find any mistakes in the article, please point them out. Thank you.
