Redis series (VI) redis’s cache penetration, cache breakdown and cache avalanche

Time:2021-9-16

Redis comes up more or less in any NoSQL development work, and it is also a must-know topic for interviews. I have been asked about it in every interview in recent days. However, I felt my answers were not good, and there are still many points that need to be sorted out. So here I will comb through several Redis notes, and then add the interview questions afterwards.

Redis series:

  1. Redis series (I) introduction to redis
  2. Redis series (II) 8 data types of redis
  3. Redis series (III) integration of redis transactions and spring boot
  4. Redis series (IV) redis profile and persistence
  5. Redis series (V) publish subscribe mode, master-slave replication and sentinel mode
  6. Redis series (VI) redis’s cache penetration, cache breakdown and cache avalanche
  7. Redis series (VII) redis interview questions
  8. Redis Command Reference

1. Possible problems with redis

Using Redis as a cache greatly improves the performance and efficiency of applications, especially for data queries. At the same time, it brings some problems. The most crucial one is data consistency. Strictly speaking, this problem has no solution: if the requirements for data consistency are very high, a cache should not be used.

Other typical problems are cache penetration, cache avalanche and cache breakdown. For each of these the industry has popular, well-established solutions.

2. Cache penetration (caused by data not found)

The concept of cache penetration is simple. A user queries for some data and finds that it is not in the Redis in-memory database, that is, the cache misses. The query then goes to the persistence layer database, which does not have the data either, so the query fails. When many users do this, every request misses the cache and goes on to the persistence layer database. This puts a lot of pressure on the persistence layer database, and this is what is called cache penetration.

Solution:

1. Bloom filter

A Bloom filter is a data structure that stores all possible query parameters in hashed form. Requests are checked against it first at the control layer, and those that do not match are discarded, which avoids query pressure on the underlying storage system.
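As a rough illustration of the idea, here is a minimal Bloom filter sketch in Python. The class, the key names (user:1 and so on) and the sizing parameters are assumptions made for this example, not part of Redis. A Bloom filter can report false positives but never false negatives, so a "no" answer is safe to reject outright.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: each item sets num_hashes bits in a bit array."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical setup: register every key that really exists in the database.
valid_keys = BloomFilter()
for existing_key in ("user:1", "user:2", "user:3"):
    valid_keys.add(existing_key)


def should_query_backend(key: str) -> bool:
    """Reject keys the filter has never seen before touching cache or database."""
    return valid_keys.might_contain(key)
```

A request handler would call should_query_backend(key) first and return "not found" immediately when it is False. The filter has to be kept up to date whenever new keys are written to the database.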

2. Cache empty objects

When the storage layer also misses, the empty object it returns is cached anyway, together with an expiration time. Later requests for the same data are then answered from the cache, which protects the back-end data source.

However, there are two problems with this method:

1. If null values can be cached, the cache needs more storage space, because there may be many keys with null values;

2. Even if the expiration time is set for a null value, there will still be inconsistency between the data of the cache layer and the storage layer for a period of time, which will have an impact on the business that needs to maintain consistency.

3. Cache breakdown (too many requests, cache expired)

Note the difference between this and cache penetration. Cache breakdown means that a key is very hot and constantly carries heavy concurrent traffic, all concentrated on this one point. At the moment the key expires, the sustained concurrent traffic breaks through the cache and goes straight to the database, like cutting a hole in a barrier.

When a hot key expires, a large number of concurrent requests arrive for it. Because the cache entry has expired, they all access the database at the same moment to query the latest data and write it back to the cache, which puts excessive instantaneous pressure on the database.

Solution:

1. Set hotspot data never to expire

From the perspective of the cache layer, no expiration time is set, so the hot key never expires and the problem does not arise.

2. Add mutex

Distributed lock: using a distributed lock ensures that, for each key, only one thread at a time queries the back-end service. The other threads fail to obtain the lock, so they simply wait. This method shifts the pressure of high concurrency onto the distributed lock, so it is a serious test of the lock itself.

4. Cache avalanche

Cache avalanche means that a large part of the cache expires together within a certain period of time.

One cause of an avalanche is the following. Suppose a wave of rush buying is about to start. The goods for this sale are put into the cache at about the same time, say for one hour. Then at one o'clock in the morning, the cache entries for all of these goods expire together. Access and queries for these goods all fall on the database, which produces a periodic pressure peak. All requests reach the storage layer, its call volume rises sharply, and the storage layer may go down as well.

In fact, concentrated expiration is not the most fatal case. The more fatal cache avalanche is a cache server node going down or losing its network connection. A naturally formed avalanche comes from cache entries that were created within a certain period, and the database can generally withstand the pressure at that time; it is nothing more than periodic pressure on the database. The downtime of a cache service node, however, puts unpredictable pressure on the database server and is likely to crush the database in an instant.

Solution:

1. Redis high availability

The idea here is that since one Redis instance may go down, we add several more. After one goes down, the others can continue to work. In effect, this is a cluster.

2. Rate limiting and degradation

The idea of this solution is to use locking or queuing after the cache expires to control the number of threads that read the database and write the cache. For example, for a given key, only one thread is allowed to query the data and write the cache, while the other threads wait.

3. Data preheating

Data preheating means accessing the data that is likely to be requested in advance, before formal deployment, so that data likely to be accessed heavily is already loaded into the cache. Before a burst of concurrent access, manually trigger the loading of the various cache keys and set different expiration times, so that the moments of cache expiration are spread as evenly as possible.
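One common way to get different expiration times is to add a random offset to a base TTL while preheating. The sketch below assumes a one-hour base TTL and up to ten minutes of jitter; the function names and the cache_set callback are hypothetical placeholders for whatever cache client is in use.

```python
import random

BASE_TTL_SECONDS = 3600   # assumed base expiration: one hour
JITTER_SECONDS = 600      # assumed spread: up to ten extra minutes


def ttl_with_jitter(base=BASE_TTL_SECONDS, jitter=JITTER_SECONDS, rng=random):
    """Return the base TTL plus a random offset in [0, jitter] seconds."""
    return base + rng.randint(0, jitter)


def preheat(keys, load_value, cache_set):
    """Load each key before traffic arrives, giving each its own randomized TTL.

    load_value(key) fetches the value from the data source and
    cache_set(key, value, ttl_seconds) writes it to the cache.
    Returns the TTL chosen for each key.
    """
    ttls = {}
    for key in keys:
        ttl = ttl_with_jitter()
        cache_set(key, load_value(key), ttl)
        ttls[key] = ttl
    return ttls
```

Because each key gets its own TTL somewhere between 3600 and 4200 seconds, the entries loaded in one preheating run expire gradually over ten minutes instead of all at the same instant.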
