Explain Redis cache penetration / breakdown / avalanche in detail: principles and solutions

Time: 2022-05-08

1. Introduction

As shown in the figure, a normal request proceeds as follows (a sketch of this cache-aside read path follows the list):

1. The client requests Zhang Tieniu’s blog.

2. The service first queries Redis to check whether the requested content exists.

3. Redis returns the result to the service. If the result contains data, skip to step 7; if there is no data, continue with the next step.

4. The service queries the requested data from the database.

5. The database returns the query results to the service.

6. If the database has returned data, add the returned results to redis.

7. Return the requested data to the client.
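A minimal sketch of this read path, assuming a redis-py client; query_blog_from_db() is a hypothetical helper standing in for the database lookup in steps 4-5:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def get_blog(blog_id):
        key = f"blog:{blog_id}"
        # Steps 2-3: ask Redis first; a hit goes straight back to the client (step 7).
        cached = r.get(key)
        if cached is not None:
            return cached
        # Steps 4-5: cache miss, fall back to the database.
        data = query_blog_from_db(blog_id)  # hypothetical DB query
        # Step 6: write the result back so later requests hit the cache.
        if data is not None:
            r.set(key, data, ex=3600)
        # Step 7: return the data to the client.
        return data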

2. Cache penetration

2.1 description

Cache penetration means requesting, through the interface, data that exists in neither the cache nor the database.

For fault-tolerance reasons, when a request finds no data in the persistence layer, nothing is written back to the cache. As a result, every query for this non-existent data goes to the persistence layer again, and the cache loses its purpose.

At this point the cache no longer protects the back-end persistence layer, as if it had been penetrated, and the database is at risk of being overwhelmed.

2.2 solutions

1. Validate interface request parameters. Authenticate the requested interface and check the validity of the data; for example, the queried userid cannot be negative or contain illegal characters.

2. When the database returns a null result, cache a null placeholder in Redis with a reasonable (short) expiration time (see the sketch after this list).

3. Bloom filter. A Bloom filter stores all the keys that may legitimately be accessed; non-existent keys are filtered out directly, and only keys that may exist go on to the cache and database.
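A minimal sketch of solution 2, caching a placeholder for empty results with a short expiration time (redis-py; query_user_from_db() is a hypothetical database lookup and the key names are illustrative):

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    NULL_PLACEHOLDER = "__NULL__"

    def get_user(user_id):
        key = f"user:{user_id}"
        cached = r.get(key)
        if cached is not None:
            # A cached placeholder means the database was already checked and had nothing.
            return None if cached == NULL_PLACEHOLDER else cached
        data = query_user_from_db(user_id)  # hypothetical DB query
        if data is None:
            # Cache the miss briefly so repeated lookups for a nonexistent key
            # stop hammering the database, without keeping the placeholder for long.
            r.set(key, NULL_PLACEHOLDER, ex=60)
            return None
        r.set(key, data, ex=3600)
        return data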

3. Cache breakdown

3.1 description

For a hot key, a large number of requests arrive at the same moment the cached entry expires. Because the cache is empty at that instant, all of these requests fall through to the database, producing a burst of database queries and a sudden spike in load that risks bringing the database down.

3.2 solutions

1. Add a mutex lock. When the hot key expires and a flood of requests arrives, only the first request acquires the lock while the others wait. The lock holder queries the database, writes the result to Redis, and releases the lock; subsequent requests then hit the cache directly (see the sketch after this list).

2. Set the hot data never to expire, or have a background thread keep renewing it before it expires.
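A minimal sketch of the mutex approach, using Redis SET with NX and EX as the lock (redis-py; query_from_db() is a hypothetical database lookup and the timeouts are illustrative):

    import time
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def get_hot_data(key):
        lock_key = f"lock:{key}"
        while True:
            cached = r.get(key)
            if cached is not None:
                return cached
            # Only one request wins the lock; the EX timeout frees it automatically
            # if the holder crashes before releasing it.
            if r.set(lock_key, "1", nx=True, ex=10):
                try:
                    data = query_from_db(key)  # hypothetical DB query
                    r.set(key, data, ex=3600)
                    return data
                finally:
                    r.delete(lock_key)
            # Everyone else backs off briefly, then re-checks the cache.
            time.sleep(0.05)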

4. Cache avalanche

4.1 description

A large amount of hot data shares the same expiration time, so it all expires at the same moment. This causes a burst of database requests and a sudden spike in load, an avalanche that risks bringing the database down.

4.2 solutions

1. Spread out the expiration times of hot data. Add a random offset when setting each key's expiration time (see the sketch after this list).

2. Add a mutex lock, as in section 3.2. When a hot key expires and a flood of requests arrives, only the first request acquires the lock while the others wait; the lock holder queries the database, writes the result to Redis, and releases the lock, after which subsequent requests hit the cache directly.

3. Set the hot data never to expire, or have a background thread keep renewing it before it expires.
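A minimal sketch of solution 1, adding a random offset to each key's expiration so a batch of hot keys written together does not expire together (redis-py; the base TTL and jitter range are illustrative):

    import random
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def cache_with_jitter(key, value, base_ttl=3600):
        # Spread expirations over an extra 0-10 minutes.
        ttl = base_ttl + random.randint(0, 600)
        r.set(key, value, ex=ttl)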

5. Bloom filter

5.1 description

A Bloom filter is one of the schemes for preventing cache penetration. It is mainly used in large-scale scenarios where exact filtering is not required, such as detecting spam addresses, deduplicating crawler URLs, and solving the cache-penetration problem.

Bloom filter: checks an element against a set and answers either that the element is definitely not in the set or that it may be in the set. Its advantage is that its space efficiency and query time far exceed those of ordinary algorithms; its shortcomings are a certain false-positive rate and the difficulty of deletion.

5.2 data structure

A Bloom filter is implemented with a bitmap and several hash functions, as shown in the figure below:

1. The element "tie" is run through hash1, hash2 and hash3; the three results map to array subscripts 4, 6 and 8, whose positions are changed from the default value 0 to 1.

2. Likewise, the element "niu" maps to array subscripts 1, 3 and 4, whose positions are changed from the default value 0 to 1.

The bitmap now stores the elements "tie" and "niu".

When a request asks the Bloom filter whether the element "tie" exists, the hash results all point to array positions that have been set to 1, so the filter returns true.
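A minimal Bloom filter sketch built on a plain bit array and several hash functions, following the idea in the figure (the array size and the three salted MD5 hashes are illustrative choices, not a tuned implementation):

    import hashlib

    class BloomFilter:
        def __init__(self, size=16):
            self.size = size
            self.bits = [0] * size      # the bitmap: every position defaults to 0

        def _positions(self, item):
            # Three salted hashes play the role of hash1, hash2 and hash3.
            for salt in ("h1", "h2", "h3"):
                digest = hashlib.md5((salt + item).encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = 1      # flip the hashed positions from 0 to 1

        def might_contain(self, item):
            # Any position still 0 means the item is definitely absent;
            # all 1s only means it is possibly present (false positives happen).
            return all(self.bits[pos] == 1 for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("tie")
    bf.add("niu")
    print(bf.might_contain("tie"))      # True
    print(bf.might_contain("zhang"))    # very likely False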

5.3 “must not be in the set”

As shown in the figure:

When the element "zhang" is checked by the Bloom filter, the positions at subscripts 0 and 2 are both 0, so it immediately returns false.

In other words, when checking an element, if any one of its hash results maps to a bitmap position that is 0, the data definitely does not exist.

5.4 “possible in set”

As shown in the figure:

When the element "shuaibi" is checked by the Bloom filter, its hash results fall on subscripts 1, 3 and 8, all of which happen to be 1, so it immediately returns true.

This is awkward: the program never stored "shuaibi", yet the Bloom filter reports that the element exists. This is the disadvantage of Bloom filters: false positives.

5.5 "difficult to delete"

Why is it difficult to delete from a Bloom filter? As shown in the figure:

If the element "tie" is deleted and position 4 is set back to 0, the check for the element "niu" is affected: because position 4 is now 0, verification will conclude that "niu" does not exist in the program either.

Some readers will ask: what if position 4 is simply left at 1 instead of being set to 0?

If the element is deleted but the colliding array positions are left at 1, the Bloom filter will keep returning true when that element is checked, even though it has actually been deleted.

Therefore, deleting data from a Bloom filter is difficult. If deletion is needed, refer to the Counting Bloom Filter.
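A rough sketch of that counting idea, reusing the BloomFilter class from section 5.2: each position holds a counter instead of a single bit, so deleting one element does not wipe out positions shared with other elements:

    class CountingBloomFilter(BloomFilter):
        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] += 1     # count how many elements touch this position

        def remove(self, item):
            for pos in self._positions(item):
                if self.bits[pos] > 0:
                    # decrement instead of clearing, so positions shared
                    # with other elements stay non-zero
                    self.bits[pos] -= 1

        def might_contain(self, item):
            return all(self.bits[pos] > 0 for pos in self._positions(item))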

5.6 why not use HashMap?

If a HashSet or HashMap is used for storage, each user ID is saved as an int, taking 4 bytes, i.e. 32 bits. In a bitmap, each user needs only one bit, using roughly 1/32 of the memory. For example, 100 million user IDs stored as 4-byte ints take about 400 MB, while a bitmap with one bit per user needs only about 12.5 MB.

Moreover, large amounts of data produce many hash collisions, and colliding entries still have to be compared one by one (even when the bucket is converted into a red-black tree), so the gains in memory usage and query efficiency are limited.

Of course, when the amount of data is small, a HashMap is perfectly fine, and it is convenient for CRUD operations.

This is the end of this detailed explanation of cache penetration / breakdown / avalanche principles and their solutions.