Redis: cache penetration, cache avalanche, and cache breakdown

Time:2022-8-4

After going through some interview materials today, I found that Redis gets asked about almost every time, so it is time to start reviewing~

1. Cache Penetration

1.1 What is cache penetration?

The data the business system wants to query simply does not exist! When the business system issues a query, it first looks in the cache; because the data is not in the cache, it then queries the database. Since the data does not exist at all, the database also returns an empty result. This is cache penetration.

In summary: when a business system requests data that does not exist at all, the request passes through the cache and hits the database every time. This is called cache penetration.

1.2 Harm of cache penetration

If a massive number of requests query data that does not exist at all, all of them fall on the database, and the pressure on the database rises sharply, which may crash the system (keep in mind that IO is the most fragile part of today's business systems; it collapses under a little pressure, so we have to find ways to protect it).

1.3 Why does cache penetration occur?

There are many reasons for cache penetration, generally the following two:

Malicious attack. An attacker deliberately sends a large number of requests for data that does not exist. Because the data is not in the cache, all of these requests fall on the database, which may crash it.

Code logic error. This one is on the programmer. There is not much to say about it; it must be avoided during development!

1.4 Solutions for Cache Penetration

Here are two ways to prevent cache penetration.

1.4.1 Cache empty data

Cache penetration occurs because the cache holds no entry for keys whose data is empty, so every request for such a key hits the database.

So we can slightly modify the business code and store the keys whose database query result is empty in the cache, with an empty value. When a later request queries such a key, the cache returns null directly and the database is not queried.

There are two problems with caching empty objects:

First, caching null values means more keys are stored in the cache layer, which takes more memory (under an attack the problem is more serious). A more effective approach is to set a shorter expiration time for these entries so that they are evicted automatically.

Second, the data in the cache layer and the storage layer will be inconsistent for a period of time, which may affect the business. For example, suppose the expiration time is set to 5 minutes. If the storage layer adds this data in the meantime, the cache layer and the storage layer will disagree until the entry expires. In that case, a message system or some other mechanism can be used to clear the empty object from the cache layer.
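The idea above can be sketched roughly as follows. This is only a minimal illustration: an in-memory map stands in for Redis, and the class name NullCachingLookup, the marker string, and the TTL values are all assumptions of mine, not part of any library.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: an in-memory map stands in for Redis.
class NullCachingLookup {
    private static final String NULL_MARKER = "<null>";   // placeholder stored for missing rows
    private static final long NULL_TTL_MS = 60_000;        // short TTL for empty results (assumed value)
    private static final long DATA_TTL_MS = 30 * 60_000;   // longer TTL for real data (assumed value)

    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database;        // returns null when the row does not exist
    final AtomicInteger databaseHits = new AtomicInteger();  // counter for demonstration only

    NullCachingLookup(Function<String, String> database) {
        this.database = database;
    }

    Optional<String> get(String key) {
        long now = System.currentTimeMillis();
        Entry entry = cache.get(key);
        if (entry != null && entry.expiresAt > now) {
            // Cache hit: either real data or the cached "does not exist" marker.
            return NULL_MARKER.equals(entry.value) ? Optional.empty() : Optional.of(entry.value);
        }
        databaseHits.incrementAndGet();
        String fromDb = database.apply(key);
        if (fromDb == null) {
            // Cache the empty result with a short TTL so it is evicted soon.
            cache.put(key, new Entry(NULL_MARKER, now + NULL_TTL_MS));
            return Optional.empty();
        }
        cache.put(key, new Entry(fromDb, now + DATA_TTL_MS));
        return Optional.of(fromDb);
    }

    // Call this when the row is created, so the stale empty marker is cleared.
    void evict(String key) {
        cache.remove(key);
    }
}
```

With a real Redis client the same flow applies: write the marker with a short expiration, and delete the key (for example from a message consumer) when the row is inserted later.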

1.4.2 BloomFilter

The second way to avoid cache penetration is to use BloomFilter.

It adds a barrier in front of the cache that records all the keys currently existing in the database.

When the business system receives a query request, it first asks the BloomFilter whether the key exists. If the filter says it does not exist, the data is definitely not in the database, so there is no need to check the cache, and null is returned directly. If the filter says it exists, the normal flow continues: query the cache first, and if the cache misses, query the database. Note that a Bloom filter can give false positives (it may say a key exists when it does not) but never false negatives, so a small number of non-existent keys can still get through.

This method suits scenarios where the hit rate is low, the data is relatively fixed, and the real-time requirements are low (usually with a large data set). The code is more complicated to maintain, but less cache space is used.
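To make the mechanism concrete, here is a small Bloom filter sketch built on java.util.BitSet. It is an illustration under my own assumptions (class name SimpleBloomFilter, FNV-style hashing combined by double hashing, sizes chosen by the caller); in production a ready-made implementation would normally be used instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical minimal Bloom filter; hashing scheme and sizes are illustrative.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Record a key that exists in the database.
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    // false: the key was definitely never added. true: the key was probably added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod size.
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a style hash with a configurable starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

In the query flow, the service would call mightContain(key) first and return null immediately on false; only on true does it go on to the cache and then the database. Every key written to the database must also be passed to add, otherwise valid keys would be rejected.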

1.4.3 Comparison of the two schemes

Both solutions can solve the problem of cache penetration, but the usage scenarios are different.

In a malicious attack, the queried keys are usually all different, and there are a great many of them. Here the first option falls short: it has to store a key for every piece of empty data, but the keys used in the attack are usually all different and each one is often requested only once. So even if these empty keys are cached, they do not protect the database, because they are never hit a second time.
Therefore, for scenarios where the empty-data keys are all different and the probability of a repeated request for the same key is low, the second option should be chosen. For scenarios where the number of empty-data keys is limited and the probability of repeated requests is high, the first option should be chosen.

2. Cache Avalanche

2.1 What is a cache avalanche?

As can be seen from the above, the cache actually plays a role in protecting the database. It helps the database to withstand a large number of query requests, thereby preventing vulnerable databases from being harmed.

If the cache goes down for some reason, the massive number of query requests that the cache used to absorb will rush at the database like mad dogs. If the database cannot withstand this enormous pressure, it will collapse.

This is cache avalanche.

2.2 How to avoid cache avalanches?

2.2.1 Use a cache cluster to ensure high availability of the cache
Just as an aircraft has multiple engines, if the cache layer is designed to be highly available, it can keep serving even when individual nodes, individual machines, or even a whole machine room fail. Redis Sentinel and Redis Cluster, for example, both provide high availability.

2.2.2 Using Hystrix

Hystrix is an open source "anti-avalanche tool" that reduces the damage after an avalanche occurs through three mechanisms: fusing (circuit breaking), downgrading, and rate limiting.

Hystrix is a Java class library built on the command pattern: each service's requests are handled by that service's own processor, and all requests pass through their respective processors. The processor records the request failure rate of the current service. Once the failure rate reaches the preset threshold, Hystrix rejects all subsequent requests to that service and directly returns a preset result. This is called "fusing". After a period of time, Hystrix lets part of the service's requests through and measures the failure rate again. If the failure rate is now within the preset threshold, the switch is fully opened to traffic again; if the failure rate is still high, all requests to the service continue to be rejected. This is called "limiting". The preset result that Hystrix returns directly for rejected requests is called "downgrade".
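The state changes described above can be sketched as a tiny state machine. To be clear, this is not how Hystrix itself is implemented: the class SimpleCircuitBreaker and its parameters are hypothetical, it counts calls since the last state change instead of using a rolling time window, and it treats only RuntimeException as a failure.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified circuit breaker for illustration; NOT Hystrix's implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum calls before the failure rate is evaluated
    private final int errorThresholdPercentage; // failure rate (in %) that opens the circuit
    private final long sleepWindowMs;           // how long to stay open before a trial call
    private final LongSupplier clock;           // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Runs the action if allowed; otherwise (or on failure) returns the fallback result.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();              // circuit open: downgrade immediately
        }
        try {
            T result = action.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMs) {
            state = State.HALF_OPEN;            // let exactly one trial request through
            return true;
        }
        return false;                           // still open, or a trial is already in flight
    }

    private synchronized void record(boolean success) {
        if (state == State.HALF_OPEN) {
            if (success) { close(); } else { open(); }
            return;
        }
        if (state == State.OPEN) {
            return;                             // late result from a call started before the trip
        }
        total++;
        if (!success) {
            failures++;
        }
        if (total >= requestVolumeThreshold && failures * 100 >= errorThresholdPercentage * total) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        failures = 0;
    }

    private void close() {
        state = State.CLOSED;
        total = 0;
        failures = 0;
    }
}
```

A caller wraps the remote call, for example breaker.call(() -> remote.query(id), () -> defaultValue), where remote and defaultValue are placeholders. The clock is passed in so that the sleep window can be exercised in a test without actually waiting.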

2.2.3 The common and recommended way: define a unified fallback class

pom.xml dependencies

<!-- Hystrix circuit breaker -->
<dependency>
     <groupId>org.springframework.cloud</groupId>
     <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

application.properties configuration file

#Specify the running port
server.port=8200
#service name
spring.application.name=order
#Get the list of registered instances
eureka.client.fetch-registry=true
#Register to Eureka's registry
eureka.client.register-with-eureka=true
#Configure the registry address
#eureka.client.zhang.service-url.defaultZone=http://localhost:8001/eureka/
eureka.client.service-url.defaultZone=http://localhost:8000/eureka/
 
#feign client establishment connection timeout
feign.client.config.default.connect-timeout=10000
#feign The timeout time for reading resources after the client establishes a connection
feign.client.config.default.read-timeout=10000
 
#Open Hystrix circuit breaker
feign.hystrix.enabled=true
#Whether the Hystrix execution timeout is enabled (default true); uncomment the next line to disable it
#hystrix.command.default.execution.timeout.enabled=false
#Timeout (default 1000ms), configured in the caller; it applies to all of the caller's methods and has lower priority than the per-command setting below
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=3000
#Configured in the caller; timeout for one specific command (replace HystrixCommandKey with the command key, e.g. the method name)
hystrix.command.HystrixCommandKey.execution.isolation.thread.timeoutInMilliseconds=4000
#Number of core threads in the thread pool, default 10
hystrix.threadpool.default.coreSize=10
#Maximum queue length, default -1. Changing it from -1 to another value requires a restart, i.e. it cannot be adjusted dynamically. For dynamic adjustment use the setting below
hystrix.threadpool.default.maxQueueSize=100
#Queue size rejection threshold, default 5. Requests are rejected once this many are queued, even if maxQueueSize has not been reached; it has no effect when maxQueueSize is -1
hystrix.threadpool.default.queueSizeRejectionThreshold=5
#Minimum number of requests within the rolling window (10s by default) before the circuit breaker can trip. Default 20
hystrix.command.default.circuitBreaker.requestVolumeThreshold=20
#How long after tripping before a trial request is allowed, in milliseconds. Default 5000 (5s)
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=5000
#Error percentage threshold; the circuit trips at or above this value. Default 50 (a plain integer, without a % sign)
hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
#Maximum number of concurrent calls to HystrixCommand.getFallback() from calling threads, default 10. Beyond it an exception is thrown. Note: this setting also applies in THREAD isolation mode
hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests=50000

Define the Feign call interface, then create a unified fallback class that implements it

@FeignClient(value = "member",fallback = MemberServiceFallback.class)
public interface MemberServiceFeign extends IMemberService {
    
    //To make calling the member service easier, this interface directly extends the member service interface, which avoids mistakes and reduces the amount of code
    //@FeignClient(value = "member",fallback = MemberServiceFallback.class)
    //value is the name of the service to call; fallback is the unified fallback class
 
}
@Component
public class MemberServiceFallback implements MemberServiceFeign {
 
    @Override
    public UserEntity getMember(String name) {
        return null;
    }
 
    //Service downgrade friendly prompt
    @Override
    public ResultVO getUserinfo() {
        return new ResultVO(StatusCode.RESULT_SUCCESS, "Server is busy! Please try again later!!!");
    }
}

startup class

@SpringBootApplication
@EnableEurekaClient //Open the eureka client
@EnableFeignClients //Enable feign call
@EnableHystrix //Open hystrix
public class AppOrder {
    public static void main(String[] args) {
        SpringApplication.run(AppOrder.class, args);
    }
}

Test endpoint for service downgrade through Feign (the called member service implementation sleeps for 5 seconds while the timeout is configured as 3 seconds, so the call times out)

@RestController
public class OrderServcieImpl implements IOrderService {
 
    @Autowired
    private MemberServiceFeign memberServiceFeign;
 
    //The second way of using Hystrix: a fallback class
    @RequestMapping("/orderToMemberUserInfoHystrixDemo02")
    public ResultVO orderToMemberUserInfoHystrixDemo02() {
        System.out.println("orderToMemberUserInfoHystrixDemo02: Thread pool name:"+Thread.currentThread().getName());
        return memberServiceFeign.getUserinfo();
    }
}
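//Method from the member service implementation class (fragment; serverPort is a field of that class)
@Override
@RequestMapping("/getUserinfo")
public ResultVO getUserinfo() {
    try {
        //Sleep longer than the caller's 3-second timeout to force a downgrade
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        //Restore the interrupt flag instead of swallowing it
        Thread.currentThread().interrupt();
    }
    return new ResultVO(StatusCode.RESULT_SUCCESS, "The order service interface calls the member service interface successfully...." + serverPort);
}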

Start the eureka, member, and order services and test

{"resultCode":"00000000","resultMsg":"SUCCESS","data":"The server is busy! Please try again later!!!"}

2. @HystrixCommand annotation method

1. The pom.xml file, the application.properties configuration file, and the startup class are the same as in the Feign fallback approach above

2. Test code

    //Hystrix offers two ways to protect a service against the avalanche effect: annotations and fallback classes
    //fallbackMethod: the method executed when the service is downgraded
    //@HystrixCommand enables thread pool isolation, service downgrade, and fusing by default
    @HystrixCommand(fallbackMethod = "orderToMemberUserInfoHystrixFallbackMethod")
    @RequestMapping("/orderToMemberUserInfoHystrix")
    public ResultVO orderToMemberUserInfoHystrix() {
        System.out.println("orderToMemberUserInfoHystrix: thread pool name:"+Thread.currentThread().getName());
        return memberServiceFeign.getUserinfo();
    }
    
    //Service downgrade processing method
    public ResultVO orderToMemberUserInfoHystrixFallbackMethod(){
        return new ResultVO(StatusCode.RESULT_SUCCESS,"Return friendly prompt: service downgrade!!! The server is busy, please try again later!!!!");
    }

3. Start the service and call the interface test

{"resultCode":"00000000","resultMsg":"SUCCESS","data":"The server is busy! Please try again later!!!"}

3. Cache breakdown (centralized failure of hotspot data)

3.1 What is Centralized Failure of Hotspot Data?

We generally set an expiration time for cached data. Once the expiration time has passed, the data is deleted from the cache, which keeps the data reasonably fresh.

However, for hotspot data with an extremely high request volume, once the valid time has passed, a large number of requests fall on the database, which may crash it.

If a piece of hotspot data expires, the next query for it goes to the database. During the period between that request reaching the database and the data being written back to the cache, the cache still does not hold the data, so every query arriving in that window also falls on the database and puts enormous pressure on it. In addition, the cache is updated repeatedly as each of these queries completes.

3.2 Solutions

3.2.1 Mutex

This method allows only one thread to rebuild the cache. The other threads wait until that thread has finished and then read the data from the cache again.

When the first database query request is issued, the cached entry is locked; other query requests arriving at the cache cannot read that entry and block waiting. When the first request finishes the database query and writes the value back to the cache, it releases the lock; the blocked requests can then read the data directly from the cache.

When a piece of hotspot data expires, only the first query request is sent to the database and all other query requests are blocked, which protects the database. However, because of the mutex, the other requests block and wait, so the throughput of the system drops. Whether this is acceptable has to be weighed against the actual business.

A mutex avoids database crashes caused by the expiry of a single piece of hot data, but in real business there is often a scenario where a batch of hot data expires at the same time. How can database overload be prevented in that scenario?
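A rough single-process sketch of the mutex idea follows. It is only an illustration: the class MutexCacheLoader is hypothetical, a ConcurrentHashMap stands in for Redis, and an in-JVM ReentrantLock stands in for what would have to be a distributed lock when several application instances share one cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical sketch: per-key mutex around the cache rebuild, single JVM only.
class MutexCacheLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Function<String, String> database;
    final AtomicInteger databaseHits = new AtomicInteger();  // counter for demonstration only

    MutexCacheLoader(Function<String, String> database) {
        this.database = database;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                        // fast path: cache hit, no locking
        }
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();                             // other threads for this key block here
        try {
            value = cache.get(key);              // re-check: another thread may have rebuilt it
            if (value == null) {
                databaseHits.incrementAndGet();
                value = database.apply(key);
                if (value != null) {
                    cache.put(key, value);
                }
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Simulates the hot key expiring.
    void expire(String key) {
        cache.remove(key);
    }
}
```

The re-check inside the lock is what keeps the waiting threads away from the database: by the time they acquire the lock, the first thread has already written the value back.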

Set different expiration times

When storing this data in the cache, we can stagger the cache expiration times so that the entries do not all expire at once. For example, add or subtract a random number from a base time to stagger the expiration times of these entries.
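A minimal sketch of the "base time plus or minus a random number" idea; the helper name TtlJitter.withJitter and the use of seconds are my own choices for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for staggering expiration times.
class TtlJitter {
    // Returns baseSeconds shifted by a random offset in [-jitterSeconds, +jitterSeconds],
    // but never less than 1 second.
    static long withJitter(long baseSeconds, long jitterSeconds) {
        if (jitterSeconds < 0) {
            throw new IllegalArgumentException("jitterSeconds must not be negative");
        }
        long offset = ThreadLocalRandom.current().nextLong(-jitterSeconds, jitterSeconds + 1);
        return Math.max(1, baseSeconds + offset);
    }
}
```

For example, TtlJitter.withJitter(1800, 300) yields an expiration somewhere between 25 and 35 minutes, so a batch of keys loaded together is spread over a 10-minute window when it expires.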

3.2.2 Never expire

"Never expires" has two meanings:
From the cache's perspective, no expiration time is actually set, so the problems caused by a hot key expiring cannot occur; that is, the key does not expire "physically".
From a functional perspective, a logical expiration time is stored with each value; when the logical expiration time is found to have passed, a separate thread is used to rebuild the cache.

From a practical point of view, this method effectively eliminates the hot-key problem. Its only drawback is that the data may be inconsistent while the cache is being rebuilt; whether that inconsistency can be tolerated depends on the application.
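The logical expiration idea can be sketched like this. Everything here is an assumption for illustration: the class LogicalExpiryCache, the in-memory map standing in for Redis, and the injected executor and clock (passed in so the behavior can be tested without real threads or waiting).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: entries never expire physically; a logical timestamp triggers a rebuild.
class LogicalExpiryCache {
    private static final class Entry {
        final String value;
        final long logicalExpireAt;
        final AtomicBoolean rebuilding = new AtomicBoolean(false);
        Entry(String value, long logicalExpireAt) {
            this.value = value;
            this.logicalExpireAt = logicalExpireAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database;
    private final long ttlMs;
    private final Executor rebuildExecutor;     // the "separate thread" that rebuilds entries
    private final LongSupplier clock;           // time source in milliseconds

    LogicalExpiryCache(Function<String, String> database, long ttlMs,
                       Executor rebuildExecutor, LongSupplier clock) {
        this.database = database;
        this.ttlMs = ttlMs;
        this.rebuildExecutor = rebuildExecutor;
        this.clock = clock;
    }

    // Hot keys are loaded up front, since there is no physical expiration to trigger a load.
    void preload(String key) {
        cache.put(key, new Entry(database.apply(key), clock.getAsLong() + ttlMs));
    }

    String get(String key) {
        Entry entry = cache.get(key);
        if (entry == null) {
            return null;                        // not a preloaded hot key
        }
        boolean expired = clock.getAsLong() >= entry.logicalExpireAt;
        // Only the caller that wins the compare-and-set schedules the rebuild.
        if (expired && entry.rebuilding.compareAndSet(false, true)) {
            rebuildExecutor.execute(() -> {
                try {
                    cache.put(key, new Entry(database.apply(key), clock.getAsLong() + ttlMs));
                } catch (RuntimeException ex) {
                    entry.rebuilding.set(false); // allow a later request to retry
                }
            });
        }
        return entry.value;                     // possibly stale while the rebuild is running
    }
}
```

Readers never block: a request that finds the entry logically expired still gets the old value back, which is exactly the inconsistency window mentioned above. In a real service the executor would be a thread pool rather than the caller's own thread.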

3.2.3 Comparison of the two schemes

Mutex: this scheme is relatively simple, but it has hidden dangers. If rebuilding the cache fails or takes a long time, there is a risk of deadlock and thread pool blocking. On the other hand, it reduces the load on back-end storage well and does better on consistency.

"Never expires": since no real expiration time is set, the hazards caused by hot keys expiring do not arise, but the data can be inconsistent and code complexity increases.
It's 9:30 on Monday night, wish you a good dream~
