Circuit breaking, degradation, and rate limiting for distributed services with Hystrix

Time:2021-10-22

Why Hystrix is needed

Hystrix official site: GitHub

  • Hystrix is another of Netflix's contributions to distributed systems. Like the others, it has entered maintenance mode and is no longer actively developed. That does not mean it is obsolete; it only shows that the technology keeps iterating, and its design is still well worth studying.
  • In a distributed environment, service-to-service calls are both a defining feature and a headache. The service governance chapter introduced what governance covers, and the previous lessons introduced Ribbon and Feign for service invocation. Now it is time for service monitoring and management. Hystrix isolates and protects services so that the failure of one service does not make the whole system unavailable.

  • Consider the following scenario. Multiple clients call AService, which runs as three instances in the distributed system. Part of AService's logic depends on BService, which runs as two instances. Suppose communication between one AService instance and BService fails because of a network problem, and BService only does log processing. From the point of view of the whole system, losing a log entry is trivial compared with downtime. Yet without protection, that AService instance becomes entirely unavailable because of the communication problem, which is hard to accept.

  • Consider the call chain A -> B -> C -> D. Suppose service D goes down. C's calls to D then fail, but C's threads are still tied up serving B. As concurrent requests arrive, C's thread pool fills up and its CPU usage rises. Other interfaces of service C are affected by the CPU load as well and respond slowly.

Features

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services, and third-party libraries. It is no longer maintained, and the official website recommends resilience4j instead. In China, Spring Cloud Alibaba is also available.

Hystrix implements latency and fault tolerance in distributed systems by isolating access between services, which addresses the service avalanche scenario, and it lets you provide fallbacks.

  • Tolerance of network latency and failures
  • Stopping avalanches (cascading failures) in distributed systems
  • Fast failure and smooth recovery
  • Service degradation
  • Real-time monitoring and alerting

$$
99.99\%^{30} = 99.7\% \text{ uptime}
\\
0.3\% \text{ of 1 billion requests} = 3{,}000{,}000 \text{ failures}
\\
\text{2+ hours downtime/month even if all dependencies have excellent uptime.}
$$

  • This is a statistic given on the official website. If each of 30 services fails only 0.01% of the time, a request that touches all 30 succeeds only about 99.7% of the time. Out of 1 billion requests, about 3,000,000 fail (equivalently, about 300,000 out of 100 million). That adds up to at least 2 hours of downtime per month, which is fatal for an Internet system.
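The arithmetic behind these figures can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes the 30 dependencies fail independently and that a month has 30 days.

```java
// Sketch: availability of a request that passes through n independent dependencies.
class CompoundUptime {

    /** Probability that all n dependencies are up, given each one's individual uptime. */
    static double compound(double perServiceUptime, int n) {
        return Math.pow(perServiceUptime, n);
    }

    public static void main(String[] args) {
        double uptime = compound(0.9999, 30);           // 99.99% per dependency, 30 dependencies
        double failureRate = 1.0 - uptime;              // roughly 0.3%
        double failuresPerBillion = failureRate * 1_000_000_000L;
        double downtimeHoursPerMonth = failureRate * 30 * 24;

        System.out.printf("combined uptime: %.3f%%%n", uptime * 100);
        System.out.printf("failures per 1 billion requests: %.0f%n", failuresPerBillion);
        System.out.printf("downtime per 30-day month: %.2f hours%n", downtimeHoursPerMonth);
    }
}
```

Running `main` prints the combined uptime, the expected failures per billion requests, and the monthly downtime, which line up with the rounded numbers quoted above.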

  • The official website illustrates two such situations. They are similar to the ones in the previous section, and both describe the service avalanche scenario.

Project preparation

  • In the OpenFeign topic we discussed service calls implemented with Feign and noted that Feign is built on Hystrix internally. We also looked at the dependency structure in the POM: Eureka bundles Ribbon and also a Hystrix module.

  • Although the package already contains Hystrix, we still add the corresponding starter to enable the related configuration. This is the example from the OpenFeign topic, where we provided PaymentServiceFallbackImpl and PaymentServiceFallbackFactoryImpl as fallbacks. At that time we only pointed out that OpenFeign supports both options. Today we demonstrate what disaster happens when a traditional setup has no fallback.

    <!--hystrix-->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    </dependency>

Interface test

  • First we test the payment#createbyorder interface and look at the response.

  • Then we test the payment#gettimeout/{id} method.

    • Now we use JMeter to load-test payment#gettimeout/{id}. Because each call to this interface waits about 4 s, concurrent calls exhaust the available threads, and payment#createbyorder is blocked as well.

    • Tomcat's default maximum number of threads in Spring is 200. To spare our hard-working laptops, we set the thread count to a small value here, which makes it easier to reproduce a full thread pool. Once the pool is full, the payment#createbyorder interface is affected.

  • What we tested above is the payment service's own interface. If we load-test the order module instead and no fallback is configured in OpenFeign, concurrency on payment#gettimeout/{id} fills the order service's threads and the order module responds slowly. This is the avalanche effect. Next we address avalanches from two angles.

Service isolation

  • The scenario above occurs because payment#createbyorder and payment#gettimeout/{id} belong to the same payment service. A payment service is a single Tomcat instance with one thread pool. Every request that reaches Tomcat takes a thread from that pool, and the business logic runs only once a thread has been obtained. Because the pool is shared, concurrent calls to payment#gettimeout/{id} drain it, and other interfaces, even unrelated ones, have no threads left and can only wait for threads to be released.
  • It is like taking the elevator at rush hour: one company's employees all arrive at once, every elevator is occupied for a while, and even the senior leaders cannot get on.
  • This is easy to solve: every office park reserves dedicated elevators for special use.
  • We solve our problem the same way, by isolation. Different interfaces get different thread pools, so one slow interface cannot trigger an avalanche.
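The idea of one thread pool per interface can be sketched with plain `java.util.concurrent`, without Hystrix. Everything here is illustrative: the class name, the pool sizes, and the simulated slow call are assumptions, and real Hystrix thread pools add queues, metrics, and rejection handling on top of this.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the bulkhead idea: each interface gets its own small pool,
// so a flood of slow calls cannot take threads away from the fast interface.
class BulkheadDemo {
    static final ExecutorService SLOW_POOL = daemonPool(3);  // hypothetical pool for the slow interface
    static final ExecutorService FAST_POOL = daemonPool(6);  // hypothetical pool for the fast interface

    static ExecutorService daemonPool(int size) {
        return Executors.newFixedThreadPool(size, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);                 // let the JVM exit even if tasks are still queued
            return t;
        });
    }

    /** Submits n slow calls; they queue up inside SLOW_POOL only. */
    static void floodSlow(int n, long millis) {
        for (int i = 0; i < n; i++) {
            SLOW_POOL.submit(() -> {
                Thread.sleep(millis);          // simulated slow downstream call
                return null;
            });
        }
    }

    /** Calls the fast interface on its own pool, with a 1 s timeout and a fallback value. */
    static String callFast() {
        Future<String> future = FAST_POOL.submit(() -> "ok");
        try {
            return future.get(1, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return "fallback";
        }
    }

    public static void main(String[] args) {
        floodSlow(20, 500);                    // saturate the slow pool
        System.out.println(callFast());        // the fast interface still has free threads
    }
}
```

If both interfaces shared one three-thread pool, the fast call would have to wait behind the 20 queued slow calls; with separate pools it does not.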

Thread isolation

  • Recall that we set the maximum number of threads in the order module to 10 to make the concurrency problem easy to demonstrate. Here we call the order/getpayment/1 interface with the test tool and look at the log output.

  • We print the current thread each time the interface is called. All 10 threads are used in rotation, which is why the avalanche above occurred. With Hystrix, each interface is given its own thread pool:

@HystrixCommand(
            groupKey = "order-service-getPaymentInfo",
            commandKey = "getPaymentInfo",
            threadPoolKey = "orderServicePaymentInfo",
            commandProperties = {
                    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "1000")
            },
            threadPoolProperties = {
                    @HystrixProperty(name = "coreSize" ,value = "6"),
                    @HystrixProperty(name = "maxQueueSize",value = "100"),
                    @HystrixProperty(name = "keepAliveTimeMinutes",value = "2"),
                    @HystrixProperty(name = "queueSizeRejectionThreshold",value = "100")

            },
            fallbackMethod = "getPaymentInfoFallback"
    )
    @RequestMapping(value = "/getpayment/{id}",method = RequestMethod.GET)
    public ResultInfo getPaymentInfo(@PathVariable("id") Long id) {
        log.info(Thread.currentThread().getName());
        return restTemplate.getForObject(PAYMENT_URL+"/payment/get/"+id, ResultInfo.class);
    }
    public ResultInfo getPaymentInfoFallback(@PathVariable("id") Long id) {
        log.info("The fallback has been entered and is executed by an idle thread: " + Thread.currentThread().getName());
        return new ResultInfo();
    }
  @HystrixCommand(
            groupKey = "order-service-getpaymentTimeout",
            commandKey = "getpaymentTimeout",
            threadPoolKey = "orderServicegetpaymentTimeout",
            commandProperties = {
                    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "10000")
            },
            threadPoolProperties = {
                    @HystrixProperty(name = "coreSize" ,value = "3"),
                    @HystrixProperty(name = "maxQueueSize",value = "100"),
                    @HystrixProperty(name = "keepAliveTimeMinutes",value = "2"),
                    @HystrixProperty(name = "queueSizeRejectionThreshold",value = "100")

            }
    )
    @RequestMapping(value = "/getpaymentTimeout/{id}",method = RequestMethod.GET)
    public ResultInfo getpaymentTimeout(@PathVariable("id") Long id) {
        log.info(Thread.currentThread().getName());
        return orderPaymentService.getTimeOut(id);
    }
  • The effect is hard to show live, so the measured results are listed directly.

| Concurrency on getpaymentTimeout | getpaymentTimeout/{id} | /getpayment/{id} |
| --- | --- | --- |
| 20 | Once the three threads are full, errors are reported for a while | Responds normally, but also more slowly, because CPU thread switching takes time |
| 30 | Same as above | Same as above |
| 50 | Same as above | Also times out, because the payment service is under pressure from the order calls |

  • If we added Hystrix to the payment service itself, the third situation would not occur. I put it on the order module precisely to show the avalanche scenario. At a concurrency of 50, the payment service, whose maximum thread count is 10, has limited throughput. Although order#getpayment/{id} runs on its own threads in the order module thanks to Hystrix thread isolation, it still times out because the underlying payment service cannot keep up. This demonstration also sets up the next topic: using fallbacks to solve avalanches.
  • We can configure a fallback through Hystrix in the payment service to keep its latency low, so that a slow payment service does not cause errors in otherwise healthy order interfaces such as order#getpayment.
  • One more point: even with Hystrix thread isolation, the other interfaces respond a little more slowly, because the CPU pays a cost for switching threads. This is a real pain point, and it means we cannot create isolated thread pools at will. That leads to semaphore isolation.

Semaphore isolation

  • Semaphore isolation is not demonstrated here, since a demonstration would add little. The configuration looks like this:
   @HystrixCommand(
            commandProperties = {
                    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds",value = "1000"),
                    @HystrixProperty(name = HystrixPropertiesManager.EXECUTION_ISOLATION_STRATEGY,value = "SEMAPHORE"),
                    @HystrixProperty(name = HystrixPropertiesManager.EXECUTION_ISOLATION_SEMAPHORE_MAX_CONCURRENT_REQUESTS,value = "6")
            },
            fallbackMethod = "getPaymentInfoFallback"
    )
  • The configuration above sets the maximum semaphore count to 6. Once 6 requests are in flight, additional requests are rejected and routed to the fallback instead of queuing. The timeout is set to 1 s.

| Measure | Advantages | Disadvantages | Timeout | Circuit breaking | Asynchronous |
| --- | --- | --- | --- | --- | --- |
| Thread isolation | One thread pool per call; calls do not interfere with each other; ensures high availability | CPU thread switching overhead | √ | √ | √ |
| Semaphore isolation | Avoids CPU switching; efficient | In high-concurrency scenarios, the number of semaphores to hold grows | × | √ | × |

  • Besides thread isolation and semaphore isolation, stability can also be improved through request merging, interface data caching, and other means.
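To make the difference from thread isolation concrete, here is a minimal sketch of semaphore isolation built on `java.util.concurrent.Semaphore`. It is not Hystrix's implementation; the class and method names are hypothetical. The point it illustrates is that the work runs on the caller's own thread and that a caller who cannot get a permit goes straight to the fallback.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch of semaphore isolation: no extra threads, only a cap on concurrent calls.
class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T execute(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();        // no permit left: reject immediately, do not wait
        }
        try {
            return action.get();          // runs on the calling thread, so no thread switch
        } catch (RuntimeException e) {
            return fallback.get();        // failures also degrade to the fallback
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolation isolation = new SemaphoreIsolation(6);
        System.out.println(isolation.<String>execute(() -> "payment info", () -> "fallback"));
    }
}
```

Because the action runs on the caller's thread, this approach cannot abandon a call that hangs, which is the trade-off against thread isolation summarized in the comparison above.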

Service degradation

Trigger conditions

  • The program throws an exception other than HystrixBadRequestException.
  • A service call times out.
  • The circuit breaker is open.
  • The thread pool or semaphore has no capacity left.

  • Consider the timeout interface above. With either thread isolation or semaphore isolation, once the limit is reached subsequent requests are rejected outright, which is crude. This is where the fallback mentioned above comes in.
  • Recall also that at a concurrency of 50 on the order timeout interface, the getpayment interface started to fail. We traced this to the payment service being unable to withstand the load. If we add a fallback to payment, it can respond quickly when resources run short, which at least keeps the order#getpayment method available.

    • However, this configuration is only an experiment. In real production it is impractical to configure a fallback on every single method.
    • Besides a fallback customized per method, Hystrix also offers a global fallback. Annotate the class with @DefaultProperties(defaultFallback = "globalFallback") to enable it. When a method meets a degradation trigger condition and its HystrixCommand annotation does not configure a fallback, the class-level global fallback is used. If there is no global fallback either, an exception is thrown.

      Limitations

      • Although DefaultProperties spares us from configuring a fallback for each interface, this global fallback is not truly global: it still has to be configured on every class. I searched for a way to make it application-wide and did not find one.
      • However, in the OpenFeign topic we covered the service degradation that OpenFeign implements together with Hystrix, and mentioned the FallbackFactory class. It can be thought of as analogous to Spring's BeanFactory: it produces the fallback we need. In this factory we can generate a proxy object that serves as a generic fallback and that handles input and output parameters according to the signature of the proxied method.
      • We can then configure this one factory class wherever OpenFeign is used and avoid writing many fallbacks. The fly in the ointment is that it still has to be specified at each place. If you are interested in FallbackFactory, download the source code or see the OpenFeign topic on the home page.
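The "proxy object as a generic fallback" idea can be sketched with the JDK's dynamic proxies. This is not the OpenFeign FallbackFactory API, only an illustration of what such a factory could create; `GenericFallback` and `PaymentClient` are hypothetical names, and the rule "return an empty or zero value based on the return type" is an assumption about what a sensible default looks like.

```java
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.List;

// Sketch: one generic fallback for any client interface, driven by each method's return type.
class GenericFallback {

    /** Creates a proxy that answers every method of the interface with a default value. */
    static <T> T create(Class<T> api) {
        Object proxy = Proxy.newProxyInstance(
                api.getClassLoader(),
                new Class<?>[] {api},
                (self, method, args) -> defaultFor(method.getReturnType()));
        return api.cast(proxy);
    }

    /** Default value per return type; primitives need the matching wrapper type. */
    static Object defaultFor(Class<?> type) {
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0f;
        if (type == double.class) return 0d;
        if (type == List.class) return Collections.emptyList();
        return null;                        // void and all other reference types
    }
}

// Hypothetical client interface, standing in for a Feign client.
interface PaymentClient {
    String get(long id);
    List<String> getIds(List<Long> ids);
    int count();
}
```

A factory built this way would return `GenericFallback.create(PaymentClient.class)` for every client type, so a single class covers all interfaces; the cost is that the fallback values are generic rather than tailored to each method.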

Service circuit breaking

@HystrixCommand(
            commandProperties = {
                    @HystrixProperty(name = "circuitBreaker.enabled", value = "true"), // whether to enable the circuit breaker
                    @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"), // minimum number of requests before the breaker can trip
                    @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"), // how long the breaker stays open before a trial request
                    @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "60") // trip once the failure rate reaches this percentage
            },
            fallbackMethod = "getInfoFallback"
    )
    @RequestMapping(value = "/get", method = RequestMethod.GET)
    public ResultInfo get(@RequestParam Long id) {
        if (id < 0) {
            int i = 1 / 0;
        }
        log.info(Thread.currentThread().getName());
        return orderPaymentService.get(id);
    }
    public ResultInfo getInfoFallback(@RequestParam Long id) {

        return new ResultInfo();
    }
  • First, circuitBreaker.enabled = true turns the circuit breaker on.
  • circuitBreaker.requestVolumeThreshold sets the minimum number of requests in the statistics window before the breaker can trip.
  • circuitBreaker.sleepWindowInMilliseconds sets how long the breaker stays open after tripping before it lets a trial request through, commonly known as the half-open state.
  • circuitBreaker.errorThresholdPercentage sets the error percentage at which the breaker trips.
  • With the configuration above, if at least 10 requests arrive in the statistics window and 60% or more of them fail, the breaker trips and the service is degraded for 10 s. After 10 s, a trial request checks whether the service has recovered.
  • We call http://localhost/order/get?id=-1 20 times through JMeter. All 20 calls report an error, but the errors differ: the first ones come from the division by zero in our code, and the later ones are Hystrix circuit-breaker errors. That is, "/ by zero" at first, followed by "short-circuited and fallback failed".
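The closed, open, and half-open states described by these settings can be sketched as a small state machine. This is a simplification and not Hystrix's implementation: Hystrix keeps statistics in a rolling time window, whereas this sketch simply counts requests since the last reset, and the class name and injected clock are assumptions made to keep the example short and testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Sketch of a circuit breaker with the three states: CLOSED, OPEN, HALF_OPEN.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;     // minimum requests before the breaker may trip
    private final int errorThresholdPercentage;   // failure percentage that trips the breaker
    private final long sleepWindowMillis;         // how long to stay open before a trial request
    private final LongSupplier clock;             // injected time source in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failed;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();                // short-circuited: the action is not invoked
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMillis) {
                return false;                     // still inside the sleep window
            }
            state = State.HALF_OPEN;              // window elapsed: let one trial request through
            return true;
        }
        return state == State.CLOSED;             // HALF_OPEN: a trial is already in flight
    }

    private synchronized void onSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;                 // trial succeeded: close and start counting afresh
            total = 0;
            failed = 0;
        } else {
            total++;
        }
    }

    private synchronized void onFailure() {
        if (state == State.HALF_OPEN) {
            trip();                               // trial failed: open again for another window
            return;
        }
        total++;
        failed++;
        if (total >= requestVolumeThreshold && failed * 100 >= errorThresholdPercentage * total) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        failed = 0;
    }
}
```

With thresholds of 10 requests, 60%, and 10000 ms, ten consecutive failures open the breaker, later calls go straight to the fallback, and the first call after 10 s acts as the trial that decides whether the breaker closes again.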

  • Normally we configure a fallback in Hystrix, using either of the two approaches implemented in the degradation section above. It is left out here to make the difference between the errors easier to see.

  • The parameters configured in HystrixCommand are mostly defined in the HystrixPropertiesManager class. It contains 6 parameters for circuit breaker configuration; the four used above are the essential ones.

Service rate limiting

  • Rate limiting ties back to service degradation: the two isolation methods described above are the strategies that implement it.

Request merge

  • Besides circuit breaking, degradation, and rate limiting, Hystrix also provides request merging. As the name suggests, multiple requests are merged into one request in order to reduce concurrency.
  • For example, orders are looked up one at a time through order/getId?id=1, and suddenly ten thousand requests arrive. To ease the pressure, we collect the requests and call order/getIds?ids=xxxxx once for every 100 of them. The payment module then receives only 10000 / 100 = 100 requests. Next, we implement request merging through code configuration.
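The 10000 / 100 = 100 arithmetic corresponds to splitting the pending ids into batches of a fixed size. The sketch below shows only that batching step, with hypothetical names; a real collapser such as Hystrix's also closes a batch when a timer window elapses and hands each caller its own result.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching step behind request merging.
class RequestBatcher {

    /** Splits the items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 10_000; id++) {
            ids.add(id);
        }
        // Each batch would become one downstream call instead of 100 separate ones.
        System.out.println(partition(ids, 100).size() + " downstream calls");
    }
}
```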

HystrixCollapser

@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface HystrixCollapser {
    String collapserKey() default "";

    String batchMethod();

    Scope scope() default Scope.REQUEST;

    HystrixProperty[] collapserProperties() default {};
}

| Attribute | Meaning |
| --- | --- |
| collapserKey | Unique identifier |
| batchMethod | The method that processes the merged request, that is, the method called after merging |
| scope | Scope, one of two values. REQUEST: requests that meet the conditions within the same user request are merged. GLOBAL: requests from any thread are added to the global statistics |
| collapserProperties (HystrixProperty[]) | Related configuration parameters |

  • In Hystrix, all property names are defined in HystrixPropertiesManager.java. For the collapser there are only two related settings: the maximum number of requests per batch and the timer window.
    @HystrixCollapser(
            scope = com.netflix.hystrix.HystrixCollapser.Scope.GLOBAL,
            batchMethod = "getIds",
            collapserProperties = {
                    @HystrixProperty(name = HystrixPropertiesManager.MAX_REQUESTS_IN_BATCH , value = "3"),
                    @HystrixProperty(name = HystrixPropertiesManager.TIMER_DELAY_IN_MILLISECONDS, value = "10")
            }
    )
    @RequestMapping(value = "/getId", method = RequestMethod.GET)
    public ResultInfo getId(@RequestParam Long id) {
        if (id < 0) {
            int i = 1 / 0;
        }
        log.info(Thread.currentThread().getName());
        return null;
    }
    @HystrixCommand
    public List<ResultInfo> getIds(List<Long> ids) {
        System.out.println(ids.size()+"@@@@@@@@@");
        return orderPaymentService.getIds(ids);
    }
  • Above, we configured getId so that its calls are executed through getIds: within a window of at most 10 ms, up to three requests are merged into one batch. getIds then queries the payment service once for the whole batch and returns multiple ResultInfo objects.

  • We load-test the getId interface with JMeter. The largest ids list in the log has length 3, which confirms the configuration of getId above. Under high concurrency, requests are merged and the TPS seen by the downstream service drops.
  • Above, we merged requests through method annotations. Internally, Hystrix still executes the merged batch through a HystrixCommand.

Workflow

  • The official website gives a flow chart with 9 steps and a description of each. A translation follows.
  • ① Create a HystrixCommand or HystrixObservableCommand object.

    • HystrixCommand: used when the dependency returns a single response
    • HystrixObservableCommand: used when the dependency returns an Observable that emits responses
  • ② Execute the command. HystrixCommand provides execute and queue; HystrixObservableCommand provides observe and toObservable.

| Method | Effect |
| --- | --- |
| execute | Synchronous execution; returns the result object or throws an exception |
| queue | Asynchronous execution; returns a Future |
| observe | Returns an Observable (hot: execution starts immediately) |
| toObservable | Returns an Observable (cold: execution starts on subscription) |

  • ③ Check whether request caching is enabled and whether the request hits the cache. On a hit, the cached response is returned.
  • ④ Check whether the circuit is open. If it is open, go to the fallback; if it is closed, continue.
  • ⑤ Check whether thread pool or semaphore resources are available. If not, go to the fallback; otherwise continue.
  • ⑥ Execute the run or construct method. These two methods are Hystrix's own: using Hystrix directly in Java means implementing their logic, but Spring Cloud has wrapped them, so we do not look at them here. If execution fails or times out, go to the fallback. Along the way, metrics are reported to the monitoring center.
  • ⑦ Calculate the circuit health and decide whether to trip the breaker or let trial requests through. These statistics can be viewed in the dashboard via hystrix.stream, which makes it easy to judge the health of an interface.
  • ⑧ In the flow chart, ④, ⑤, and ⑥ all point to the fallback, commonly known as service degradation. Service degradation is clearly a core function of Hystrix.
  • ⑨ Return the response.
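Steps ③ to ⑥ and the fallback path of step ⑧ can be condensed into one method to show the order of the checks. This is a teaching sketch with assumed names, not Hystrix code: the cache here is a plain map rather than Hystrix's per-request cache, the circuit state is a flag set by hand, and resources are modeled as semaphore permits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Sketch of the order of checks a command goes through before and after running.
class CommandFlow {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Semaphore permits = new Semaphore(6);   // hypothetical resource limit
    volatile boolean circuitOpen = false;                 // toggled by hand in this sketch

    String execute(long id, Function<Long, String> run, Function<Long, String> fallback) {
        String cached = cache.get(id);                    // step 3: cache hit returns at once
        if (cached != null) {
            return cached;
        }
        if (circuitOpen) {                                // step 4: open circuit goes to fallback
            return fallback.apply(id);
        }
        if (!permits.tryAcquire()) {                      // step 5: no resources goes to fallback
            return fallback.apply(id);
        }
        try {
            String result = run.apply(id);                // step 6: run the real call
            if (result != null) {
                cache.put(id, result);
            }
            return result;                                // step 9: return the response
        } catch (RuntimeException e) {
            return fallback.apply(id);                    // step 8: failure goes to fallback
        } finally {
            permits.release();
        }
    }
}
```

Step ⑦, the health calculation that opens and closes the circuit, is left out here; in this sketch the `circuitOpen` flag stands in for its result.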

HystrixDashboard

  • Besides circuit breaking, degradation, and rate limiting, Hystrix has another important feature: real-time monitoring, with reports that summarize interface request statistics.
  • Setting up the dashboard is also very simple. Only two modules need to be added to the project: actuator and hystrix-dashboard.
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
  • Add @EnableHystrixDashboard to the startup class to bring in the dashboard. No development is needed; as with Eureka, a simple module that pulls in the package is enough.

  • That completes the dashboard. The dashboard monitors how Hystrix handles requests, so the services that run Hystrix commands must also expose a metrics endpoint.
  • Add the following configuration to each module that uses Hystrix commands. Here it is added to the order module.
@Component
public class HystrixConfig {
    @Bean
    public ServletRegistrationBean getServlet(){
        HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean registrationBean = new ServletRegistrationBean(streamServlet);
        registrationBean.setLoadOnStartup(1);
        // Note: with the mapping below, the stream is served at localhost:port/hystrix.stream.
        // If it is exposed through the configuration file instead (newer versions), the path gains the actuator prefix: localhost:port/actuator/hystrix.stream
        registrationBean.addUrlMappings("/hystrix.stream");
        registrationBean.setName("HystrixMetricsStreamServlet");
        return registrationBean;
    }
}
  • Visiting localhost/hystrix.stream on the order module then shows a stream of ping messages, which indicates that monitoring is installed on the order module. The order module also needs the actuator module, of course.
  • We use JMeter to load-test the circuit breaking, degradation, and rate limiting interfaces, and watch each state in the dashboard.

  • The dashboard animation makes our service look quite busy. Imagine running an e-commerce site: watching each interface's line chart is like watching your own heartbeat. Too high and you worry; too low and there is not much to be proud of. Next, let's look at the details of the dashboard indicators.

  • Let's look at the status of each interface while the service is running.

Aggregation monitoring

  • Above, we used the new hystrix-dashboard module to monitor the order module. In practice, however, Hystrix will not be configured only in order.
  • We configured it only in order for the demonstration. Now we configure Hystrix in payment as well, and then we would have to switch back and forth between the order and payment monitoring data in the dashboard.
  • This is where aggregation monitoring comes in. Before setting it up, we add Hystrix to payment too. Note that we registered hystrix.stream as a bean above, so the URL does not need the actuator prefix.

Create a new hystrix-turbine module

pom

<!-- Added hystrix turbine -->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-turbine</artifactId>
        </dependency>
  • The main addition is the turbine dependency. The rest are the hystrix, dashboard, and other modules. See the source code at the end for details.

yml

spring:
  application:
    name: cloud-hystrix-turbine

eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      defaultZone: http://localhost:7001/eureka
  instance:
    prefer-ip-address: true

#Aggregation monitoring

turbine:
  app-config: cloud-order-service,cloud-payment-service
  cluster-name-expression: "'default'"
  #This suffix must match the stream URL; if the endpoint is /actuator/hystrix.stream, include the actuator prefix
  instanceUrlSuffix: hystrix.stream

Startup class

Add the @EnableTurbine annotation to the startup class.

Source code

Above source code
