Smoothly taking Spring Cloud services online and offline

Time: 2020-02-02

Complaints first

I used to write RPC by hand, and coming to Spring Cloud recently has been painful. The main pain points:

1) A huge codebase, long bug hunts, and an over-complex design

2) Chaotic version management, with frequent inexplicable configuration errors (which is why 2.0 is still too risky to take to production)

3) Some of Netflix's code is genuinely confusing and gives no thought to extensibility

4) A huge ecosystem with a steep learning curve

If you are about to move to microservices, my advice is to pin a version and not upgrade or downgrade it casually. Take Tomcat's basedir as an example: across versions 1.5.8, 1.5.13 and 1.5.16 the accepted value kept flip-flopping, and a careless upgrade causes accidents.


server:
  port: 21004
  context-path: /
  tomcat:
    basedir: file:.

As shown above, the accepted basedir value changed from "." to "file:." and then from "file:." back to ".", with no compatibility code in between. Are you trying to kill your engineers?

Preface

Today's main topic is the smooth online/offline feature. "Smooth" means releasing a version without anyone noticing, rather than waiting for the dead of night to do it in secret. Some requests take a long time, but they must not fail; payments especially, where being unable to spend money or buy anything is infuriating. Taken as a whole, Spring Cloud is full-featured and quite comfortable to use once you have stepped through the pits.

Our microservices basically integrate the following components.

Quite a huge ecosystem.

Problem

So here is the problem: Spring Cloud registers with the registry through a REST interface. Changes cannot take effect immediately the way ZooKeeper pushes them, and clients cannot poll as aggressively as they could against Redis; the mechanism is too delicate, and hammering it with polls would break it. See the picture below:

There are three requirements:

1) After ServiceA takes an instance offline, requests through the zuul gateway must not fail
2) After ServiceB takes an instance offline, Feign calls from ServiceA must not fail
3) When a service comes online or goes offline, Eureka must sense it quickly

To be clear, the goal is this: after a service goes offline, minimize the time it takes zuul and other dependent services to discover the change, and make sure no request fails during that window.

Solving the timing problem

Influencing factors

1) Eureka's two-tier cache

Eureka Server has two caches by default: the readWriteCacheMap and the readOnlyCacheMap. When a service provider registers or renews its heartbeat, the readWriteCacheMap is updated. When a service consumer queries the instance list, it reads from the readOnlyCacheMap by default (whether reads go through the read-only cache is configurable, and it is enabled by default). This reduces contention on the readWriteCacheMap's read-write lock and increases throughput. Eureka Server periodically copies data from the readWriteCacheMap into the readOnlyCacheMap.
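As an illustration only (a toy model I wrote, not Eureka's actual code), the visibility lag caused by the two caches looks like this:

```python
class TwoLevelCache:
    """Toy model of Eureka's readWriteCacheMap / readOnlyCacheMap pair."""

    def __init__(self):
        self.read_write = {}  # updated immediately on register/heartbeat
        self.read_only = {}   # what clients actually read, refreshed on a timer

    def register(self, app, instance):
        self.read_write.setdefault(app, set()).add(instance)

    def refresh(self):
        # The periodic copy from read/write to read-only: until it runs,
        # clients cannot see newly added (or removed) instances.
        self.read_only = {app: set(insts) for app, insts in self.read_write.items()}

    def lookup(self, app):
        return self.read_only.get(app, set())

cache = TwoLevelCache()
cache.register("SERVICEA", "host-1:21004")
print(cache.lookup("SERVICEA"))  # empty: invisible until refresh() runs
cache.refresh()
print(cache.lookup("SERVICEA"))  # now the instance is visible
```

The point of the model: the staleness window for callers is bounded by the refresh timer, which is why the solution below disables the read-only cache entirely.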

2) Heartbeat interval

After a service provider registers, it renews its lease with periodic heartbeats; the period is the lease renewal interval in the provider's Eureka configuration. A second setting is the lease expiration time. It is configured on the service provider but consumed by Eureka Server, and with the default configuration Eureka Server does not act on it: the server's eviction interval must be set to enable active eviction. Once enabled, each provider reports its own lease expiration time, and Eureka Server periodically checks every instance's expiration time against its last heartbeat. If no heartbeat has arrived within the expiration window and the server is not in self-preservation mode, the instance is removed from the readWriteCacheMap.
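The eviction check described above can be sketched as a toy model (my simplification, not Eureka's implementation):

```python
def evict_expired(instances, now, self_preservation):
    """instances maps instance_id -> (last_heartbeat_ts, lease_expiration_s).

    Toy model of Eureka's active eviction: an instance is dropped from the
    read/write map when its lease has expired and self-preservation is off.
    """
    if self_preservation:  # protection mode: never evict
        return dict(instances)
    return {
        iid: (hb, exp)
        for iid, (hb, exp) in instances.items()
        if now - hb <= exp  # last heartbeat still within the lease window
    }

registry = {"a-1": (100.0, 15), "a-2": (80.0, 15)}
# a-2 last beat 20 s ago with a 15 s lease, so only a-1 survives the scan
print(evict_expired(registry, now=100.0, self_preservation=False))
```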

3) The interval at which callers poll Eureka for the service list

4) Ribbon's cache

Solution

1) Disable Eureka's read-only cache (Eureka server side)


eureka.server.use-read-only-response-cache: false

2) Enable active eviction, scanning for expired leases every 3 s (Eureka server side)


eureka.server.eviction-interval-timer-in-ms: 3000

Note that eureka.server.response-cache-update-interval-ms and eureka.server.response-cache-auto-expiration-in-seconds are useless once active eviction is enabled. Their default of 180 s is enough to drive people crazy.

3) Lease expiration time (service provider)


eureka.instance.lease-expiration-duration-in-seconds: 15

If no heartbeat is received within this time, Eureka Server evicts the instance. eureka.server.eviction-interval-timer-in-ms must be set on the server side, otherwise this configuration has no effect. It is generally set to three times the renewal interval. Default: 90 s!

4) Lease renewal interval, i.e. how often heartbeats are sent (service provider)


eureka.instance.lease-renewal-interval-in-seconds: 5

Default 30s

5) Interval for pulling the service list (client)


eureka.client.registryFetchIntervalSeconds: 5

Default 30s

6) Ribbon refresh interval (client)


ribbon.ServerListRefreshInterval: 5000

Ribbon has a cache of its own as well; default 30 s.

These intervals all interact. Surprisingly, three different places need to be configured, and one careless mistake leaves you with services that neither go offline nor come online cleanly. It has to be said that this set of Spring Cloud defaults is simply a joke.
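Putting the six settings together, here is a minimal sketch of where each one lives (the three blocks belong to three different applications; the values mirror those above):

```yaml
# Eureka server application
eureka:
  server:
    use-read-only-response-cache: false   # serve reads straight from the read/write map
    eviction-interval-timer-in-ms: 3000   # scan for expired leases every 3 s
---
# Service provider application
eureka:
  instance:
    lease-renewal-interval-in-seconds: 5       # heartbeat every 5 s
    lease-expiration-duration-in-seconds: 15   # evict after 15 s without a heartbeat
---
# Calling client application
eureka:
  client:
    registry-fetch-interval-seconds: 5   # pull the service list every 5 s
ribbon:
  ServerListRefreshInterval: 5000        # refresh Ribbon's server list every 5 s
```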

Retry

So after a server goes offline, what is the longest window during which requests still land on the dead server and fail? Roughly, the baseline is eureka.client.registryFetchIntervalSeconds + ribbon.ServerListRefreshInterval, about 8 seconds; adding the server-side eviction interval raises it to about 11 seconds.

If you have only two instances, then in the extreme case discovering the newly started instance also takes 11 seconds, for 22 seconds in total.

In the worst case, requests fail throughout those 11 seconds. If your QPS is 1000 and four nodes are deployed, the number of failed requests in 11 seconds is 1000 / 4 * 11 = 2750, which is unacceptable. So we need to introduce a retry mechanism.
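The arithmetic above can be written out as a tiny helper (the numbers are the article's example figures):

```python
def failed_requests(qps, nodes, window_s):
    """Requests lost on one dead node while the registry still routes to it.

    Each node receives qps / nodes requests per second, and during the
    discovery window every request sent to the dead node fails.
    """
    return qps / nodes * window_s

# 1000 QPS spread over 4 nodes, 11 s discovery window
print(failed_requests(1000, 4, 11))  # -> 2750.0
```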

Introducing retries in Spring Cloud is relatively simple, but it is more than a single configuration switch: once you retry, you must also control timeouts. You can follow these steps:

Add the POM dependency (don't forget):


<dependency>
  <groupId>org.springframework.retry</groupId>
  <artifactId>spring-retry</artifactId>
</dependency>

Add the configuration:

ribbon.OkToRetryOnAllOperations: true
# whether to retry all operations; if false, only GET requests are retried
ribbon.MaxAutoRetriesNextServer: 3
# maximum number of other instances to retry against, excluding the first instance
ribbon.MaxAutoRetries: 1
# maximum number of retries on the same instance, excluding the first call
ribbon.ReadTimeout: 30000
ribbon.ConnectTimeout: 3000
ribbon.retryableStatusCodes: 404,500,503
# which HTTP status codes to retry
spring.cloud.loadbalancer.retry.enabled: true
# retry switch
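With these settings, the commonly cited Ribbon formula for worst-case attempts per request is (1 + MaxAutoRetries) * (1 + MaxAutoRetriesNextServer); the helper below (mine, for illustration) applies it to size the overall timeout an upstream caller must tolerate:

```python
def max_attempts(max_auto_retries, max_next_server):
    # Ribbon tries the first server (1 + MaxAutoRetries) times, then repeats
    # that on each of MaxAutoRetriesNextServer additional servers.
    return (1 + max_auto_retries) * (1 + max_next_server)

def worst_case_ms(read_timeout_ms, connect_timeout_ms,
                  max_auto_retries, max_next_server):
    # Upper bound on total time if every attempt burns its full timeouts.
    return (read_timeout_ms + connect_timeout_ms) * max_attempts(
        max_auto_retries, max_next_server)

# Values from the configuration above
print(max_attempts(1, 3))                # -> 8 attempts
print(worst_case_ms(30000, 3000, 1, 3))  # -> 264000 ms
```

This is why retry settings and timeouts must be tuned together: with a 30 s ReadTimeout, eight attempts can stall a caller for minutes.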

Publishing system

OK, the mechanism is now clear, but in practice it is still fiddly and nerve-racking. For example, if a service has two instances and I release them one at a time, I must wait at least 11 seconds after the first before releasing the second. If your hands move too fast, it is a disaster. So a supporting release system is necessary.

First, the release system can call Eureka over REST to proactively isolate an instance before shutting it down. This step alone shaves at least 3 seconds off the unavailable window (a cost-effective win).
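The "isolate via REST" step can use Eureka's status-override endpoint, which marks an instance OUT_OF_SERVICE so callers stop routing to it before the process is killed. A minimal sketch; the host, app name and instance id below are placeholders, and the path prefix may differ between native Eureka (/eureka/v2) and a Spring Cloud Eureka server (/eureka):

```python
import urllib.request

def build_status_url(eureka_url, app, instance_id, status="OUT_OF_SERVICE"):
    # Status-override endpoint of Eureka's REST API.
    return f"{eureka_url}/eureka/apps/{app}/{instance_id}/status?value={status}"

def isolate_instance(eureka_url, app, instance_id):
    """PUT the override so callers stop routing to the instance before it dies."""
    req = urllib.request.Request(
        build_status_url(eureka_url, app, instance_id), method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 means the override was accepted

# Example (placeholder values):
# isolate_instance("http://eureka-host:8761", "SERVICEA", "host-1:servicea:21004")
```

After the new version is up and healthy, the same endpoint with value=UP (or a DELETE on the override) puts the instance back into rotation.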

Then build the artifact with your packaging tool, push it out, bring the new instance online, and swap it in.

No off-the-shelf continuous-integration product covers this flow, so the release system needs customization, which is another chunk of the workload.

So far this only covers smooth online/offline for Spring Cloud microservices. Gray (canary) release is another topic. Companies with the resources would be wise to build this tooling in-house rather than let their release capability sink to this level.

But in general there is no need to agonize over it. Whether your company survives at all is the bigger question. Netflix has put up with it; can you really do better?

