[original] experience sharing: a bloody case triggered by content length

Time:2020-9-29

preface

I had a problem at work last week. It’s very interesting. Let me record it here. I had a problem at work last week. It’s very interesting. Let me record it here. The title is very bluffing. This problem almost caused a bloody case,Hua GeI’m still a very strict person. I made a note lateralmost....

In the test environment, when the front-end calls an interface we serve, it is extremely slow and the response time exceeds30sI can’t stand it!!

View log shows that our service is passing throughFeignOf another serviceGETInterface, and then tried again until it failed. But the strange thing is that it’s manualIP + portRequest this timeoutGETThe interface is very fast.

This is very strange. I have been calling a good interface before. How can I always time out now?At this time, I am full of question marks…

phenomenon

The front end calls our service (here calledService a)A query interface is used in the front endPOSTRequest, our service will pass againFeignCall to another service (here calledService B)This interface is provided to the publicGETForm.

From the phenomenon, it is very slow to call our service. A request responds for dozens of seconds. The specific process is as follows:

[original] experience sharing: a bloody case triggered by content length

Troubleshooting

At that time, the confusion in my mind was that it was too strange that this situation should not happen to the interface called before, and it should be passed manuallyIP + portTo call, the response speed is very fast, so I found theService BForeign development students look together, because they have ignored some important log information, so there are many detours here, with the help of colleagues, I also sort out this problem.

The root cause of the problem is that we areGETRequestedHeaderIn theContent-LengthParameter, and service B recently added ajarBags,jarOne of the interceptors in does something that causes this problem. I’m here fromSource code levelTo sort out the root cause of the whole problem, and how to avoid such problems in the future!

To solve this problem, we will launch it locallyService aandService B, in order toDEBUGMode start, discovery can be stable recurrence, and can be seen in the callService BWhen it got stuckStack information

[original] experience sharing: a bloody case triggered by content length

Service aThe reason why the initiated request is stuck is that theawaitLatch()If it is suspended, it can be regarded as a breakthrough to find the cause of the problem. If you continue to follow up step by step, you can find the problem. Next, we will carefully analyze it step by step.

Cause of the problem

The cause of the problem here is actually deduced from the above Troubleshooting:

  1. When the front end calls the server interface, because it ispostRequest, soheaderThere areContent-LengthProperties, callingfeignOn request, regardless ofgetstillpostRequest, there is one in the company’s underlying packageFeignThe interceptor will send the front-end requestHeaderProperty assigned tofeignOn requestHeader, which caused us to sendGETrequestHeaderIt also containsContent-LengthProperty.

PS: This is a pitiful point. A feign interceptor is added to the underlying package. We can see the content length property only by printing the feign request log on the console, and finally trace it to the feigninterceptor

  1. Service B just depends on another onejarPackage, which contains aFilterInterceptor, which reads the sent requestbodyData, and then do some log printing. And this onejarPackage dependencies are what they just added, and they use some of the other tool classes in the package
public class ChannelFilter implements Filter {
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
        if (servletRequest instanceof HttpServletRequest) {
            requestWrapper = new RequestWrapper((HttpServletRequest)servletRequest);
            log.info("Http RequestURL : {}, Method : {}, RequestParam : {}, RequestBody : {}", new Object[]{((HttpServletRequest)servletRequest).getRequestURL(), ((HttpServletRequest)servletRequest).getMethod(), JSON.toJSON(servletRequest.getParameterMap()), ((RequestWrapper)requestWrapper).getBody()});
        }


        filterChain.doFilter((ServletRequest)requestWrapper, servletResponse);
    }


    public void destroy() {
    }
}

public class RequestWrapper extends HttpServletRequestWrapper {
    private static final Logger log = LoggerFactory.getLogger(RequestWrapper.class);
    private final String body;

    public RequestWrapper(HttpServletRequest request) {
        super(request);
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader bufferedReader = null;
        ServletInputStream inputStream = null;

        try {
            inputStream = request.getInputStream();
            if (inputStream != null) {
                bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
                char[] charBuffer = new char[4096];
                boolean var6 = true;

                int bytesRead;
                while((bytesRead = bufferedReader.read(charBuffer)) != -1) {
                    stringBuilder.append(charBuffer, 0, bytesRead);
                }
            }
        } catch (IOException var19) {
            log.error(var19.getMessage(), var19);
        }
    }
}

In executionrequest bodyRead the code using:

while((bytesRead = bufferedReader.read(charBuffer)) != -1) {
   stringBuilder.append(charBuffer, 0, bytesRead);
}

bufferedReader.read()TheTomcatinorg.apache.tomcat.util.net.NioBlockingSelector.read()Method of readingrequestMediumbodyProperties:

int keycount = 1; 
while(!timedout) {
    if (keycount > 0) { //only read if we were registered for a read
        read = socket.read(buf);
        if (read != 0) {
            break;
        }
    }
    try {
        if ( att.getReadLatch()==null || att.getReadLatch().getCount()==0) att.startReadLatch(1);
        poller.add(att,SelectionKey.OP_READ, reference);
        if (readTimeout < 0) {
            att.awaitReadLatch(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        } else {
            att.awaitReadLatch(readTimeout, TimeUnit.MILLISECONDS);
        }
    } catch (InterruptedException ignore) {
        // Ignore
    }
}

Here becauseGETRequestedbodyIs empty, sosocket.read()Return to 0 and go toatt.awaitReadLatch(readTimeout, TimeUnit.MILLISECONDS);

protected void awaitLatch(CountDownLatch latch, long timeout, TimeUnit unit) throws InterruptedException {
    if ( latch == null ) throw new IllegalStateException("Latch cannot be null");
    latch.await(timeout,unit);
}

This will call theLockSuport.parkNanos(time)Interface until timeout, at this time you will still have doubts, whyHeaderIn theContent-LengthWill it take this logical link? Don’t worry, continue to look down, there are more wonderful analysis behind

Solution

  1. Service BThere is a problem with the cancellationjarPackage dependency
  2. Revision questionsjarIn the bagFilterConfiguration, judgment onlyPostRequest to readbodyattribute
  3. Interface caller add configuration if yesGETFilter out on requestContent-LengthAttributes (main reason)
  4. Modify the underlying dependency packageFeignInterceptor, determine the way the request is made, and thenHeaderAssignment (we are not easy to modify the packages that the company depends on at the bottom)

In fact, the most important thing to modify is scheme 4, which is a low-level package that the whole company will rely on. If it changes, it needs to inform the architecture group and so on, and the impact will be relatively large.

Finally, we first adopt scheme 3 to make some judgments and remove them in our request linkGETOn requestContent-LengthThe delivery of.

Solution principle

Next is the real principle, when the server sendsfeignAfter asking, I will goTomcatMediumorg.apache.coyote.http11.Http11Processor.prepareRequest()Method, code as shown in the figure:

[original] experience sharing: a bloody case triggered by content length

IfcontentLength >= 0, then aorg.apache.coyote.http11.filters.IdentityInputFilterClass, inService BAddedjarIn the bagRequestWrapperMediumbufferedReader.read()Will be called toorg.apache.coyote.http11.filters.IdentityInputFilter.doRead()method:

[original] experience sharing: a bloody case triggered by content length

This method will call theorg.apache.tomcat.util.net.NioBlockingSelector.read()Chinese:

[original] experience sharing: a bloody case triggered by content length

becauseGETRequestedrequest bodyIt’s empty, so it passes through heresocketTo read the return to 0, directly run the followingawaitReadLatch()Method, which is called hereLockSuport.parkNanos(time)Interface until the timeout, which is why we every timefeignThe reason why the request will time out.

However, if the service requester has configured theContent-LengthIs it empty? Here we will construct oneorg.apache.coyote.http11.filters.VoidInputFilterThe interceptor is constructed on top of itHttp11Processor.prepareRequest()The figure indicates:

[original] experience sharing: a bloody case triggered by content length

Obviously, it will return – 1 directly and will not be called againNioBlockingSelector.read()Therefore, it is the key to solve this problem successfully.

summary

There is not too much to introduce hereContent-LengthWe all know this concept of acquiescence. If you are not clear about it, you can also refer to:

https://blog.piaoruiqing.com/…

A simple oneContent-LengthIt really baffled me. The real reason for this problem is the nonstandard request. It took a lot of time to find out the problem, but it was worth it. One’s growth can’t be separated from the baptism of various problems. I hope you can get something after reading.

Welcome to:

[original] experience sharing: a bloody case triggered by content length