Interesting! One line of code can’t get the full URL of the request

Time: 2021-6-7

From the official account: Gopher points north

Background

When building web services, you may run into a business scenario that requires the complete URL of an HTTP request. As it happens, Lao Xu ran into exactly such a scenario. The requirement was so simple that even copy-and-paste ("CV Dafa") got no chance to shine: a few keystrokes, and the code to get the full request URL was written.


At that point only verification remained, and Lao Xu was full of confidence. Then the slap in the face arrived, as fast as a tornado…

In the debugger, both Scheme and Host in req.URL were empty, so r.URL.String() could not return the full request URL. This result got Lao Xu excited: he had never imagined he would one day get the chance to find a possibly missing assignment in the Go source code. Suppressing his excitement, he set out to study it carefully. What if he became a Go contributor? ^ω^ In the end, it turned out there was nothing wrong with the official implementation, and that is how this article came about.
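The surprise is easy to reproduce. The sketch below is our own, not the code from the original screenshot: it uses the standard net/http/httptest package to start a throwaway HTTP/1.1 server and records what the handler sees for a request to the example path /where?q=now. The names observe and seenURL are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// seenURL records what a handler sees in the incoming request.
type seenURL struct {
	Full, Scheme, Host, ReqHost string
}

// observe starts a local HTTP/1.1 test server, sends one GET request
// for target (a path plus query), and reports what the handler saw.
func observe(target string) (seenURL, error) {
	ch := make(chan seenURL, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ch <- seenURL{
			Full:    r.URL.String(), // the "one line" that was expected to work
			Scheme:  r.URL.Scheme,   // empty for an origin-form request target
			Host:    r.URL.Host,     // empty as well
			ReqHost: r.Host,         // the host lands here, taken from the Host header
		}
	}))
	defer srv.Close()

	// srv.Client() talks to the test server directly, without any proxy.
	resp, err := srv.Client().Get(srv.URL + target)
	if err != nil {
		return seenURL{}, err
	}
	resp.Body.Close()
	return <-ch, nil
}

func main() {
	s, err := observe("/where?q=now")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

The handler's r.URL.String() yields only the path and query, while the host shows up in r.Host.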

Why can't we get the full URL in HTTP/1.1?

In the HTTP/1.1 server, the logic that reads a request and builds the Request.URL object lives in the readRequest function. Below, Lao Xu briefly analyzes and summarizes its source code.

  1. Read the first line of the request, also known as the request line.
```go
// First line: GET /index.html HTTP/1.0
var s string
if s, err = tp.ReadLine(); err != nil {
    return nil, err
}
```
  2. Parse the request line into req.Method, req.RequestURI and req.Proto.
```go
var ok bool
req.Method, req.RequestURI, req.Proto, ok = parseRequestLine(s)
if !ok {
    return nil, badStringError("malformed HTTP request", s)
}
```
  3. Parse req.RequestURI into req.URL.
```go
rawurl := req.RequestURI
if req.URL, err = url.ParseRequestURI(rawurl); err != nil {
    return nil, err
}
```
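The three steps can also be observed without running a server. In the Go versions we have looked at, the exported http.ReadRequest is a thin wrapper around the same readRequest function, so feeding it a raw origin-form request shows where each field comes from. A minimal sketch; parseRaw is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRaw runs a raw HTTP/1.x request through http.ReadRequest.
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := parseRaw("GET /where?q=now HTTP/1.1\r\nHost: www.example.org\r\n\r\n")
	if err != nil {
		panic(err)
	}
	// Step 2: the request line is split into method, request target and version.
	fmt.Println(req.Method, req.RequestURI, req.Proto)
	// Step 3: req.URL is parsed from the request target alone.
	fmt.Printf("URL=%q scheme=%q host=%q\n", req.URL.String(), req.URL.Scheme, req.URL.Host)
	// The Host header is stored in req.Host, not in req.URL.
	fmt.Println("req.Host =", req.Host)
}
```

Only the request target reaches req.URL; the host information from the Host header ends up in req.Host.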

Note: when the request method is CONNECT, the process above differs slightly.

From the process above, we know that the data in req.URL comes from req.RequestURI. To see what req.RequestURI actually contains, read on.

Request target

According to RFC 7230, the request line consists of the request method, the request target and the HTTP version, which correspond to req.Method, req.RequestURI and req.Proto above.


Request methods need no introduction here, and the HTTP versions in common use are essentially just HTTP/1.1 and HTTP/2. The rest of this section introduces the forms a request target can take.

origin-form

This is the most common form of request target. Its format is defined as follows:

origin-form    = absolute-path [ "?" query ]

When a client sends a request directly to the origin server, it must send only the path and query as the request target, except for CONNECT and server-wide OPTIONS requests. If the path of the target URI is empty, it must send "/" as the request target. The host part of the URI is sent in the Host header instead.

For example, for http://www.example.org/where?q=now the request line and Host header are:

GET /where?q=now HTTP/1.1
Host: www.example.org
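Going the other way, Go's net/url can split a full URL into exactly these two pieces: (*url.URL).RequestURI returns the encoded path and query, or "/" when the path is empty, and the Host field is what a client would put in the Host header. A small sketch; originForm is a hypothetical helper, not a standard API.

```go
package main

import (
	"fmt"
	"net/url"
)

// originForm splits an absolute URL into the origin-form request target
// and the value a client would send in the Host header.
func originForm(raw string) (target, host string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	// RequestURI falls back to "/" when the path is empty.
	return u.RequestURI(), u.Host, nil
}

func main() {
	for _, raw := range []string{"http://www.example.org/where?q=now", "http://www.example.org"} {
		target, host, err := originForm(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("GET %s HTTP/1.1\nHost: %s\n\n", target, host)
	}
}
```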

absolute-form

This form is currently used only for requests sent to a proxy. Its format is defined as follows:

absolute-form  = absolute-URI

According to RFC 7230, a client sends this form of request target only to a proxy. However, to allow a future version of HTTP to switch all requests to this form, a server must still accept it. That is probably why req.URL is a complete URL type that can describe every part of a URL, even though most of its fields are empty in practice.

An example of a request line in absolute-form:

GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
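Go's server-side parser does accept this form. As a minimal check (readAbsoluteForm is a hypothetical helper), feeding the example line above to http.ReadRequest gives a req.URL with Scheme and Host filled in, because both were part of the request line itself.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// readAbsoluteForm parses a request whose request target is in absolute-form.
func readAbsoluteForm() (*http.Request, error) {
	raw := "GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1\r\n" +
		"Host: www.example.org\r\n\r\n"
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := readAbsoluteForm()
	if err != nil {
		panic(err)
	}
	// Scheme and host come straight from the request line.
	fmt.Printf("scheme=%q host=%q full=%q\n", req.URL.Scheme, req.URL.Host, req.URL.String())
}
```

In this form, req.URL.String() already returns the full URL; it is the origin-form that leaves the scheme and host out.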

authority-form

The authority-form request target is used only for CONNECT requests. Its format is defined as follows:

authority-form = authority

When sending a CONNECT request, the client must send only the authority part of the target URI (excluding any userinfo and its "@" delimiter) as the request target. That is rather abstract, so let's first look at how http-URI is defined.

http-URI = "http:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]

You can probably guess that authority refers to the host information. Very good, you guessed right!

The origin server for an "http" URI is identified by the authority component, which includes a host identifier and optional TCP port.

That is how RFC 7230 explains authority. Going by his own reading, Lao Xu will simply declare here that authority consists of a host identifier and optional port information. An example of a request line in authority-form:

CONNECT www.example.com:80 HTTP/1.1
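This is also the CONNECT special case noted earlier. In the Go source we looked at, readRequest temporarily prefixes an authority-form target with "http://" so the URL parser accepts it, then clears the scheme again, leaving only req.URL.Host set. A quick check via http.ReadRequest; readConnect is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// readConnect parses an authority-form CONNECT request.
func readConnect() (*http.Request, error) {
	raw := "CONNECT www.example.com:80 HTTP/1.1\r\nHost: www.example.com:80\r\n\r\n"
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := readConnect()
	if err != nil {
		panic(err)
	}
	// The authority ends up in URL.Host; scheme and path stay empty.
	fmt.Printf("RequestURI=%q URL.Host=%q URL.Scheme=%q URL.Path=%q\n",
		req.RequestURI, req.URL.Host, req.URL.Scheme, req.URL.Path)
}
```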

asterisk-form

The asterisk-form request target is used only for server-wide OPTIONS requests, and its value can only be "*". Its format is defined as follows:

asterisk-form  = "*"

An example of a request line in asterisk-form:

OPTIONS * HTTP/1.1
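For completeness, a sketch of how Go parses the asterisk-form (readOptions is a hypothetical helper): net/url treats "*" as a bare path, so req.URL.Path becomes "*" and the host once again lives only in req.Host.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// readOptions parses a server-wide OPTIONS request in asterisk-form.
func readOptions() (*http.Request, error) {
	raw := "OPTIONS * HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := readOptions()
	if err != nil {
		panic(err)
	}
	fmt.Printf("RequestURI=%q URL.Path=%q URL.Host=%q Host=%q\n",
		req.RequestURI, req.URL.Path, req.URL.Host, req.Host)
}
```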

With these forms of request target in mind, let's return to the problem of getting the full request URL. Taking the most common one, origin-form, as an example (in day-to-day development we hardly ever need to handle the other forms), the request target lacks both the host and the scheme, so one line of code cannot produce the full request URL. Does that mean we cannot get it at all? Of course not: either of the following two schemes will do.

Scheme 1

  1. Get the host information from req.Host.
  2. If req.TLS == nil, the request is plain HTTP; otherwise it is HTTPS.
  3. Combine the results of steps 1 and 2 with the request target from the request line to obtain the full URL.

Scheme 2

If the host information of the service is already in a configuration file, simply read it and splice it together with req.RequestURI to get the full URL. Lao Xu actually went with scheme 2, because many services sit behind a gateway: when the client uses HTTPS to reach the gateway but the gateway uses plain HTTP to reach the service, checking req.TLS == nil gives the wrong answer.

Why can't we get the full URL in HTTP/2?

Note that HTTP/2 has no request line. It uses request pseudo-header fields instead, which were covered in the earlier article "Analysis of go initiated HTTP 2.0 request process (Part 2): header compression".

If you look at the header fields of an HTTP/2 request, the HTTP/1.1 request line is nowhere to be found. According to RFC 7540, the request pseudo-header fields are :method, :scheme, :authority and :path.

:method and :scheme need little explanation; their names say it all.

:authority: as explained earlier, its value is the host identifier plus optional port information. Note also that HTTP/2 uses :authority in place of the HTTP/1.1 Host request header.

:path: for a server-wide OPTIONS request the value is "*". Otherwise it is the path and query of the request URI, and "/" if the path is empty.
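Put differently, these pseudo-header fields together carry everything a full URL needs. A sketch of assembling one from :scheme, :authority and :path with net/url; urlFromPseudoHeaders is a hypothetical helper, not part of net/http.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlFromPseudoHeaders builds a full URL from HTTP/2 request pseudo-header
// values. :path holds only the path and query, so parse it first and then
// attach the scheme and authority.
func urlFromPseudoHeaders(scheme, authority, path string) (string, error) {
	u, err := url.ParseRequestURI(path)
	if err != nil {
		return "", err
	}
	u.Scheme = scheme
	u.Host = authority
	return u.String(), nil
}

func main() {
	full, err := urlFromPseudoHeaders("https", "www.example.org", "/where?q=now")
	if err != nil {
		panic(err)
	}
	fmt.Println(full) // https://www.example.org/where?q=now
}
```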

With a basic understanding of HTTP/2 request pseudo-headers, let's look at how Request.URL is assigned. In the HTTP/2 server, the logic that reads a request and builds the Request.URL object lives in the (*http2serverConn).newWriterAndRequestNoBody method in the h2_bundle.go file.

  1. For a CONNECT request, url_ is built from :authority; otherwise it is built from :path.
```go
if rp.method == "CONNECT" {
    url_ = &url.URL{Host: rp.authority}
    requestURI = rp.authority // mimic HTTP/1 server behavior
} else {
    var err error
    url_, err = url.ParseRequestURI(rp.path)
    if err != nil {
        return nil, nil, http2streamError(st.id, http2ErrCodeProtocol)
    }
    requestURI = rp.path
}
```
  2. Assign url_ to req.URL.
```go
req := &Request{
    Method:     rp.method,
    URL:        url_,
    RemoteAddr: sc.remoteAddrStr,
    Header:     rp.header,
    RequestURI: requestURI,
    Proto:      "HTTP/2.0",
    ProtoMajor: 2,
    ProtoMinor: 0,
    TLS:        tlsState,
    Host:       rp.authority,
    Body:       body,
    Trailer:    trailer,
}
```

Because the :path pseudo-header carries no host information, an HTTP/2 server cannot get the full request URL through req.URL.String() either.
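The same behaviour can be observed over a real HTTP/2 connection. A minimal sketch using a TLS test server from net/http/httptest with its EnableHTTP2 field set; observeH2 and h2Seen are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// h2Seen records what a handler sees for an HTTP/2 request.
type h2Seen struct {
	Proto, Full, Scheme, Host, ReqHost string
	TLS                                bool
}

// observeH2 starts a local TLS test server with HTTP/2 enabled and reports
// what the handler sees for one GET request to target.
func observeH2(target string) (h2Seen, error) {
	ch := make(chan h2Seen, 1)
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ch <- h2Seen{
			Proto:   r.Proto,        // "HTTP/2.0" when HTTP/2 was negotiated
			Full:    r.URL.String(), // built from :path only
			Scheme:  r.URL.Scheme,
			Host:    r.URL.Host,
			ReqHost: r.Host, // copied from :authority
			TLS:     r.TLS != nil,
		}
	}))
	srv.EnableHTTP2 = true // must be set before StartTLS
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() trusts the test certificate and attempts HTTP/2.
	resp, err := srv.Client().Get(srv.URL + target)
	if err != nil {
		return h2Seen{}, err
	}
	resp.Body.Close()
	return <-ch, nil
}

func main() {
	s, err := observeH2("/where?q=now")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

Since r.TLS is set and r.Host carries the :authority value, scheme 1 also works for HTTP/2 requests, subject to the same gateway caveat.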

This raises a question: the pseudo-header fields are enough to build the full URL, so why does the server still read only :path and :authority to assign req.URL?

Lao Xu's guess is that this keeps HTTP/2 consistent with HTTP/1.1 (note the "mimic HTTP/1 server behavior" comment in the code above), so that developers need not care whether a request arrived over HTTP/1.1 or HTTP/2 and can avoid unnecessary HTTP version checks.

That's all on getting the full URL of a request. Finally, I sincerely hope this article is helpful to every reader.

Notes

  1. At the time of writing, the author was using Go 1.15.2.

References:

https://tools.ietf.org/html/r…

https://tools.ietf.org/html/r…
