Friends who follow me know that I have a PDF summarizing the core of HTTP, compiled from my series of HTTP articles. However, I wrote that HTTP content a year ago, and looking back at the PDF, although it covers a lot, much of it lacks a systematic structure and reads poorly. That runs counter to my original intention, so I'm going to redo the HTTP material. HTTP is simply too important for us programmers: no matter which language you use, HTTP is something you need to know.
This is not an article that briefly introduces the basic concepts of HTTP. If you are not yet familiar with them, I recommend first reading cxuan's introductory HTTP article, "After reading this article, arguing with the interviewer is no problem".
So let’s assume that you have a certain understanding of HTTP.
Let’s start this article.
HTTP runs on TCP
As we all know, HTTP is an application-layer protocol that transmits its data over TCP. When you want to access a resource (a resource corresponds to a URL on the network), the client first resolves the resource's IP address and port number, then establishes a TCP connection with the server at that IP and port. The HTTP client then sends a request message (a GET, say), the server responds with a response message, and when there are no more messages to exchange, the client closes the connection. I use the figure below to illustrate this process.
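The whole flow can be sketched with a raw TCP socket. This is only an illustration, not production code; it uses Python's built-in http.server as a local stand-in for a remote web server:

```python
import socket
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the demo output quiet

# local stand-in for "the server at that IP and port"
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# 1. establish a TCP connection (the three-way handshake happens inside connect)
sock = socket.create_connection((host, port))
# 2. the HTTP client sends a request message (a GET here)
sock.sendall(b"GET / HTTP/1.1\r\nHost: example\r\nConnection: close\r\n\r\n")
# 3. read the server's response message until the server closes its side
response = b""
while chunk := sock.recv(4096):
    response += chunk
# 4. no more messages to exchange: close the connection
sock.close()
server.shutdown()
status_line = response.split(b"\r\n", 1)[0]
print(status_line)  # → b'HTTP/1.0 200 OK'
```

The four numbered comments match the four phases in the figure; http.server answers with HTTP/1.0 by default, which is why the status line says HTTP/1.0.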
The figure above illustrates the whole HTTP flow of establishing a connection -> sending a request message -> closing the connection, but it glosses over one very important step: the TCP connection establishment process.
Establishing a TCP connection requires a three-way handshake, exchanging three segments. I trust everyone knows this process well; if you are not familiar with how a TCP connection is established, you can first read cxuan's article "TCP connection management".
Because HTTP sits on top of TCP, the latency (performance) of the HTTP request -> response cycle depends heavily on the performance of the underlying TCP. Only by understanding TCP connection performance can we properly understand HTTP connection performance and build high-performance HTTP applications.
We usually call one complete request -> response exchange an HTTP transaction, so that is the term I'll use from here on; just know what it refers to.
Our next focus should start with the performance of TCP.
Sources of HTTP delay
Let’s review the above HTTP transaction processes. Which processes do you think will cause HTTP transaction delay? As shown in the figure below
As can be seen from the figure, the following factors mainly affect the delay of HTTP transactions
- The client determines the server's IP address and port number from the URL. The first delay is DNS resolution: converting the domain name into an IP address requires the client to issue a DNS query for the server's address.
- The second delay comes from TCP connection establishment: the client sends a connection-request segment to the server and waits for the server's acknowledgment. Every new TCP connection pays this setup delay.
- Once the connection is up, the client sends its request over it. This delay is mainly the time the server takes to read the request message from the TCP connection and process the request.
- Then there is the delay of the server transmitting the response message back to the client.
- The final delay is closing the TCP connection.
Optimizing this last point is also a focus of this article.
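To make these delay buckets concrete, here is a sketch that times the setup and teardown pieces with Python's socket module. It uses localhost and a local listening socket as stand-ins, so the numbers come out tiny; real measurements need a remote host:

```python
import socket
import time

def transaction_setup_timings(host, port):
    """Measure the setup/teardown delay buckets listed above.
    (The request/response delays would sit between connect and close.)"""
    t0 = time.perf_counter()
    ip = socket.gethostbyname(host)              # DNS resolution delay
    t1 = time.perf_counter()
    sock = socket.create_connection((ip, port))  # TCP connection-establishment delay
    t2 = time.perf_counter()
    sock.close()                                 # TCP connection-close delay
    t3 = time.perf_counter()
    return {"dns": t1 - t0, "connect": t2 - t1, "close": t3 - t2}

# a local listening socket stands in for a real web server
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
timings = transaction_setup_timings("localhost", listener.getsockname()[1])
listener.close()
print(sorted(timings))  # → ['close', 'connect', 'dns']
```

On a real page load these buckets repeat once per connection, which is exactly why connection reuse (discussed below) matters.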
HTTP connection management
Consider a scenario: suppose a page contains five resources (elements), each resource requires the client to open a TCP connection, fetch the resource, and disconnect, and the connections are opened serially, as shown in the following figure:

Serial means the five connections must happen strictly in order; at no point are two or more connections open at the same time.

For these five resources you need to open five connections. With so few resources that is manageable, but what if a page has hundreds of resources or more? Should each one get its own connection? Obviously that would sharply increase the processing load and introduce a lot of delay, none of it necessary.

Another drawback of serial loading is that some browsers cannot know an object's size until it is loaded, and the browser needs size information to place the object sensibly on the screen. So the screen shows nothing until enough objects have loaded; the objects are in fact loading the whole time, but to the user the browser appears stuck.
So, is there a way to optimize HTTP performance? Of course there is.

Parallel connections

This is the most common and most obvious approach. HTTP allows the client to open multiple connections and execute multiple HTTP transactions in parallel. With parallel connections added, the request flow of the whole HTTP transaction looks like this.

Parallel connections overcome the idle time and bandwidth limits of a single connection. Because each transaction gets its own connection, the delays can overlap, which speeds up page loading.
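The idea can be sketched with a thread pool, one connection per resource. This is an illustration only: a local ThreadingHTTPServer stands in for the web server, and the /obj0.../obj4 resource paths are made up:

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# five page resources, each fetched over its own connection (paths are hypothetical)
resources = ["%s/obj%d" % (base, i) for i in range(5)]

def fetch(url):
    with urllib.request.urlopen(url) as resp:  # opens a fresh TCP connection
        return resp.status

# parallel: up to five connections in flight at once, so their delays overlap
with ThreadPoolExecutor(max_workers=5) as pool:
    statuses = list(pool.map(fetch, resources))

server.shutdown()
print(statuses)  # → [200, 200, 200, 200, 200]
```

Note that each fetch still pays the full connect/close cost from the previous section; parallelism only overlaps those costs, it does not remove them.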
However, parallel connections are not necessarily faster. When bandwidth is scarce, a page may even respond more slowly than over serial connections, because the parallel connections compete for the limited bandwidth and each object loads more slowly: connection 1 is at 95%, connection 2 is holding 80% of the bandwidth, then connection 3, connection 4... every object is loading, yet nothing on the page has finished.

Moreover, opening a large number of connections consumes a lot of memory and creates performance problems of its own. The discussion above involved only five connections, which is relatively few; a complex web page may have dozens or even hundreds of embedded objects. A client could open hundreds of connections, and with many clients making requests at the same time, this easily becomes a performance bottleneck.

So parallel connections are not necessarily "fast". In fact, parallel connections do not speed up the transfer of the page; they only create an illusion of speed, which is the common problem of all parallelism.
Web clients usually open connections to the same site, and an application that has already sent requests to a server is likely to send more requests to it in the near future, for example to fetch more images. This property is called site locality.

Therefore, HTTP/1.1 (and enhanced versions of HTTP/1.0) allow an HTTP connection to remain in the open state after a transaction completes. This open state is really the open state of the underlying TCP connection, so the next HTTP transaction can reuse it.

A TCP connection that remains open after an HTTP transaction completes is called a persistent connection.
A non-persistent connection is closed after each transaction ends. A persistent connection, by contrast, stays open after a transaction ends, remaining open across transactions until the client or server decides to close it.
Persistent connections have drawbacks too: if each individual client does not make requests very often but a great many clients hold connections open, the server will sooner or later be overwhelmed.
Persistent connections generally come in two flavors: one is the HTTP/1.0+ Keep-Alive connection; the other is the HTTP/1.1 persistent connection.
Before HTTP/1.1, the default was the non-persistent connection; to use a persistent connection on those older versions, you had to explicitly set the Connection header value to keep-alive.

HTTP/1.1 connections are persistent by default; if you want to close the connection, you set the Connection value to close. This version difference is exactly why the two flavors above exist.
The following compares HTTP transactions over a persistent connection with serial HTTP transactions over separate connections.

This figure compares the time cost of HTTP transactions on serial connections versus a persistent connection. The persistent connection eliminates the repeated connection-open and connection-close phases, so the total time cost drops.
Another interesting aspect of persistent connections is the Connection option. Connection is a general header, meaning both client requests and server responses may carry it. Below is a request/response diagram of a client and server using a persistent connection.

As the figure shows, the Connection header is central to persistent connections; you could say Connection is how persistent connections are implemented. So let's focus on the Connection header itself.
The Connection header serves two purposes:

- used together with Upgrade for protocol upgrades
- managing persistent connections
Use with Upgrade for protocol upgrades
HTTP provides a special mechanism that allows an established connection to be upgraded to a new protocol. It is generally written as follows
GET /index.html HTTP/1.1
Host: www.example.com
Connection: upgrade
Upgrade: example/1, foo/2
HTTP/2 explicitly forbids this mechanism; it belongs to HTTP/1.1 only.
In other words, when the client sends Connection: upgrade, it is asking for a connection upgrade. If the server decides to upgrade the connection, it returns a 101 Switching Protocols status code plus an Upgrade header field naming the protocol it is switching to. If the server does not (or cannot) upgrade the connection, it ignores the client's Upgrade header field and returns an ordinary response, for example 200.
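Continuing the request above, a successful upgrade handshake could look like the following (example/1 is just the placeholder protocol name from the request, not a real protocol):

```
HTTP/1.1 101 Switching Protocols
Connection: upgrade
Upgrade: example/1
```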
Manage persistent connections
As mentioned above, persistent connections come in two flavors: one is HTTP/1.0+ Keep-Alive; the other is the HTTP/1.1 persistent connection. The Keep-Alive headers look like this:

Connection: Keep-Alive
Keep-Alive: timeout=10, max=500
With HTTP/1.0+ Keep-Alive, a client can keep a connection open by including a Connection: Keep-Alive header in its request.
One point deserves attention ⚠️: the Keep-Alive header merely requests that the connection stay active. After a keep-alive request is sent, the client and server do not necessarily agree to a keep-alive session; either side may close an idle keep-alive connection at any time, and both sides may limit the number of transactions handled over a keep-alive connection.
The Keep-Alive header supports the following options:

timeout: estimates how long the server intends to keep the connection active.

max: used together with timeout; it indicates how many more transactions the server intends to serve over this persistent connection.

The Keep-Alive header itself is optional, but it may only be used when Connection: Keep-Alive is also present.
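Since the Keep-Alive parameters are simple comma-separated name=value pairs, a client could read them with a few lines of Python. This is a sketch that assumes well-formed numeric values:

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value such as 'timeout=10, max=500'
    into a dict of lowercase parameter names to integer values."""
    params = {}
    for part in value.split(","):
        name, _, val = part.strip().partition("=")
        params[name.lower()] = int(val)
    return params

hints = parse_keep_alive("timeout=10,max=500")
print(hints)  # → {'timeout': 10, 'max': 500}
```

Remember these are hints from the server, not promises: as noted above, either side may still close the connection earlier.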
Keep-alive connections come with certain restrictions; let's go through the keep-alive usage rules.

Keep-Alive usage restrictions and rules
- In HTTP/1.0, keep-alive is not on by default. The client must send a Connection: Keep-Alive request header to activate a keep-alive connection.
- By checking whether the response contains a Connection: Keep-Alive header field, the client can tell whether the server will close the connection after sending the response.
- Proxies and gateways must enforce the Connection header rules: before forwarding or caching a message, they must delete every header field named in the Connection header as well as the Connection header itself, because Connection is a hop-by-hop header. It is valid only for a single hop and must not be passed on through caches or proxy servers.
- Strictly speaking, you should not establish a keep-alive connection with a proxy server unless you can be sure it supports the Connection header; otherwise you run into the dumb proxy problem, which we discuss next.
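The proxy rule above can be sketched as a small function: before forwarding, drop the Connection header, every header it names, and the other well-known hop-by-hop headers (the set here follows the usual HTTP/1.1 list; this is an illustration, not a full proxy):

```python
# hop-by-hop headers that must not be forwarded to the next hop
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-connection", "te",
    "trailer", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers):
    """Return the headers a proxy may forward: well-known hop-by-hop
    headers and any header named in Connection are removed."""
    named = {
        token.strip().lower()
        for token in headers.get("Connection", "").split(",")
        if token.strip()
    }
    return {
        name: value for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP and name.lower() not in named
    }

request_headers = {
    "Host": "www.example.com",
    "Connection": "Keep-Alive",
    "Keep-Alive": "timeout=10,max=500",
    "Accept": "*/*",
}
forwarded = strip_hop_by_hop(request_headers)
print(sorted(forwarded))  # → ['Accept', 'Host']
```

A proxy that skips this step is exactly the "dumb proxy" of the next section.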
Keep-Alive and the dumb proxy problem

Let me first explain what a proxy server is, and then get to the dumb proxy problem.
What is a proxy server?
A proxy server is an intermediary that fetches network information on behalf of a client; put more colloquially, it is a relay station for network information.
Why do we need a proxy server?
The most familiar use case is reaching sites that our client cannot access directly: we go through a proxy server instead. Beyond that, proxy servers have many other functions. Caching, for instance, cuts costs and saves bandwidth. They also enable real-time monitoring and filtering of information: relative to the target server (the one that ultimately holds the information), the proxy acts as a client and fetches the information; relative to the client, the proxy acts as a server that decides which information to hand over, and that is how monitoring and filtering are achieved.
The dumb proxy problem arises at the proxy server. More precisely, it arises at proxy servers that do not recognize the Connection header and do not know to delete it before forwarding the request.

Suppose a web client is talking to a web server through a dumb proxy server, as shown in the figure below.
Let’s explain the picture above
- First, the web client sends a message to the proxy that includes a Connection: Keep-Alive header, hoping to keep the connection open after this HTTP transaction. The client then waits for the response to learn whether the other side agrees to a persistent connection.
- The dumb proxy (calling it dumb up front is a little unfair; let's watch its behavior first and judge afterwards) receives this HTTP request, but it does not understand the Connection header and has no idea what Keep-Alive means, so it simply forwards the message along the chain to the server. But Connection is a hop-by-hop header, valid for a single link only; the proxy should not have forwarded it, yet it did, and awkward things are about to happen.
- The forwarded HTTP request reaches the server, which mistakenly concludes that the other side wants a keep-alive persistent connection. The server evaluates the request, agrees to a keep-alive conversation, and returns a Connection: Keep-Alive response, which arrives back at the dumb proxy.
- The dumb proxy relays the response straight to the client. Seeing it, the client concludes the server has agreed to a persistent connection. At this point both the client and the server believe they are speaking keep-alive, but the dumb proxy in the middle knows nothing about it.
- Because the proxy knows nothing about keep-alive, it relays all the data it receives to the client and then waits for the server to close the connection. But the server believes the connection should stay open, so it never closes it. The dumb proxy thus hangs there, waiting for a close that never comes.
- When the client sends the next HTTP transaction down this connection, the dumb proxy simply ignores it, because it does not expect any further requests on the same connection.
This is the keep-alive dumb proxy problem.

So how is this problem solved? With the Proxy-Connection header.

Proxy-Connection and the dumb proxy
Netscape proposed a workaround using the Proxy-Connection header. The browser sends the proxy a nonstandard Proxy-Connection extension header instead of the officially supported Connection header. If the proxy is dumb, it forwards Proxy-Connection to the server untouched; the server does not recognize Proxy-Connection, ignores it, and no harm is done. If a smart proxy receives Proxy-Connection, it replaces it with a real Connection header and sends that to the server.
HTTP/1.1 persistent connections

HTTP/1.1 phased out support for Keep-Alive connections, replacing them with an improved design called the persistent connection. The goal is the same as Keep-Alive's, but the mechanism works better than HTTP/1.0's.
Unlike HTTP/1.0 Keep-Alive connections, HTTP/1.1 connections are persistent by default. Unless stated otherwise, HTTP/1.1 assumes all connections are persistent. To close the connection after a transaction, you add a Connection: close header to the message. This is an important difference from earlier versions of HTTP.
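To see the default persistence in action, here is a sketch using Python's http.client against a local HTTP/1.1 server built on http.server. The client-side port staying identical across two requests shows that both transactions rode one TCP connection:

```python
import threading
from http.client import HTTPConnection
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # speak HTTP/1.1: persistent by default
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # required for reuse
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/a")
conn.getresponse().read()
port_first = conn.sock.getsockname()[1]   # client port of the TCP socket

conn.request("GET", "/b")                 # no new handshake: same connection
conn.getresponse().read()
port_second = conn.sock.getsockname()[1]

same_connection = port_first == port_second
conn.close()
server.shutdown()
print(same_connection)  # → True
```

Had either side sent Connection: close, the second request would have needed a brand-new TCP connection, paying the handshake delay again.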
Using persistent connections also involves some restrictions and rules:

- After sending a Connection: close request, the client cannot send further requests on that connection. Put another way, when a client has no more requests to send, it can use Connection: close to shut the connection down.
- An HTTP/1.1 proxy must manage the persistent connections to the client and to the server separately; each persistent connection applies to a single hop only.
- A client should maintain at most two persistent connections to any server or proxy, to avoid overloading the server.
- The connection can persist only when the length of the message body matches the Content-Length header, or when chunked transfer encoding is used.
HTTP/1.1 allows request pipelining over persistent connections. This is another performance optimization beyond keep-alive: a pipeline carries HTTP requests, and we can put multiple HTTP requests into it before any response arrives, which cuts network round-trip time and improves performance. The figure below compares serial connections, parallel connections, and pipelined connections:

Pipelined connections also come with several restrictions:

- If an HTTP client cannot confirm that the connection is persistent, it should not use pipelining.
- HTTP responses must come back in the same order as the requests. HTTP has no sequence-number concept, so once responses arrive out of order there is no way to match them to their requests.
- An HTTP client must be prepared for the connection to close at any time, and be ready to resend any outstanding pipelined requests.
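A minimal pipelining sketch, assuming a local HTTP/1.1 server built on Python's http.server: both requests are written before any response is read, and the responses come back in request order:

```python
import socket
import threading
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # pipelining needs a persistent connection
    def do_GET(self):
        body = self.path.encode()   # echo the request path as the body
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

sock = socket.create_connection(server.server_address)
# pipeline: send both GET requests back-to-back before reading anything
sock.sendall(b"GET /first HTTP/1.1\r\nHost: x\r\n\r\n"
             b"GET /second HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
data = b""
while chunk := sock.recv(4096):  # read until the server closes the connection
    data += chunk
sock.close()
server.shutdown()

# responses arrive in the same order the requests were sent
ordered = data.index(b"/first") < data.index(b"/second")
print(ordered)  # → True
```

Both requests are idempotent GETs, consistent with the rule below about never pipelining non-idempotent requests.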
Closing HTTP connections

Any HTTP client, server, or proxy can close an HTTP transport connection at any time. Usually a connection is closed after a response completes, but it can also happen in the middle of an HTTP transaction.
However, the server cannot know whether the client still has data in flight at the moment it closes. If that happens, the client gets a write error mid-transmission.

Even without errors, the connection may close at any time. If it closes in the middle of a transaction, the connection must be reopened and the request retried. A single connection can cope with this; a pipelined connection fares worse, because it may have a pile of requests outstanding, and if the server closes, all of those requests go unanswered and need to be rescheduled.
If executing an HTTP transaction once or N times always produces the same result, we say the transaction is idempotent. The GET, HEAD, PUT, DELETE, TRACE, and OPTIONS methods are generally considered idempotent. Clients should not pipeline non-idempotent requests such as POST, otherwise the consequences are unpredictable.
Because HTTP uses TCP as the transport layer protocol, HTTP closing the connection is actually the process of TCP closing the connection.
There are three ways an HTTP connection can close: full close, half close, and graceful close.
An application can close either the input or the output channel of a TCP connection, or both. Calling the socket's close() method closes input and output together; this is a full close. Calling the socket's shutdown() method closes the input or output channel individually; this is a half close. The HTTP specification recommends that when a client or server needs to close a connection unexpectedly, it should close gracefully, although it does not say how.
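The difference between a half close and a full close can be sketched with Python sockets: shutdown(SHUT_WR) closes only our output side while the input side stays readable. A tiny local echo-style server stands in for the peer here:

```python
import socket
import threading

def echo_server(listener):
    conn, _ = listener.accept()
    data = conn.recv(1024)       # read the client's message
    while conn.recv(1024):       # drain until the client half-closes (recv -> b"")
        pass
    conn.sendall(data)           # we can still write after the client's half close
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

sock = socket.create_connection(listener.getsockname())
sock.sendall(b"hello")
sock.shutdown(socket.SHUT_WR)    # half close: no more output, input still open
reply = sock.recv(1024)          # reading the server's reply still works
sock.close()                     # full close: both directions are now gone
listener.close()
print(reply)  # → b'hello'
```

After sock.close(), any further send or recv on the socket would fail, which is exactly the full-close behavior described above.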
For a deeper study of TCP close behavior, you can read another of cxuan's articles, "TCP Basics".
In addition, I have personally ground out six PDFs, downloaded more than 100,000 times across the web. Search WeChat for the official account "programmer cxuan" and reply "cxuan" in the background to receive all the PDFs; they are as follows.