[business learning] fundamentals of HTTP1.1 & 2.0 on May 9, 2019

Time:2019-12-10

Grape
All videos: https://segmentfault.com/a/11


Definition of HTTP

HTTP is built on the client/server (C/S) architecture model. It exchanges messages over a reliable connection and is a stateless request/response protocol.

Structure of HTTP

[Figure: message structure of HTTP/1.x compared with HTTP/2]

HTTP/1.x is still the protocol we use most. The header and the body are familiar to everyone, but what is the start line? The start line is what we usually call the request line or the status line, for example GET / HTTP/1.1 or HTTP/1.1 200 OK.
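As a minimal sketch of the difference between the two kinds of start line, the following Python function classifies a line as a request line or a status line. The name `parse_start_line` and the keys of the returned dictionary are illustrative choices, not part of any library, and the parsing is much more lenient than a real HTTP parser would be.

```python
def parse_start_line(line: str) -> dict:
    """Classify an HTTP/1.x start line as a request line or a status line."""
    line = line.rstrip("\r\n")
    if line.startswith("HTTP/"):
        # Status line: HTTP-version SP status-code SP reason-phrase
        version, status, *reason = line.split(" ", 2)
        return {
            "kind": "status",
            "version": version,
            "status": int(status),
            "reason": reason[0] if reason else "",
        }
    # Request line: method SP request-target SP HTTP-version
    method, target, version = line.split(" ")
    return {"kind": "request", "method": method, "target": target, "version": version}


print(parse_start_line("GET / HTTP/1.1"))
print(parse_start_line("HTTP/1.1 200 OK"))
```

A line that begins with the protocol version is a response; anything else is treated as a request.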
Before describing how the different HTTP versions work, let us review the TCP/IP model:

[Figure: the TCP/IP model]

Evolution of HTTP:

HTTP/1.0

HTTP/1.0 does not enable keep-alive by default. When a request is sent, the data passes through the layers below the application layer; here we only consider the transport layer and the application layer. In HTTP/1.0, every request first establishes a TCP connection (the three-way handshake) and then sends the HTTP request, so every interaction with the server opens a new connection. The flow of each request is shown in the following figure:

[Figure: HTTP/1.0 request flow, with a new TCP connection for every request]

Imagine how many resources such a long process consumes when it is repeated for every single request.
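To put a rough number on that cost, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that is not taken from the article: one round trip for the TCP handshake and one for the request/response exchange, with no TLS, DNS, or server processing time. The function names are hypothetical.

```python
def http10_round_trips(num_requests: int) -> int:
    """Round trips needed when every request opens its own TCP connection.

    Simplified model (an assumption, not a measurement): one round trip
    for the TCP handshake plus one for the request/response exchange.
    """
    handshake = 1
    exchange = 1
    return num_requests * (handshake + exchange)


def http10_latency_ms(num_requests: int, rtt_ms: float) -> float:
    """Total waiting time if the requests are made one after another."""
    return http10_round_trips(num_requests) * rtt_ms


# Ten small resources over a link with a 50 ms round-trip time:
print(http10_latency_ms(10, 50))
```

Under these assumptions, half of the waiting time goes to setting up connections rather than to transferring data.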

HTTP/1.1

HTTP/1.1 was introduced to address this problem. What changed in version 1.1 compared with version 1.0?

  • HTTP/1.1 added the Host header field.
  • HTTP/1.1 introduced chunked transfer encoding, which transmits the entity body chunk by chunk, and range requests, which make resumable downloads possible.
  • HTTP/1.1 pipelining lets a client, in theory, send several HTTP requests back to back instead of waiting for each response in turn.

    • Note: pipelining exists mostly in theory. Most desktop browsers still disable HTTP pipelining by default.
    • As a result, applications that use HTTP/1.1 today may open multiple TCP connections.
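To make the chunked transfer encoding from the list above concrete, here is a minimal decoder sketch in Python. Each chunk is a hexadecimal size line, the chunk data, and a trailing CRLF; a chunk of size zero ends the body. The function name `decode_chunked` is hypothetical, and the sketch ignores trailers and does no error handling.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii").strip(), 16)
        pos = line_end + 2
        if size == 0:  # a zero-length chunk marks the end of the body
            break
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))
```

Because every chunk announces its own size, the sender does not need to know the total body length before it starts transmitting.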

HTTP/1.1 addresses the resource consumption problem described above by making persistent connections (keep-alive) the default. What does that mean? Instead of opening a TCP connection for every HTTP request, a single TCP connection is established and reused for multiple requests. The requests are still serial, however: the connection no longer has to be re-established, but each request still has to wait its turn, and this can cause head-of-line blocking (for example, if 100 requests are sent and the first one is blocked, the 99 behind it are blocked too). The default working mode of HTTP/1.1 is shown in the following figure:

[Figure: default HTTP/1.1 mode, with serial requests on one persistent TCP connection]
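The effect of serial requests on one connection can be sketched with a tiny model in Python. It assumes, purely for illustration, that network latency is zero and that each request has a known service time; the function name is hypothetical.

```python
from itertools import accumulate


def serial_completion_times(service_times):
    """Completion time of each response on one persistent connection.

    Each request is sent only after the previous response has arrived, so
    (ignoring network latency, an assumption of this sketch) a request
    finishes at the sum of its own and all earlier service times.
    """
    return list(accumulate(service_times))


print(serial_completion_times([1, 1, 1]))   # three fast requests
print(serial_completion_times([10, 1, 1]))  # a slow first request delays the rest
```

In the second call the two fast requests finish only after the slow one, which is the head-of-line blocking described above.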

Now picture this model. What are its disadvantages? Can it be optimized?
Above we raised two problems: 1. requests have to queue; 2. head-of-line blocking may occur. For the first problem, HTTP/1.1 already offers a solution: pipelining. For the second, there was first a transitional solution, the SPDY protocol (a protocol launched by Google to enhance HTTP, featuring stream multiplexing, request prioritization, and HTTP header compression; it is worth studying if you are interested), and then HTTP/2.
First, pipelining. Pipelining is a technique that writes multiple HTTP requests to the same socket without waiting for the responses. Only HTTP/1.1 supports pipelining; HTTP/1.0 does not. What does this mean? The figure above shows requests being processed serially on one TCP connection. With pipelining turned on, it becomes the following:

[Figure: HTTP/1.1 with pipelining enabled]

As the figure shows, the client no longer sends a request, waits for the response, and only then sends the next request; all requests go out together. But there is a catch. HTTP pipelining sends multiple HTTP requests one after another on a single TCP connection without waiting for the server's response to the previous request. However, the client still has to receive the responses in the order in which the requests were sent. So pipelining solves only one side of the queuing problem: requests no longer wait, but responses are still accepted in request order. Why? Because an HTTP/1.1 response carries nothing that identifies which request it answers, so order is the only way to match them up. Head-of-line blocking therefore remains.
To sum up: HTTP/1.0 is like an assembly line on which tasks are completed strictly one at a time. With HTTP/1.1 the workers have improved, and several work orders can be handed out at once, but the skill is not fully mastered yet, so the results still have to come back and be handled one by one in the prescribed order.
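The in-order constraint of pipelining can be sketched in a few lines of Python. The model assumes, hypothetically, that the server works on all pipelined requests in parallel and that `ready_times[i]` is the moment response i is ready; the function name is illustrative.

```python
from itertools import accumulate


def pipelined_delivery_times(ready_times):
    """Earliest time each response can be delivered under pipelining.

    Responses must be returned in request order, so response i cannot be
    delivered before every earlier response is ready: its delivery time is
    the running maximum of the ready times.
    """
    return list(accumulate(ready_times, max))


print(pipelined_delivery_times([5, 1, 1]))  # the slow first response holds back the others
print(pipelined_delivery_times([1, 5, 1]))  # only responses behind the slow one are delayed
```

Even though the second and third responses are ready at time 1 in the first call, they cannot be delivered until the first one is.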

HTTP/2

Next comes HTTP/2. Let us see how it solves the previous problems. HTTP/2 uses streams to solve head-of-line blocking at the HTTP level: each stream is assigned a label, the stream ID. So what is so special about HTTP/2?
First of all, we cannot talk about HTTP/2 without mentioning HTTPS. In practice HTTP/2 runs over HTTPS: the specification does not require TLS, but the major browsers only support HTTP/2 over TLS. For HTTPS, I recommend a good article: "Wireshark packet capture to understand the HTTPS request process".
At the beginning of the article we compared the structure of HTTP/1 and HTTP/2 messages. They look completely different, but they are not. HTTP/2 uses the frame as its smallest unit. As the figure below shows, HTTP/2 only adds a layer of encapsulation; in essence a message is still headers and a body. HTTP/2 simply offers more advanced features and more ways of delivering them.

[Figure: HTTP/2 frames wrapping the headers and body of an HTTP/1.x message]
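As a rough illustration of frames and stream IDs, the Python sketch below packs and parses the 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream ID) and regroups DATA payloads by stream. The helper names are hypothetical, and the sketch skips padding, flags handling, HEADERS decoding, and flow control.

```python
import struct
from collections import defaultdict

FRAME_DATA = 0x0     # frame type codes from the HTTP/2 specification
FRAME_HEADERS = 0x1


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Build one frame: a 9-byte header followed by the payload."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,         # the top bit is reserved
    )
    return header + payload


def parse_frames(buf: bytes):
    """Yield (type, flags, stream_id, payload) for every frame in buf."""
    pos = 0
    while pos < len(buf):
        hi, lo, frame_type, flags, stream_id = struct.unpack_from(">BHBBI", buf, pos)
        length = (hi << 16) | lo
        payload = buf[pos + 9:pos + 9 + length]
        yield frame_type, flags, stream_id & 0x7FFFFFFF, payload
        pos += 9 + length


def reassemble_data(buf: bytes) -> dict:
    """Concatenate DATA payloads per stream ID, in arrival order."""
    bodies = defaultdict(bytes)
    for frame_type, _flags, stream_id, payload in parse_frames(buf):
        if frame_type == FRAME_DATA:
            bodies[stream_id] += payload
    return dict(bodies)


# Frames of streams 1 and 3 interleaved on one connection:
wire = (pack_frame(FRAME_DATA, 0, 1, b"he") + pack_frame(FRAME_DATA, 0, 3, b"wo")
        + pack_frame(FRAME_DATA, 0, 1, b"llo") + pack_frame(FRAME_DATA, 0, 3, b"rld"))
print(reassemble_data(wire))
```

Because every frame names its stream, the receiver can put the pieces back together no matter how the frames of different streams are interleaved.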

HTTP/1.x vs HTTP/2

To see the advantages of HTTP/2, we have to start from the disadvantages of HTTP/1, because only a comparison makes the difference clear.

  • The number of HTTP/1 connections is limited. For the same domain name, a browser creates only about 6 to 8 TCP connections at the same time (the exact number varies by browser). To get around this limit, domain sharding emerged, which is really resource sharding: resources are placed under different domain names (such as second-level subdomains), so that separate connections can be opened for each domain and the limit is bypassed. Abusing this technique causes problems of its own: each new TCP connection goes through a DNS lookup, the three-way handshake, slow start, and so on, and consumes extra CPU and memory; for the server, too many connections easily lead to network congestion. So what does HTTP/2 do? HTTP/2 uses multiplexing. On a single TCP connection, both sides can send frames to each other continuously. The stream ID of each frame indicates which stream the frame belongs to, and the receiver splices all frames of a stream together by stream ID to rebuild the complete message. Each HTTP/1.1 request is treated as one stream, so many requests become many streams; the request and response data is divided into frames, and frames from different streams are sent interleaved. This is multiplexing in HTTP/2. In HTTP/1 the length of the body is carried by a header, which is not enough once frames are interleaved, so in HTTP/2 every frame carries its own length field. The receiver can compare the length declared in a stream's headers with the total length of the frames received for that stream to decide whether the message is complete (the END_STREAM flag on the last frame also marks the end). This also solves head-of-line blocking at the HTTP level. That raises new questions: how do we make sure no packets are lost, and can frames within the same stream arrive out of order? TCP takes care of this: it guarantees in-order delivery and retransmits lost packets.
  • Headers carry a lot of content, and most of it does not change from one request to the next, yet HTTP/1 has no scheme for compressing them. HTTP/2 uses the HPACK algorithm to compress headers. HPACK maintains an index space made up of a static index table and a dynamic index table, and matches each header field against the index space of the current connection. If the name and value already exist, the field is replaced by its index. For example, ":method: GET" matches index 2 in the static table, so only one byte containing the index 2 needs to be transmitted. If the field is not in the index space, it is transmitted as a literal string, optionally Huffman-encoded, and the encoder then decides whether to store it in the dynamic table. This saves a lot of space.
  • HTTP/1 transmits messages as plain text. HTTP/2 introduces a binary framing layer: the frame is the smallest unit of data transmission, binary encoding replaces the original plain-text format, and the original message is split into smaller frames. Note that binary framing is not encryption; security still comes from TLS (HTTPS).
  • To reduce the number of requests as much as possible, HTTP/1 sites rely on optimizations such as merging files, image sprites, and resource inlining. These make each individual request larger and its latency higher, and inlined resources cannot use the caching mechanism effectively. For this case HTTP/2 introduced server push: the browser sends a request, and the server proactively pushes the resources related to that request, so the browser does not need to issue follow-up requests. This is mainly an optimization that replaces resource inlining.
  • In HTTP/1, when the application layer wants to abort a request, it notifies the peer by setting the RST flag in a TCP segment. This tears down the connection immediately, and a new connection has to be established before the next request can be sent. HTTP/2 introduces the RST_STREAM frame type, which cancels the stream of a single request while keeping the connection open, and therefore performs better.
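The HPACK idea from the list above can be sketched with a toy encoder in Python. Only a few entries of the real 61-entry static table are included, literals are returned as plain tuples rather than in the real wire format, and the dynamic table is capped by entry count instead of by size in bytes; these simplifications and the class name `ToyHpackEncoder` are assumptions of this sketch.

```python
# A few entries of the HPACK static table (the real table has 61 entries).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}
STATIC_TABLE_SIZE = 61
MAX_DYNAMIC_ENTRIES = 64  # toy limit so every index fits in one byte


class ToyHpackEncoder:
    def __init__(self):
        self.dynamic = []  # newest entry first, as in HPACK

    def _lookup(self, field):
        if field in STATIC_TABLE:
            return STATIC_TABLE[field]
        if field in self.dynamic:
            # Dynamic entries are numbered right after the static table.
            return STATIC_TABLE_SIZE + 1 + self.dynamic.index(field)
        return None

    def encode(self, headers):
        out = []
        for field in headers:
            index = self._lookup(field)
            if index is not None:
                # Indexed header field: high bit set, index in the low 7 bits.
                out.append(("indexed", bytes([0x80 | index])))
            else:
                # Simplified: send the literal and remember it for next time.
                out.append(("literal", field))
                self.dynamic.insert(0, field)
                del self.dynamic[MAX_DYNAMIC_ENTRIES:]
        return out


encoder = ToyHpackEncoder()
print(encoder.encode([(":method", "GET"), ("user-agent", "demo")]))
print(encoder.encode([(":method", "GET"), ("user-agent", "demo")]))
```

On the second call the user-agent field is found in the dynamic table, so it too shrinks to a single byte, which is where the savings on repeated headers come from.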

Further exploration

We have said a lot about HTTP. If you want to dig deeper, capture some packets and look at them yourself. Two good tools for this are Wireshark and Charles.
