HTTP that the front end should know

Time: 2019-10-21

As a veteran Internet communication protocol, HTTP has gone through three major revisions, and the latest version today is HTTP/2.0, which you are no doubt familiar with. Today I'd like to give you a proper introduction to HTTP.

1. History of HTTP

A brief introduction first, and then a detailed explanation later

1.1 HTTP/0.9

The earliest version of HTTP was born in 1991. Compared with current versions it is extremely simple: there are no HTTP headers, no status codes, not even a version number. It was later labeled 0.9 to distinguish it from later versions. HTTP/0.9 supports only one method, GET, and a request is just a single line.

GET /hello.html

The response is also very simple, including only the HTML document itself.

<HTML>
Hello world
</HTML>

Once the TCP connection is established, the server returns the HTML string to the client and then closes the TCP connection. Because there are no status codes or error codes, if the server runs into an error while processing the request, it can only return a special HTML file describing the problem. That is the earliest version, HTTP/0.9.

1.2 HTTP/1.0

In 1996, HTTP/1.0 was released, greatly enriching what HTTP could transmit. In addition to text it can now carry images, video and other content, which laid the foundation for the growth of the Internet. Compared with HTTP/0.9, HTTP/1.0 has the following features:

  • Requests and responses support HTTP headers, and status codes are added; a response now starts with a status line.
  • The protocol version is sent along with the request, and the HEAD and POST methods are supported.

A typical HTTP/1.0 exchange looks like this:

GET /hello.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)

200 OK
Date: Tue, 15 Nov 1996 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html

<HTML>
A page with pictures
<IMG SRC="/smile.gif">
</HTML>

1.3 HTTP/1.1

Only a few months after HTTP/1.0 was released, HTTP/1.1 followed. HTTP/1.1 is more a refinement of HTTP/1.0, with the following main improvements:

  • Connections can be reused
  • Pipelining is added
  • Chunked transfer encoding
  • More cache control mechanisms are introduced
  • A content negotiation mechanism is introduced
  • Both request and response messages support the Host header field
  • The OPTIONS, PUT, DELETE, TRACE and CONNECT methods are added
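
To make this concrete, here is a sketch of a minimal HTTP/1.1 exchange (the hostname and Content-Length are illustrative): the Host header is now required, and the connection stays open by default for further requests.

GET /hello.html HTTP/1.1
Host: www.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 26

<HTML>
Hello world
</HTML>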

1.4 HTTPS

HTTPS is an HTTP channel with security as its goal; in short, it is the secure version of HTTP. An SSL/TLS layer is added underneath HTTP, and SSL/TLS is the security foundation of HTTPS, so the encryption details are really the business of SSL/TLS.

HTTPS serves two main purposes: one is to establish a secure channel that protects data in transit; the other is to confirm the authenticity of the website. The differences between HTTPS and HTTP are as follows:

  • HTTPS requires applying to a CA for a certificate; free certificates are relatively rare, so this usually costs money.
  • HTTP transmits in plaintext, while HTTPS is encrypted over SSL/TLS.
  • HTTP and HTTPS use completely different connection setups and different default ports: 80 for the former and 443 for the latter.

1.5 SPDY

From 2010 to 2015, Google demonstrated an alternative way for clients and servers to exchange data by implementing the experimental SPDY protocol. It gathered the concerns of browser and server developers and targeted the growing number of responses and the increasingly complex data being transmitted. SPDY set out with the following goals:

  • Reduce page load time (PLT) by 50%.
  • Require no changes from site authors.
  • Minimize deployment complexity without changing the network infrastructure.
  • Develop this new protocol together with the open source community.
  • Collect real performance data to verify the validity of the experimental protocol.

To achieve the goal of reducing page load time, SPDY introduces a new binary framing layer that enables multiplexed requests and responses and request prioritization, minimizing and eliminating unnecessary network latency so as to make more effective use of the underlying TCP connection.

1.6 HTTP/2.0

Then came 2015, and HTTP/2.0 arrived. Let's first list the features of HTTP/2.0:

  • A binary framing layer
  • Multiplexing
  • Stream prioritization
  • Server push
  • Header compression

2. Detailed explanation of HTTP principle

The HTTP protocol is built on top of TCP/IP and is a member of the TCP/IP protocol family. Therefore, to understand HTTP you first need to understand the relevant parts of TCP/IP.

2.1 The TCP/IP protocol

The TCP/IP protocol family is a system composed of four layers of protocols: the application layer, the transport layer, the network layer and the data link layer.
[Figure: the four layers of the TCP/IP protocol family]

The advantage of layering is that relatively independent functions are decoupled from each other and communicate only through well-defined interfaces. If the implementation of one layer needs to be modified or rewritten later, the other layers are unaffected as long as the interface stays the same. Next we introduce the main function of each layer.
1) Application layer
The application layer is generally the application we write ourselves, and it determines the services offered to the user. The application layer talks to the transport layer through system calls.
There are many protocols in the application layer, such as FTP (File Transfer Protocol), DNS (Domain Name System) and HTTP (Hypertext Transfer Protocol), the subject of this chapter.
2) Transport layer
The transport layer, reached from the application layer through system calls, provides data transmission between two computers on the network.
There are two different protocols in the transport layer: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
3) Network layer
The network layer deals with the packets flowing through the network; a packet is the smallest unit of data in network transmission. This layer decides the path (the transmission route) by which the data packets reach the other party's computer. The IP protocol lives here.
4) Link layer
The link layer handles the hardware side of the network connection, including device control in the operating system, hardware device drivers, the NIC (network interface card), optical fiber and other physically visible parts. Everything hardware-related falls within the scope of the link layer.

Packet encapsulation
How does upper-layer protocol data become lower-layer protocol data? Through encapsulation. Application data is passed down the protocol stack before being sent onto the physical network, and each layer adds its own header (the link layer also adds a trailer) to the data from the layer above, providing the information that layer needs to do its job.
[Figure: application data being encapsulated as it passes down the protocol stack]
When the sender transmits data, the data travels from the upper layers to the lower layers, and each layer stamps it with its own header. When data is received, it travels from the lower layers to the upper layers, and each layer strips its own header before passing the data up.

Because a lower layer's header is of no practical use to the protocol above it, the lower layer removes its header when handing data up to the upper-layer protocol. This encapsulation is completely transparent to the upper layers, and the benefit is that the application layer only has to care about implementing its application services, not about the lower-level details.

TCP three-way handshake
From the introduction above we know there are two transport-layer protocols: TCP and UDP. Compared with UDP, TCP provides connection-oriented, reliable, byte-stream transmission.

[Figure: the TCP three-way handshake]

  • First handshake: the client sends a connection-request segment with the SYN flag set, then enters the SYN_SENT state, waiting for the server's confirmation.
  • Second handshake: after receiving the client's SYN segment, the server must send an ACK to acknowledge it and at the same time sends its own SYN. Both are packed into one segment (a SYN+ACK segment) and sent to the client, and the server enters the SYN_RCVD state.
  • Third handshake: after receiving the server's SYN+ACK segment, the client sends an ACK segment back to the server. Once it is sent, both the client and the server enter the ESTABLISHED state, completing the TCP three-way handshake.

When the three-way handshake completes, TCP maintains connection state for both parties. To make sure data arrives, the receiver must send an ACK to confirm each packet it receives; if the sender does not receive that ACK within a specified time (the retransmission timeout), the unacknowledged data is retransmitted.
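
From application code you never see these segments; the kernel does the handshake for you. A minimal Node.js sketch (the host and port are just examples): the 'connect' event fires once the handshake has completed.

const net = require('net');

// The kernel performs the SYN / SYN+ACK / ACK exchange for us;
// 'connect' fires once the connection is ESTABLISHED.
const socket = net.connect({ host: 'example.com', port: 80 }, () => {
  console.log('handshake done, connection established');
  socket.end(); // closing starts TCP's teardown
});

socket.on('error', (err) => console.error(err));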

2.2 DNS domain name resolution

When you type https://juejin.im into the browser's address bar, what happens? You probably have a rough idea already. Here I will walk through the steps of DNS resolution in detail. Before getting into the concepts, here is a classic picture for you to think about for a minute.
[Figure: the classic DNS resolution walkthrough]
The detailed process of finding the IP address for a domain name:

  1. The browser searches its own DNS cache (the browser keeps a table mapping domain names to IP addresses); if there is no hit, go to the next step;
  2. Search the operating system's DNS cache; if there is no hit, go to the next step;
  3. Search the operating system's hosts file (under Windows this is also a table of domain names and IP addresses); if there is no hit, go to the next step;
  4. The query is handed to the local DNS server (LDNS):

    • The operating system sends the domain name to the LDNS (the local DNS server). The LDNS checks its own DNS cache (the hit rate there is generally around 80%); if it finds an answer it returns the result, otherwise it starts an iterative DNS resolution:
    • The LDNS queries a root name server, which knows the addresses of the top-level domain name servers (com, net, im and so on) and returns the address of the TLD name server for the im domain;
    • The LDNS queries the im TLD name server, which returns the address of the name server for juejin.im;
    • The LDNS queries the juejin.im name server and obtains the IP address of juejin.im;
    • The LDNS returns the IP address to the operating system and caches it; the operating system returns the IP address to the browser and caches it as well.
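
If you want to trigger this resolution from code, here is a minimal sketch with Node's built-in dns module (juejin.im is just the example name from above). dns.resolve4 asks a DNS server directly, while dns.lookup goes through the OS resolver, so the hosts file and OS cache from steps 2 and 3 still apply to it.

const dns = require('dns');

// Query the A records (IPv4 addresses) for the domain via DNS.
dns.resolve4('juejin.im', (err, addresses) => {
  if (err) throw err;
  console.log('juejin.im resolves to', addresses);
});

// Resolve through the OS (hosts file, system cache) instead.
dns.lookup('juejin.im', (err, address) => {
  if (err) throw err;
  console.log('lookup result:', address);
});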

A simplified picture of how HTTP works

  • Address resolution: this is essentially the DNS resolution described above
  • Build the HTTP request packet: combine the pieces above with local information and assemble them into an HTTP request message.
  • Wrap it in TCP packets and establish a TCP connection (the TCP three-way handshake)
  • The client sends the request
  • The server responds
  • The server closes the TCP connection
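
All of the steps above hide behind a single call in application code. A minimal Node.js sketch (the URL is just an example): DNS resolution, the TCP handshake, sending the request and reading the response all happen under the hood.

const http = require('http');

http.get('http://example.com/', (res) => {
  console.log('status:', res.statusCode);
  res.on('data', (chunk) => process.stdout.write(chunk));
  res.on('end', () => console.log('\ndone'));
});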

2.3 HTTP request methods

Some common HTTP request methods.

  • GET: used to retrieve data
  • POST: used to submit an entity to the specified resource, usually causing a state change or side effects on the server
  • HEAD: returns the same headers as the equivalent GET request, but without the response body
  • PUT: used to create or update the specified resource
  • DELETE: deletes the specified resource
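
A quick sketch of these methods with the browser's fetch API (the URLs and payloads are hypothetical):

// GET: retrieve data
fetch('/api/users').then((res) => res.json()).then(console.log);

// POST: submit an entity, usually changing state on the server
fetch('/api/users', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'alice' }),
});

// HEAD: same headers as GET, but no body
fetch('/api/users', { method: 'HEAD' })
  .then((res) => console.log(res.headers.get('Content-Length')));

// PUT and DELETE: update or remove the specified resource
fetch('/api/users/1', { method: 'PUT', body: JSON.stringify({ name: 'bob' }) });
fetch('/api/users/1', { method: 'DELETE' });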

For the differences between GET and POST, see another article of mine, an interview classic: the difference between GET and POST in HTTP.

2.4 HTTP cache

HTTP also has its own caching mechanism. For that topic, have a look at my earlier article on the browser cache; I won't repeat it here.

2.5 Status codes

Here are some common status codes

1. 301 Moved Permanently
Use 301 when you change domain names. For example, the old domain is www.renfed.com and you move to a new one, fed.renren.com; you want visitors of the old domain to be redirected to the new one automatically. You can have nginx return a 301:

server {
    listen       80;
    server_name  www.renfed.com;
    root         /home/fed/wordpress;
    return       301 https://fed.renren.com$request_uri;
}

When the browser receives a 301 it redirects automatically. When a search engine crawler sees the 301, after a few days it updates the previously indexed pages to the new domain.

Another scenario is redirecting HTTP to HTTPS with a 301. If you type a bare domain name into the address bar and press Enter, with no https:// in front, the browser defaults to the HTTP protocol. We want users to reach the secure HTTPS site rather than HTTP, so we need a redirect, and 301 works here as well. For example:

server {
    listen       80; 
    server_name  fed.renren.com;

    if ($scheme != "https") {
         return 301 https://$host$request_uri;
    }   
}

2. 302 Found (temporary redirect)
Many short-link services redirect to the long link with a 302, as shown in the following figure:
[Figure: a short link redirecting to the long URL with 302]
3. 304 Not Modified
This mainly shows up with the caching discussed above: if the resource on the server has not been modified, the browser's cached copy is used.

[Figure: a request answered with 304 Not Modified]
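
A minimal sketch of how a server decides to answer 304 (a hypothetical Node.js handler with a hard-coded ETag; real servers derive the tag from the file contents or modification time):

const http = require('http');

http.createServer((req, res) => {
  const etag = '"v1"'; // hypothetical version tag for this resource

  // The browser already holds this version: tell it to use its cache.
  if (req.headers['if-none-match'] === etag) {
    res.writeHead(304);
    res.end();
    return;
  }

  res.writeHead(200, { 'ETag': etag, 'Content-Type': 'text/html' });
  res.end('<html>Hello world</html>');
}).listen(8080);
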
4. 400 Bad Request (the request is invalid)
When required parameters are missing or a parameter is in the wrong format, the backend usually returns 400, as shown in the following figure:
[Figure: a 400 response to an invalid request]

5. 403 Forbidden (the request is refused)
The server understands the request, including its correct parameters, but refuses to serve it. For example, a service may allow direct access to static files but not allow listing a directory:
[Figure: a 403 response to a directory request]
Otherwise, every file on your server would be exposed at a glance.
The difference between 403 and 401 is that 401 means the request is not authenticated (for example, no valid login), while 403 means the request is understood but refused.

6. 500 Internal Server Error
If the business code throws an exception and it is caught by Tomcat, a 500 error is returned:
[Figure: a 500 response caused by an exception in the business code]
For example, if a database column is limited to 30 characters and a 31-character value is inserted without validation, the database throws an exception; if that exception is not caught, the response is a 500.

When the backend service is down entirely and nothing comes back at all, you get a 502 instead.

7. 502 Bad Gateway
[Figure: a 502 Bad Gateway page returned by nginx]
This happens when nginx receives the request but cannot get it through to the upstream service, perhaps because the business service has crashed or the upstream port is configured incorrectly.

8. 504 Gateway Timeout
This usually means the service takes too long to process the request and times out. For example, a PHP service's default maximum execution time is 30s; if processing exceeds 30s the request is killed and a 504 is returned, as shown in the following figure:
[Figure: a 504 Gateway Timeout response]

2.6 Basic HTTP optimization

Two main factors affect an HTTP network request: bandwidth and latency.

  • Bandwidth
    If we were still in the dial-up era, bandwidth would be a serious limiting factor for requests, but network infrastructure has since improved it enormously and we rarely worry about bandwidth affecting speed any more, which leaves latency.
  • Latency
    1. HOL (head-of-line) blocking: the browser blocks requests for various reasons. A browser can only keep a limited number of connections open to the same domain at once (four in some browsers; this varies with the browser engine), and once that limit is reached, subsequent requests are blocked.
    2. DNS lookup: the browser needs the IP address of the target server before it can establish a connection, and DNS is the system that resolves domain names to IPs. Cached DNS results usually cut this time down.
    3. Initial connection: HTTP runs on top of TCP, and the browser can at the earliest attach the HTTP request to the third step of the handshake, so a real connection must be established first. If connections cannot be reused, every request pays for a three-way handshake plus TCP slow start. The handshake hurts most in high-latency scenarios, while slow start hits large file transfers hardest.

The evolution of HTTP has been a continual effort to optimize along exactly these lines.

3. HTTP/1.1

HTTP/1.0 first appeared in web pages in 1996, when it was used only for fairly simple pages and requests, while HTTP/1.1 started to be widely used in the web requests of the major browsers in 1999, and it remains the most widely used version of HTTP. The main differences are as follows:

  • Cache handling. In HTTP/1.0, the If-Modified-Since and Expires headers were the main basis for cache decisions. HTTP/1.1 introduces more cache-control strategies, such as Entity tag (ETag), If-Unmodified-Since, If-Match and If-None-Match, giving more header options for controlling caching.
  • Bandwidth optimization and use of the network connection. HTTP/1.0 wastes bandwidth in some situations: for example, the client may need only part of an object, yet the server sends the whole thing, and resumable downloads are not supported. HTTP/1.1 introduces the Range request header, which allows only part of a resource to be requested; the response code is then 206 (Partial Content). This gives developers the freedom to make full use of bandwidth and connections (see the sketch after this list).
  • Management of error notifications. For example, 409 (Conflict) indicates that the request conflicts with the current state of the resource, and 410 (Gone) indicates that a resource on the server has been permanently removed.
  • Host header handling. HTTP/1.0 assumed that every server was bound to a unique IP address, so the URL in a request message did not carry a hostname. With the rise of virtual hosting, however, one physical server can host multiple virtual hosts (multi-homed web servers) that share an IP address. HTTP/1.1 request and response messages must support the Host header field, and a request message without one is rejected with an error (400 Bad Request).
  • Long-lived connections. HTTP/1.1 supports persistent connections and request pipelining: multiple HTTP requests and responses can be carried over one TCP connection, reducing the cost and latency of repeatedly opening and closing connections. Connection: keep-alive is enabled by default in HTTP/1.1, which makes up for HTTP/1.0's need to create a new connection for every request.
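
A sketch of the Range mechanism mentioned above (the URL and byte range are hypothetical), asking for only the first 1024 bytes of a resource:

// Ask the server for just the first 1024 bytes of the file.
fetch('https://example.com/video.mp4', {
  headers: { Range: 'bytes=0-1023' },
}).then((res) => {
  // 206 Partial Content if the server honours the Range header.
  console.log(res.status, res.headers.get('Content-Range'));
  return res.arrayBuffer();
}).then((chunk) => console.log('received', chunk.byteLength, 'bytes'));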

Although HTTP/1.1 optimized many things and, as the most widely used protocol version today, covers most networking needs, web pages keep growing more complex, some even evolving into full applications, and HTTP/1.1 has gradually exposed some problems:

  • When transmitting data, connections frequently have to be re-established, which is especially unfriendly to mobile devices.
  • Content is transmitted in plaintext, which is not secure enough.
  • Headers are bulky, and since they change little from request to request, resending them is wasteful.
  • Keep-alive puts performance pressure on the server. To solve these problems, HTTPS and SPDY came into being.

4. HTTPS

HTTPS is an HTTP channel with security as its goal; in short, it is the secure version of HTTP. An SSL/TLS layer is added underneath HTTP, and SSL/TLS is the security foundation of HTTPS, so the encryption details are really the business of SSL/TLS.
[Figure: HTTPS inserts an SSL/TLS layer between HTTP and TCP]

  • HTTPS requires applying to a CA for a certificate; free certificates are relatively rare, so this generally costs money.
  • HTTP runs directly on TCP and everything it transmits is plaintext; HTTPS runs on SSL/TLS, which runs on TCP, and everything it transmits is encrypted.
  • HTTP and HTTPS use completely different connection setups and different default ports: 80 for the former and 443 for the latter.
  • HTTPS can effectively prevent hijacking by network carriers, solving a major hijacking problem.
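
A minimal Node.js sketch of what this looks like in practice (the host is just an example): the same request over HTTPS goes out on port 443, encrypted, and we can inspect the certificate the server presented to confirm who we are talking to.

const https = require('https');

https.get('https://example.com/', (res) => {
  // The underlying TLS socket exposes the server's certificate.
  const cert = res.socket.getPeerCertificate();
  console.log('status:', res.statusCode);
  console.log('certificate subject:', cert.subject);
  console.log('valid until:', cert.valid_to);
  res.resume(); // drain the response
});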

5. SPDY: optimizing HTTP/1.x

In 2012, Google came out of nowhere with its SPDY proposal, optimizing the request latency of HTTP/1.x and addressing its security, as follows:

  • Reduced latency. To tackle HTTP's high latency, SPDY elegantly adopts multiplexing: multiple request streams share a single TCP connection, which relieves HOL blocking, reduces latency and improves bandwidth utilization.
  • Request prioritization. A new problem brought by multiplexing is that, with the connection shared, critical requests can get blocked. SPDY allows a priority to be set on each request so that important requests get a response first. For example, when the browser loads the home page, the HTML of the page itself should be displayed first, and the various static resources and script files can be loaded afterwards, so the user sees the page content as soon as possible.
  • Header compression. As mentioned earlier, HTTP/1.x headers are often redundant; choosing a suitable compression algorithm reduces the size and number of packets.
  • Encrypted transport based on HTTPS. This greatly improves the trustworthiness of the data in transit.
  • Server push. For example, my page requests style.css; when the client receives the style.css data, the server pushes style.js to the client as well, so that when the client later needs style.js it can be read straight from the cache without sending another request.

Where SPDY sits in the stack:

[Figure: SPDY sits below HTTP and above TCP and SSL]
SPDY sits below HTTP and above TCP and SSL, so it can easily stay compatible with older versions of HTTP (the content of HTTP/1.x is simply encapsulated into the new frame format) while reusing the existing SSL machinery.

6. HTTP/2.0

HTTP/2.0 can be regarded as an upgraded version of SPDY (in fact it was originally designed on the basis of SPDY). However, there are still differences between HTTP/2.0 and SPDY:

  • HTTP/2.0 supports plaintext HTTP transport, while SPDY mandates the use of HTTPS
  • HTTP/2.0 compresses message headers with HPACK, rather than the DEFLATE-based scheme SPDY uses.

New features of HTTP/2

6.1 Binary transmission

HTTP/2 transmits data in a binary format rather than the text format of HTTP/1.x, and binary protocols are more efficient to parse. An HTTP/1 request or response message consists of a start line, headers and an optional entity body, with the parts separated by text line breaks. HTTP/2 instead splits request and response data into smaller frames, and the frames are binary encoded.

Next, we introduce several important concepts:

  • Stream: a stream is a virtual channel within the connection that carries bidirectional messages; each stream has a unique integer identifier (1, 2, ..., N);
  • Message: a logical HTTP message, such as a request or a response, consisting of one or more frames;
  • Frame: the smallest unit of HTTP/2.0 communication. Each frame carries a frame header that at minimum identifies which stream the frame belongs to, and carries a specific type of data, such as HTTP headers or payload.

[Figure: streams, messages and frames within a single HTTP/2 connection]

In HTTP/2, all communication for the same domain happens over a single connection, which can carry any number of bidirectional data streams. Each stream carries messages, and a message in turn consists of one or more frames. Frames can be sent out of order and are reassembled using the stream identifier in their frame headers.

6.2 Multiplexing

HTTP/2 introduces multiplexing. Multiplexing removes the limit on the number of concurrent requests to the same domain, and it also makes it easier to reach full transfer speed, since a freshly opened TCP connection has to ramp its transfer rate up slowly.

With binary framing in place, HTTP/2 no longer relies on multiple TCP connections to achieve parallelism across streams. In HTTP/2:

  • All communication under the same domain name is completed on a single connection.
  • A single connection can carry any number of two-way data flows.
  • Data streams are sent in the form of messages, and a message consists of one or more frames. Frames can be sent out of order, because they can be reassembled using the stream identifier in the frame header.

This feature greatly improves performance:

  • The same domain name only needs to occupy one TCP connection, and uses one connection to send multiple requests and responses in parallel, eliminating the delay and memory consumption caused by multiple TCP connections.
  • Multiple requests are sent in parallel and interleaved, which do not affect each other.
  • Multiple responses are sent in parallel and interleaved without interference between the responses.
  • In HTTP/2, each request can carry a 31-bit priority value: 0 means the highest priority, and the larger the value, the lower the priority. With this value, the client and the server can apply different strategies to different streams and send streams, messages and frames in the optimal way.

[Figure: multiple requests and responses interleaved over one TCP connection]
As shown in the figure above, with multiplexing all of the requested data can be carried over a single TCP connection.
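
A minimal sketch with Node's built-in http2 module (the host and paths are just examples): one session, meaning one TCP connection, carries several requests in parallel, each on its own stream.

const http2 = require('http2');

// One session = one connection for the whole origin.
const client = http2.connect('https://example.com');
client.on('error', (err) => console.error(err));

// Each request opens a new stream on that same connection.
for (const path of ['/index.html', '/style.css', '/app.js']) {
  const req = client.request({ ':path': path });
  let bytes = 0;
  req.on('data', (chunk) => { bytes += chunk.length; });
  req.on('end', () => console.log(path, 'finished,', bytes, 'bytes'));
  req.end();
}

// Close the session once we are done (simplified for the sketch).
setTimeout(() => client.close(), 3000);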

6.3 Header compression

In HTTP/1, headers are transmitted as plain text. When the headers carry cookies, hundreds or even thousands of bytes may have to be retransmitted with every request.

To reduce resource consumption and improve performance, HTTP/2 applies a compression strategy to these headers:

  • HTTP/2 uses a "header table" on both the client and the server to track and store key-value pairs that have already been sent; the same data is no longer resent with every request and response;
  • The header table exists for the lifetime of the HTTP/2 connection and is updated incrementally by both the client and the server;
  • Each new header key-value pair is either appended to the end of the current table or replaces an earlier value in the table.

For example, in the following two requests, the first request sends all of its header fields, and the second request only needs to send what differs, which cuts down redundant data and overhead.
[Figure: the second request transmits only the headers that changed]

6.4 Server push

Server push means the server can push content the client will need before the client asks for it; this is also known as "cache push".

You can imagine that certain resources are sure to be requested by the client. In that case server push can send the necessary resources to the client ahead of time, which shaves off some latency. Where browser support allows, you can of course also use prefetch.

For example, the server can proactively push JS and CSS files to the client, so the client does not have to issue those requests while parsing the HTML.
[Figure: the server pushing CSS and JS alongside the HTML response]

The server can push proactively, but the client keeps the right to decide whether to accept: if a pushed resource is already in the browser's cache, the browser can reject it by sending an RST_STREAM frame. Push also obeys the same-origin policy; in other words, the server cannot push third-party resources to the client at will, only resources both sides have agreed on.
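
A minimal sketch of server push with Node's built-in http2 module (the paths and file contents are hypothetical, and a real browser would require TLS, i.e. http2.createSecureServer with a certificate; this plaintext version just shows the shape of the API):

const http2 = require('http2');

const server = http2.createServer((req, res) => {
  const { stream } = res;

  // Push /style.css before the client even asks for it.
  if (stream.pushAllowed) {
    stream.pushStream({ ':path': '/style.css' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { color: red; }');
    });
  }

  res.writeHead(200, { 'content-type': 'text/html' });
  res.end('<link rel="stylesheet" href="/style.css"><h1>hello</h1>');
});

server.listen(8080);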

New articles always appear first on my GitHub; you are welcome to follow.

Reference resources

  • HTTP protocol details
  • Front end Dictionary: a necessary network foundation for advancement
  • HTTP requests I know
  • Understand http / 2 and HTTP / 3 features in one article