Web crawlers are also known as web spiders. Before we talk about crawlers, we need to know what a network is. A network is made up of nodes and the links that connect them, and a large network formed by connecting many networks together is called the internet. Today's topic is HTTP (Hypertext Transfer Protocol), one of the most widely used network protocols on the Internet. Its standards were developed jointly by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF).
This article walks through the whole process of an HTTP request (DNS resolution is not covered): the origin of HTTP, the TCP/IP protocol suite, establishing a TCP connection, the client request, the server response, and closing the TCP connection. It ends with some related HTTP background. The article is fairly long, so consider bookmarking or sharing it after reading!
We can surf the web today thanks to an idea from the computer scientist Tim Berners-Lee. On August 6, 1991, Tim Berners-Lee officially launched the world's first website (http://info.cern.ch) on a NeXT computer at CERN, establishing the basic concepts and technical foundations of the World Wide Web and opening the era of networked information.
Berners-Lee's proposal contained the basic concepts of the Web, and he went on to build all the necessary tools:
- Proposed HTTP (Hypertext Transfer Protocol), which lets users access resources by clicking hyperlinks;
- Proposed HTML (Hypertext Markup Language) as the standard for creating web pages;
- Created the Uniform Resource Locator (URL) as the address system for websites; the http://www URL format is still in use today;
- Created the first web browser, called WorldWideWeb, which was also a web editor;
- Created the first web server (http://info.cern.ch) and the first web page, which described the project itself.
HTTP has five main features:
- Client/server model: HTTP supports the client/server mode.
- Simple and fast: when the client requests a service from the server, it only needs to send the request method and the path.
- Flexible: HTTP allows any type of data object to be transferred. The type being transferred is marked by Content-Type (the header that identifies the content type in an HTTP message).
- Connectionless: each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, it disconnects. This saves transmission time.
- Stateless: the protocol has no memory of earlier transactions, so the server does not know what state the client is in. After we send an HTTP request, the server returns data according to that request and records nothing afterwards (cookies and sessions arose to fill this gap and will be discussed later).
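To make the Content-Type marker from the "Flexible" point concrete, here is a minimal sketch that splits a header value into its media type and charset using Python's standard email.message module. The helper name parse_content_type and the sample header values are assumptions made for illustration, not taken from a real response.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    # Hypothetical value, as a server might send for an HTML page.
    print(parse_content_type("text/html; charset=UTF-8"))
```

A crawler usually needs this value to decide how to decode the response body before parsing it.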
2、 TCP/IP protocol
We often hear the saying: HTTP relies on the TCP/IP protocol suite to deliver data.
How should we understand that sentence? Let's look at the TCP/IP four-layer model.
From the figure above, we can clearly see that the transport layer protocol used by HTTP is TCP and the network layer protocol is IP (many other protocols are involved too, of course). That is why we say HTTP relies on the TCP/IP protocol suite to deliver data.
We can also see that ping uses the ICMP protocol. That is why we can sometimes browse the web through a VPS and yet fail to ping Google: the two use different protocols.
How does the TCP/IP protocol suite work? Let's take a look at the following figure:
We can see that on the sending side the data is encapsulated layer by layer, on the receiving side it is unpacked layer by layer, and finally the application layer obtains the data.
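The layer-by-layer wrapping and unwrapping can be sketched with a toy model. This is only an illustration: the bracketed labels such as [TCP] stand in for real binary headers, and the function names encapsulate and decapsulate are made up for this example.

```python
# Toy labels for the transport, network, and link layers, in wrapping order.
LAYERS = ["TCP", "IP", "ETH"]


def encapsulate(app_data: bytes) -> bytes:
    """Sender side: each lower layer prepends its own (toy) header."""
    frame = app_data
    for layer in LAYERS:
        frame = f"[{layer}]".encode("ascii") + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip the headers in the opposite order."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = encapsulate(b"GET / HTTP/1.1")
    print(wire)               # outermost header comes first on the wire
    print(decapsulate(wire))  # the application layer gets its data back
```

The outermost header belongs to the lowest layer, which is why the receiver has to peel the layers off in reverse.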
3、 Establish TCP connection
Now that we know how the TCP/IP protocol suite works, let's look at how HTTP establishes a connection.
1. TCP packet header information
We said earlier that HTTP relies on the TCP/IP protocol suite to deliver data, so establishing an HTTP connection means establishing a TCP connection. How does TCP establish a connection? Let's look at the structure of a TCP packet.
TCP packet = TCP header + TCP data body. The TCP header contains six control bits (in the red box above) that represent the state of the TCP connection:
- URG: urgent data; the packet carries an urgent message
- ACK: acknowledges receipt
- PSH: prompts the receiving application to read the data from the TCP receive buffer immediately
- RST: resets the connection (for example, after an error)
- SYN: requests to establish a connection
- FIN: informs the other party that the local end is about to close the connection
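As a rough sketch of where these six bits live, the code below packs a minimal 20-byte TCP header with struct and reads the flags back out. The bit values follow the standard TCP header layout; the helper names, port numbers, and window size are assumptions chosen for illustration.

```python
import struct

# Bit masks of the six control bits inside the flags byte of the TCP header.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

# Source port, destination port, sequence number, acknowledgment number,
# data offset byte, flags byte, window, checksum, urgent pointer (20 bytes).
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def build_header(seq, ack, flags, src_port=50000, dst_port=80):
    """Pack a minimal header without options (checksum left at 0)."""
    flags_byte = sum(FLAG_BITS[name] for name in flags)
    data_offset = 5 << 4  # header length of 5 32-bit words, stored in the high nibble
    return TCP_HEADER.pack(src_port, dst_port, seq, ack, data_offset, flags_byte, 65535, 0, 0)


def decode_flags(header: bytes) -> list:
    """Return the names of the control bits that are set in a TCP header."""
    flags_byte = TCP_HEADER.unpack(header[:20])[5]
    return [name for name, bit in FLAG_BITS.items() if flags_byte & bit]


if __name__ == "__main__":
    print(decode_flags(build_header(1234567, 0, ["SYN"])))
    print(decode_flags(build_header(7654321, 1234568, ["SYN", "ACK"])))
```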
2. Establish connection process
Now that we know the TCP header information, we can look at the three-way handshake that TCP uses to establish a connection.
The three-way handshake explained:
- The client sends the server a packet with the flag SYN = 1 and a randomly generated sequence number, say SEQ = 1234567. From SYN = 1 the server knows that the client wants to establish a connection (client: I want to connect to you)
- After receiving the request, the server confirms it by sending a packet with ack number = (client's SEQ + 1), SYN = 1, ACK = 1, and its own randomly generated SEQ = 7654321 (server: OK, you can connect)
- After receiving it, the client checks that the ack number is correct, that is, the SEQ it sent the first time + 1, and that the ACK flag is 1. If so, the client sends ack number = (server's SEQ + 1) with ACK = 1. Once the server has verified the ack number and ACK = 1, the connection is established. (client: OK, here I come)
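The number bookkeeping in the three steps can be replayed in a few lines. This is a simulation with plain dictionaries, not real network traffic; the function name three_way_handshake is hypothetical, and the default sequence numbers are the ones used in the text.

```python
def three_way_handshake(client_isn=1234567, server_isn=7654321):
    """Simulate the three packets and the checks each side performs."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = {"SYN": 1, "ACK": 0, "seq": client_isn, "ack": 0}

    # 2. Server -> client: SYN + ACK, acknowledging the client's seq + 1.
    syn_ack = {"SYN": 1, "ACK": 1, "seq": server_isn, "ack": syn["seq"] + 1}

    # Client verifies the acknowledgment before replying.
    if not (syn_ack["ACK"] == 1 and syn_ack["ack"] == client_isn + 1):
        raise ConnectionError("client rejected the SYN+ACK")

    # 3. Client -> server: ACK, acknowledging the server's seq + 1.
    ack = {"SYN": 0, "ACK": 1, "seq": client_isn + 1, "ack": syn_ack["seq"] + 1}

    # Server verifies the final acknowledgment; the connection is now established.
    if not (ack["ACK"] == 1 and ack["ack"] == server_isn + 1):
        raise ConnectionError("server rejected the final ACK")

    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for packet in three_way_handshake():
        print(packet)
```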
Interviewer: why does TCP need three handshakes, not two or four?
A: Three is the minimum number that is reliable. With only two, the server cannot confirm that the client received its sequence number, so the connection is unsafe; four would waste resources.
4、 Client request
Once the client is connected to the server, it can start requesting resources from the server, that is, sending HTTP requests.
1. HTTP request message structure
We said before that TCP packet = TCP header + TCP data body. We have already covered the TCP header; the TCP data body is our HTTP request message.
2. HTTP request instance
Take a look at an actual HTTP request example:
- ① is the request method. HTTP/1.1 defines eight request methods: GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, and CONNECT (PATCH was added later). The two most common are GET and POST; RESTful interfaces typically use GET, POST, DELETE, and PUT
- ② is the URL path of the request; together with the Host field of the header it forms the complete request URL
- ③ is the protocol name and version number
- ④ is the HTTP header. It contains several fields in the format "name: value", from which the server learns about the client
- ⑤ is the message body. It encodes the values of a page form as key-value pairs in a formatted string such as param1=value1&param2=value2, which carries multiple request parameters. Request parameters can be passed not only in the message body but also in the request URL, for example "/chapter15/user.html?param1=value1&param2=value2".
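The five parts above can be assembled by hand to see what actually travels in the TCP data body. The sketch below is a simplified illustration, not a full client: the function name build_request, the host example.com, and the path are assumptions, while urlencode comes from Python's standard library.

```python
from urllib.parse import urlencode


def build_request(method, path, host, headers=None, params=None):
    """Assemble a raw HTTP/1.1 request message as text."""
    body = ""
    if params and method == "GET":
        path = f"{path}?{urlencode(params)}"  # parameters travel in the URL
    elif params:
        body = urlencode(params)              # parameters travel in the message body

    all_headers = dict(headers or {})
    if body:
        all_headers["Content-Type"] = "application/x-www-form-urlencoded"
        all_headers["Content-Length"] = str(len(body.encode("ascii")))

    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # A blank line separates the header section from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    form = {"param1": "value1", "param2": "value2"}
    print(build_request("GET", "/user.html", "example.com", params=form))
    print(build_request("POST", "/user.html", "example.com", params=form))
```

With GET the parameters end up in the request line; with POST the same string moves into the body and the header section describes its type and length.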
The request header has many fields, so I won't explain them one by one. Here are just two that websites use for basic anti-crawling checks:
- User-Agent: the name and version of the operating system and browser used by the client. Some websites restrict which browsers may make requests
- Referer: the address of the previous page, indicating where the request came from. Some websites restrict the source of requests
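A crawler typically sets both fields itself. The sketch below only builds the request object with the standard urllib.request.Request class and sends nothing over the network; the URLs and the User-Agent string are placeholder assumptions.

```python
from urllib.request import Request


def make_request(url, referer):
    """Build a request that carries browser-like User-Agent and Referer fields."""
    headers = {
        # Hypothetical browser identification string.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
        "Referer": referer,
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = make_request("http://example.com/list.html", "http://example.com/")
    # Request stores field names capitalized, so "User-Agent" becomes "User-agent".
    print(req.get_header("User-agent"))
    print(req.get_header("Referer"))
```

Passing the object to urllib.request.urlopen would actually send it; that step is left out here so the example runs offline.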
5、 Server response
After receiving the client's request, the server must respond to the client. The HTTP response message has the same overall structure as the request message.
1. HTTP response message structure
2. HTTP response instance
3. Response status code
In the response message, the thing to focus on is the server's response status code, which often comes up in interviews. Only the categories are listed here; the detailed status codes can be found online.
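The categories are determined by the first digit of the code, which can be sketched as a small lookup. The class names follow the standard HTTP grouping; the function name status_class is made up for this example, and http.HTTPStatus from the standard library supplies the reason phrase for an individual code.

```python
from http import HTTPStatus

# The five standard status code classes, keyed by the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the category a status code belongs to."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(code, HTTPStatus(code).phrase, "->", status_class(code))
```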
Once the server has responded, one exchange is over. Is the connection closed at this point?
1. Long and short connection
Whether the connection is closed depends on the HTTP version:
- In HTTP/1.0, the TCP connection is closed after each request/response between the client and the server, and it is re-established for the next request. This is known as a short connection
- HTTP/1.1, released in January 1997 only months after HTTP/1.0, brought a new feature: after a request/response between the client and the server, the TCP connection may be kept open, so the next request reuses it instead of going through a new handshake. This is known as a long connection
Note: a long connection means that one TCP connection carries multiple HTTP exchanges. HTTP itself is always request/response, after which the exchange ends; the long connection belongs to TCP, not to HTTP.
HTTP/1.1 was already widespread by 1999, so browsers today send a field in the request header: Connection: keep-alive. It means the browser asks for a long connection with the server; the server can also decide whether it is willing to keep the connection open.
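Connection reuse can be observed locally with the standard library alone. The sketch below starts a throwaway server on 127.0.0.1, sends two requests through one http.client.HTTPConnection, and records the client-side port each time; an unchanged port suggests that the same TCP connection served both requests. The handler class, the function name, and the request paths are assumptions for this demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    """Send two requests over one connection; return results and client ports."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    results, local_ports = [], []
    try:
        for path in ("/a.css", "/b.js"):
            conn.request("GET", path, headers={"Connection": "keep-alive"})
            response = conn.getresponse()
            results.append((response.status, response.read().decode("ascii")))
            local_ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()  # closing the client side lets the server handler finish
        server.shutdown()
        server.server_close()
    return results, local_ports


if __name__ == "__main__":
    print(fetch_twice_on_one_connection())
```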
2. Advantages and disadvantages of long connection
For the server, a long connection has both advantages and disadvantages:
- Advantages: when a website has many static resources (images, CSS, JS, etc.), a long connection lets all of them be sent over a single TCP connection.
- Disadvantages: if the client makes one request and then no more, the server still holds the connection open and its resources stay occupied, which is a serious waste.
So whether to enable long connections, and how long to keep them open, should be set sensibly for the website in question.
PS: don't underestimate this TCP connection. In a complete client HTTP request (DNS lookup, establishing the TCP connection, sending the request, waiting, parsing the page, and closing the TCP connection), establishing the TCP connection takes a significant share of the time.
3. Disconnection process
Establishing a TCP connection takes a three-way handshake, and closing it takes a four-way wave!
When discussing the TCP/IP protocol, we said that the FIN flag indicates that the local end is about to close the connection. So why does it take four waves to disconnect? I'll leave this as homework: share your understanding in the comments and see whether it is correct.
1. Must-know interview question: TCP three-way handshake and four-way wave
Interviewer: why does it take three handshakes to establish a connection but four waves to close it? This is homework for everyone; give your answer in the comments!
HTTP/1.1 has been serving us for about 20 years. HTTP/2 was published in 2015 but has not yet been universally adopted. You can look up its new features online.
Because HTTP is slow to respond and its request headers are large, in the era of microservices RPC is commonly used to call services. If you are interested in RPC, you can read up on the related concepts yourself.
HTTP has two other big drawbacks: it transmits data in plaintext, and it cannot guarantee integrity. It is therefore gradually being replaced by HTTPS, which will be explained in the next issue.