A deep understanding of the HTTP protocol

*Foreword: drawing on my experience as a developer, I'd like to share my personal understanding of HTTP.*

What are the methods of HTTP?

  • HTTP/1.0 defines three request methods: GET, POST and HEAD
  • HTTP/1.1 adds five request methods: OPTIONS, PUT, DELETE, TRACE and CONNECT
  • PATCH was added later as an extension (RFC 5789)

What are the specific functions of these methods?

  • GET: usually used to request that the server send a resource
  • HEAD: requests only the headers of a resource; these headers are the same as those that would be returned for a GET request
  • OPTIONS: used to get the communication options supported by the target resource
  • POST: sends data to the server
  • PUT: creates a new resource, or replaces the representation of the target resource with the payload of the request
  • DELETE: deletes the specified resource
  • PATCH: applies a partial modification to a resource
  • CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection to tunnel mode
  • TRACE: echoes back the request as received by the server, mainly for testing or diagnostics

What’s the difference between GET and POST?

  • Data transmission: a GET request carries its data in the URL, while a POST request carries its data in the request body.
  • Security: POST data is in the request body, which gives it some degree of protection, while GET data is in the URL, where it can easily be found through the browser history or caches. Neither is encrypted over plain HTTP.
  • Data types: GET data must fit in a URL, so only ASCII characters are allowed (anything else has to be percent-encoded); POST has no such restriction.
  • Harmlessness: browser operations such as refresh and back are harmless for a GET request, while for a POST request they may resubmit the form.
  • Semantics: GET is safe (safe here means read-only: using the method does not change the server's state) and idempotent (executing the same request many times has exactly the same effect as executing it once), while POST is neither safe nor idempotent.
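
The first point can be made concrete with a small sketch that only constructs the two requests and never sends them. The endpoint and form fields are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, used only for illustration.
BASE_URL = "http://example.com/search"
params = {"q": "http basics", "page": "1"}

# GET: the data is encoded into the URL's query string; there is no body.
get_request = Request(f"{BASE_URL}?{urlencode(params)}", method="GET")

# POST: the same data travels in the request body instead of the URL.
post_request = Request(
    BASE_URL,
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url, get_request.data)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The GET request exposes the parameters in its URL, which is exactly what ends up in browser history and logs, while the POST request keeps the URL clean and carries the same bytes in its body.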

PUT and POST both send new resources to the server. What is the difference?

The difference is that PUT is idempotent: calling it once or several times in a row has the same effect (no additional side effects), while POST is not idempotent.

There is another difference. In general, the URI of a PUT request points to one specific resource, while the URI of a POST request can point to a collection of resources.

For example, suppose we are developing a blog system. To create an article we would typically send a POST request to the articles collection. The semantics of this request are to create a new article under the articles collection. If we submit it several times, several articles are created, so it is not idempotent.

The semantics of a PUT request to a specific article's URI are to update that article (for example, to change the author’s name). The URI points to a single resource, and the request is idempotent: if you change “Liu Dehua” to “Cai Xukun”, then no matter how many times you submit it, the result is still “Cai Xukun”.

PS: the statement "POST means creating resources, PUT means updating resources" is wrong. Both can create resources. The fundamental difference lies in idempotency.
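
As a rough illustration of the idempotency difference, here is a toy in-memory stand-in for the blog server. The `ArticleStore` class and its method names are hypothetical, not any framework's API.

```python
import itertools


class ArticleStore:
    """A toy in-memory stand-in for a server's article collection."""

    def __init__(self):
        self.articles = {}
        self._ids = itertools.count(1)

    def post(self, article):
        # POST to the collection: the server picks a new ID every time.
        article_id = next(self._ids)
        self.articles[article_id] = dict(article)
        return article_id

    def put(self, article_id, article):
        # PUT to a specific URI: create or fully replace that one resource.
        self.articles[article_id] = dict(article)
        return article_id


store = ArticleStore()

# Three identical POSTs create three separate articles (not idempotent).
for _ in range(3):
    store.post({"author": "Liu Dehua"})
articles_after_posts = len(store.articles)

# Three identical PUTs leave the store in the same state as one PUT (idempotent).
for _ in range(3):
    store.put(1, {"author": "Cai Xukun"})
articles_after_puts = len(store.articles)
```

Note that `put` would also create article 1 if it did not exist yet, which is why "PUT means update" is too narrow a description.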

PUT and PATCH both send modifications of a resource to the server. What is the difference?

Both PUT and PATCH update a resource, but PATCH is used for a partial update of a known resource.

For example, suppose the resource at an article's address can be represented as follows:

author: 'maojiale',
creationDate: '2020-2-22',
content: 'I write an article about Mao Jiale',
id: 820357430

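When we want to change the author's name, we could send the whole representation again with the new value. Overwriting the resource directly like this calls for PUT. But if sending all the unchanged fields every time seems wasteful, we can send a PATCH request instead; this time the body only needs to contain:

author: 'Mao Jiale',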
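
A minimal sketch of the two update styles, using the article above as a plain dictionary. The shallow merge in `apply_patch` is only one possible PATCH semantics, chosen here as an assumption for illustration; real APIs define their own patch formats.

```python
article = {
    "author": "maojiale",
    "creationDate": "2020-2-22",
    "content": "I write an article about Mao Jiale",
    "id": 820357430,
}


def apply_put(current, representation):
    # PUT: the payload replaces the stored representation entirely.
    return dict(representation)


def apply_patch(current, changes):
    # PATCH: only the fields present in the payload are modified.
    updated = dict(current)
    updated.update(changes)
    return updated


patched = apply_patch(article, {"author": "Mao Jiale"})
replaced = apply_put(article, {"author": "Mao Jiale"})
```

Sending only the author with PATCH keeps the other fields, whereas sending only the author with PUT would drop them, which is why a PUT has to carry the full representation.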

What does an HTTP request message look like?

The request message consists of four parts:

  • Request line
  • Request header
  • Empty line
  • Request body


  • Request line: consists of the request method field, the URL field and the HTTP protocol version field, separated by spaces. For example, GET /index.html HTTP/1.1.
  • Request headers: made up of key/value pairs, one pair per line, with the key and value separated by a colon “:”. Common examples:
  1. User-Agent: the type of browser that generated the request.
  2. Accept: a list of content types the client can recognize.
  3. Host: the requested host name, which allows multiple domain names to share the same IP address (virtual hosting).
  • Request body: the data carried by requests such as POST and PUT
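
The four parts can be seen by assembling a raw request by hand and splitting it again. This is a simplified sketch: the host, client name and body are invented, and the parser ignores details such as repeated headers.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, empty line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header section
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request back into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, target, version, headers, body


# Hypothetical host, client name and form data.
raw = build_request(
    "POST",
    "/index.html",
    {"Host": "example.com", "User-Agent": "demo-client", "Accept": "*/*",
     "Content-Length": "7"},
    b"a=1&b=2",
)
print(raw.decode("ascii"))
```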


What does an HTTP response message look like?

The response message consists of four parts:

  • Response line
  • Response headers
  • Empty line
  • Response body


  • Response line: consists of the protocol version, the status code and the reason phrase of the status code, such as HTTP/1.1 200 OK
  • Response headers: made up of response header fields
  • Response body: the data that the server sends back
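
A response can be taken apart in the same spirit. The sketch below assumes a complete, well-formed response held in memory; note that the reason phrase may itself contain spaces, so the response line is split at most twice.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into response line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    # Split at most twice so a reason phrase such as "Not Found" stays intact.
    version, status_code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return version, int(status_code), reason, headers, body


raw_response = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"not here\n"
)
version, status, reason, headers, body = parse_response(raw_response)
```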

What are the HTTP header fields?

There are a lot of them; the most important ones are marked (key) below.

General header fields: headers used by both request and response messages

  • Cache-Control: controls caching behavior (key)
  • Connection: manages hop-by-hop headers and the connection (key)
  • Upgrade: upgrade to another protocol
  • Via: information about the proxy servers on the path
  • Warning: error and warning notifications
  • Transfer-Encoding: the transfer coding of the message body (key)
  • Trailer: the list of headers at the end of the message
  • Pragma: message directives
  • Date: the date the message was created

Request header fields: headers used by the client when sending a request message to the server

  • Accept: the media types the client or user agent can handle (key)
  • Accept-Encoding: the preferred content encodings
  • Accept-Language: the preferred natural languages
  • Accept-Charset: the preferred character sets
  • If-Match: compares entity tags (ETag) (key)
  • If-None-Match: compares entity tags; the opposite of If-Match (key)
  • If-Modified-Since: compares the resource's update time (Last-Modified) (key)
  • If-Unmodified-Since: compares the resource's update time (Last-Modified); the opposite of If-Modified-Since (key)
  • If-Range: sends a byte range request for the entity if the resource has not been updated
  • Range: a byte range request for the entity (key)
  • Authorization: web authentication information (key)
  • Proxy-Authorization: the authentication information required by the proxy server
  • Host: the server where the requested resource is located (key)
  • From: the user’s email address
  • User-Agent: client program information (key)
  • Max-Forwards: the maximum number of hops
  • TE: the preferred transfer codings
  • Referer: the URL the request originated from
  • Expect: expects specific behavior from the server

Response header fields: headers used when the server responds to the client

  • Accept-Ranges: whether byte range requests are accepted
  • Age: the estimated time elapsed since the resource was created
  • Location: the URI the client is redirected to (key)
  • Vary: cache management information for proxy servers
  • ETag: a string that uniquely identifies a resource (key)
  • WWW-Authenticate: the authentication information the server requires from the client
  • Proxy-Authenticate: the authentication information the proxy server requires from the client
  • Server: server information (key)
  • Retry-After: used together with status code 503; indicates when the client should next send the request to the server

Entity header fields: headers that describe the entity part of request and response messages

  • Allow: the HTTP methods the resource supports (key)
  • Content-Language: the natural language of the entity
  • Content-Encoding: the encoding format of the entity
  • Content-Length: the size of the entity in bytes
  • Content-Type: the media type of the entity
  • Content-MD5: the message digest of the entity
  • Content-Location: an alternative URI for the resource
  • Content-Range: the position range of the entity body
  • Last-Modified: the time the resource was last modified (key)
  • Expires: the expiration time of the entity body (key)
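
Several of these headers work in pairs. As a rough sketch of how ETag and If-None-Match cooperate in a conditional request, the hypothetical `respond` function below plays the server side. Deriving the ETag from a hash of the body is an assumption made for the example; servers are free to generate ETags however they like.

```python
import hashlib


def make_etag(body):
    # Assumption for this sketch: the ETag is a truncated hash of the body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


page = b"<h1>hello</h1>"

# First visit: no cached copy, so the full page comes back with its ETag.
first_status, first_headers, first_body = respond({}, page)

# Second visit: the client echoes the ETag back in If-None-Match.
second_status, _, second_body = respond(
    {"If-None-Match": first_headers["ETag"]}, page
)
```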

What are the HTTP status codes?

2xx success

  • 200 OK: the request from the client was processed correctly by the server (key)
  • 201 Created: the request has been fulfilled and a new resource has been created as requested
  • 202 Accepted: the request has been accepted but not yet processed, and there is no guarantee that it will be completed
  • 204 No Content: the request succeeded, but the response message contains no entity body
  • 206 Partial Content: the response to a range request (key)

3xx redirection

  • 301 Moved Permanently: permanent redirection; the resource has been assigned a new URL
  • 302 Found: temporary redirection; the resource has temporarily been assigned a new URL (key)
  • 303 See Other: the resource is available at another URL, which should be fetched with the GET method
  • 304 Not Modified: the client sent a conditional request and the server allows access, but the condition was not met because the resource has not changed, so no body is returned
  • 307 Temporary Redirect: temporary redirection with the same meaning as 302, except that the request method is not changed

4xx client error

  • 400 Bad Request: the request message has a syntax error (key)
  • 401 Unauthorized: the request needs to carry HTTP authentication information (key)
  • 403 Forbidden: access to the requested resource was denied by the server (key)
  • 404 Not Found: the requested resource was not found on the server (key)
  • 408 Request Timeout: the server timed out waiting for the client's request
  • 409 Conflict: the request conflicts with the current state of the resource

5xx server error

  • 500 Internal Server Error: an error occurred on the server while it was processing the request (key)
  • 501 Not Implemented: the request is beyond the server’s capabilities. For example, the server does not support a feature the request requires, or the request uses a method the server does not support
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process the request
  • 505 HTTP Version Not Supported: the server does not support, or refuses to support, the HTTP version used in the request

What’s the difference between the 302, 303 and 307 redirects?

302 is a status code from HTTP/1.0. To refine its meaning, HTTP/1.1 introduced two more codes, 303 and 307.

303 states explicitly that the client should use the GET method to fetch the resource, so a POST request is turned into a GET request on redirection. 307 follows the standard strictly and does not change POST to GET.
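
The rule can be summarized as a small decision function. This is a sketch of typical client behavior, not a quotation from any browser's source: in particular, rewriting POST to GET on 302 is a widespread convention rather than something the status code itself requires, which is the ambiguity 303 and 307 were meant to resolve.

```python
def method_after_redirect(status, method):
    """Return the method a client would typically use for the follow-up request."""
    if status == 303 and method != "HEAD":
        return "GET"  # 303: fetch the new location with GET
    if status == 307:
        return method  # 307: the method (and body) must not change
    if status == 302 and method == "POST":
        return "GET"  # common client behavior for 302, assumed here
    return method


for status in (302, 303, 307):
    print(status, "POST ->", method_after_redirect(status, "POST"))
```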

What does HTTP keep-alive do?

In early HTTP/1.0, a new connection was created for every HTTP request, and creating a connection costs resources and time. To reduce resource consumption and shorten response times, connections needed to be reused. Later HTTP/1.0 implementations and HTTP/1.1 introduced a connection reuse mechanism: adding Connection: keep-alive to the HTTP request header tells the other party not to close the connection after the response is complete, because it will be used again for the next request. HTTP/1.0 needs Connection: keep-alive in the request header to keep a persistent connection, while in HTTP/1.1 connections are persistent by default.

The advantages of keep-alive are as follows:

  • Less CPU and memory usage (because fewer connections are open at the same time)
  • Allows HTTP pipelining of requests and responses
  • Reduced network congestion (fewer TCP connections)
  • Reduced latency for subsequent requests (no handshake required)
  • Errors can be reported without closing the TCP connection
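
Connection reuse can be observed locally with Python's standard library. The sketch below starts a throwaway server on 127.0.0.1 (the handler and function names are made up for the demo), sends two requests through one `http.client.HTTPConnection`, and records the client's source port as seen by the server. If the TCP connection is reused, both requests arrive from the same port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # source port of the client, recorded once per request


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections are the default here

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # A Content-Length lets the client know where the response ends,
        # so the connection can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    client_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS choose
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for _ in range(2):
            conn.request("GET", "/")
            conn.getresponse().read()  # read the whole body before reusing
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return list(client_ports)
```

Calling `fetch_twice_on_one_connection()` should return two identical port numbers; opening a fresh `HTTPConnection` for each request would normally produce two different ones.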

Why do we need HTTPS when we already have HTTP?

HTTPS is the secure version of HTTP. Because HTTP transmits data in plaintext, it is not safe for sensitive information. HTTPS exists to solve this insecurity.

How is HTTPS secure?

The process is fairly complicated, so we need to understand two concepts first.

Symmetric encryption: both sides of the communication use the same secret key for encryption and decryption. For example, the secret code that spies use to contact each other is a form of symmetric encryption.

Although symmetric encryption is simple and performs well, it cannot solve the problem of getting the secret key to the other party the first time: the key can easily be intercepted by an attacker.

Asymmetric encryption:

  1. Private key + public key = key pair
  2. Data encrypted with the private key can only be decrypted with the corresponding public key, and data encrypted with the public key can only be decrypted with the corresponding private key
  3. Each side of the communication has its own key pair, and before communicating, the two sides first send their public keys to each other
  4. Each side then encrypts data with the other side's public key before sending it, and the receiver decrypts it with its own private key

Although asymmetric encryption is more secure, it is very slow, which hurts performance.

The solution is to combine the two: the symmetric key is encrypted with the asymmetric public key and then sent. The receiver decrypts it with its private key to obtain the symmetric key, and from then on both parties communicate using symmetric encryption.
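
The combined scheme can be sketched with toy primitives from the standard library. Everything here is deliberately insecure and chosen only to show the flow: the "RSA" key is built from two small, well-known Mersenne primes with no padding, and the symmetric cipher is a simple XOR keystream. Real HTTPS uses TLS with vetted algorithms.

```python
import hashlib
import secrets

# Toy asymmetric key pair (assumption: textbook RSA, far too small for real use).
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: pick a random symmetric key and encrypt it with the server's public key.
session_key = secrets.token_bytes(8)  # 64 bits, small enough to fit below N
encrypted_key = pow(int.from_bytes(session_key, "big"), E, N)

# Server: recover the symmetric key with its private key.
recovered_key = pow(encrypted_key, D, N).to_bytes(8, "big")

# From here on, both sides use the fast symmetric cipher.
message = b"hello over a secure channel"
ciphertext = xor_stream(session_key, message)
plaintext = xor_stream(recovered_key, ciphertext)
```

Only the short session key goes through the slow asymmetric step; the bulk data uses the symmetric cipher, which is the performance argument made above.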

At this point there is another problem: the man-in-the-middle attack.

If there is a man in the middle between the client and the server, he only needs to replace the public keys the two sides exchange with his own public key, and he can then easily decrypt all the data sent by both sides.

So a certificate issued by a trusted third party, a certificate authority (CA), is needed to prove identity and prevent man-in-the-middle attacks.

The certificate includes: the issuer, the certificate's purpose, the user's public key, the hash algorithm used, the certificate's expiration time, etc. The private key is not part of the certificate; it stays with its owner.


But here is the problem: if the man in the middle tampers with the certificate, doesn't the proof of identity become worthless? The certificate would have been bought for nothing. This is where another technology comes in: the digital signature.

A digital signature is produced by hashing the content of the certificate with a hash algorithm to get a message digest, and then encrypting the digest with the CA’s private key.

When someone sends me their certificate, I use the same hash algorithm to generate the message digest again, and then decrypt the digital signature with the CA’s public key to get the message digest that the CA created. By comparing the two, I can tell whether the certificate has been tampered with.
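
The sign-and-compare procedure can be sketched as follows. As an assumption for illustration, the CA's key pair is a toy textbook-RSA key built from two small Mersenne primes, and the certificate is just a made-up byte string; real certificates use standardized formats, padding schemes and much larger keys.

```python
import hashlib

# Toy CA key pair (insecure, for illustration only).
P, Q = 2**61 - 1, 2**31 - 1
CA_N = P * Q                               # public modulus
CA_E = 65537                               # public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def digest(content: bytes) -> int:
    # Hash the certificate content and reduce it to fit the toy modulus.
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % CA_N


def sign(content: bytes) -> int:
    # CA side: encrypt the digest with the CA's private key.
    return pow(digest(content), CA_D, CA_N)


def verify(content: bytes, signature: int) -> bool:
    # Client side: decrypt the signature with the CA's public key,
    # recompute the digest locally, and compare the two.
    return pow(signature, CA_E, CA_N) == digest(content)


certificate = b"subject=example.com;public_key=abc123"  # hypothetical content
signature = sign(certificate)

genuine_ok = verify(certificate, signature)
tampered_ok = verify(b"subject=evil.example;public_key=abc123", signature)
```

An attacker who alters the certificate cannot produce a matching signature without the CA's private key, so the comparison fails for the tampered copy.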

At this point, the communication is about as secure as it can be made.

What are the advantages and characteristics of HTTP/2 compared with HTTP/1.x?

Binary framing

Frame: the smallest unit of data communication in HTTP/2.

Message: a logical HTTP message in HTTP/2, such as a request or a response. A message consists of one or more frames.

Stream: a virtual channel within the connection. A stream can carry messages in both directions, and each stream has a unique integer ID.

HTTP/2 transmits data in a binary format rather than the text format of HTTP/1.x, and a binary protocol is more efficient to parse.
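
To give a feel for what "binary" means here: every HTTP/2 frame starts with a fixed 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The helper names below are made up for this sketch, which packs and unpacks a single DATA frame and ignores everything else a real implementation must handle.

```python
import struct

DATA = 0x0         # frame type: DATA
END_STREAM = 0x1   # flag: last frame of the stream


def pack_frame(frame_type, flags, stream_id, payload):
    """Build a frame: 9-byte header followed by the payload."""
    length = struct.pack(">I", len(payload))[1:]   # keep the low 24 bits
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def unpack_frame(data):
    """Parse one frame back into (type, flags, stream_id, payload)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    return frame_type, flags, stream_id & 0x7FFFFFFF, data[9:9 + length]


frame = pack_frame(DATA, END_STREAM, 1, b"hello")
print(frame.hex())
```

Because every field sits at a fixed offset, a parser can read the header without scanning for line breaks or delimiters, which is where the parsing efficiency comes from.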

Server push

The server can proactively push other resources when it sends an HTML page, instead of waiting for the browser to parse to the relevant point and send a request before responding. For example, the server can push JS and CSS files to the client so that the client does not need to request them while parsing the HTML.

The server can push proactively, and the client can choose whether to accept. If a pushed resource is already in the browser's cache, the browser can reject it by sending an RST_STREAM frame. Server push also obeys the same-origin policy: the server will not push third-party resources to clients.

Header compression

HTTP/1.x repeatedly carries lengthy header data that rarely changes in requests and responses, which puts extra load on the network.

  • HTTP/2 uses a “header table” on both the client and the server to track and store previously sent key-value pairs. Identical data is no longer sent with every request and response
  • The header table exists for the lifetime of the HTTP/2 connection and is updated incrementally by both the client and the server
  • Each new header key-value pair is either appended to the end of the current table or replaces a previous value in the table

    You can think of it as sending only the differences rather than everything, which reduces the amount of information in the headers
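
The "send only the differences" idea can be sketched with a toy table shared by both ends. This is not the real HPACK algorithm (which adds a static table, size limits and Huffman coding); the `HeaderTable` class is a hypothetical simplification that keeps only the core idea of replacing repeated pairs with small indexes.

```python
class HeaderTable:
    """Toy header table: repeated (name, value) pairs are sent as an index."""

    def __init__(self):
        self.entries = []  # pairs seen so far, in order of first appearance

    def encode(self, headers):
        encoded = []
        for pair in headers:
            if pair in self.entries:
                encoded.append(self.entries.index(pair))  # send a small index
            else:
                self.entries.append(pair)
                encoded.append(pair)  # send the full pair once
        return encoded

    def decode(self, encoded):
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])  # look up a known pair
            else:
                self.entries.append(item)  # learn a new pair
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()

first = [(":method", "GET"), (":path", "/"), ("user-agent", "demo")]
second = [(":method", "GET"), (":path", "/style.css"), ("user-agent", "demo")]

wire_first = sender.encode(first)    # everything is new: full pairs
wire_second = sender.encode(second)  # only the changed path is sent in full
```

On the second request only the new path travels as text; the method and user agent shrink to table indexes that the receiver resolves from its own copy of the table.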

Multiplexing

In HTTP/1.x, making multiple concurrent requests requires multiple TCP connections. To limit resource usage, browsers also cap the number of TCP connections to a single domain name at 6-8.

In HTTP/2:

  • All communication with the same domain name is completed over a single connection.
  • A single connection can carry any number of bidirectional streams.
  • Data on a stream is sent as messages, and each message consists of one or more frames. Frames from different streams can be interleaved and sent out of order, because they can be reassembled according to the stream identifier in each frame header
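
Reassembly by stream identifier can be sketched in a few lines. Frames are modelled here as simple (stream_id, payload) tuples, which is an assumption for the example; the payloads are invented.

```python
from collections import defaultdict


def reassemble(frames):
    """Group interleaved (stream_id, payload) frames into per-stream messages."""
    messages = defaultdict(bytes)
    for stream_id, payload in frames:
        # Frames of different streams may interleave freely; within one
        # stream they are appended in arrival order.
        messages[stream_id] += payload
    return dict(messages)


# Three responses sharing one connection, with their frames interleaved.
interleaved = [
    (1, b"<html>"),
    (3, b"body{"),
    (1, b"</html>"),
    (5, b"alert(1)"),
    (3, b"}"),
]
messages = reassemble(interleaved)
```

Each stream's message comes out intact even though its frames were mixed with those of other streams on the wire, which is what lets one connection serve many requests at once.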
