This is original content. To reprint it, please contact the author, or credit the author and cite the source.
When we talk about HTTP, we usually think of the TCP/IP and SSL/TLS handshakes. Today I would like to organize some HTTP-related knowledge from a different angle.
This article covers seven topics:
1. The evolution of HTTP: a brief look at what each version changed, the problems it ran into, and how they were addressed
2. HTTP caching policy
3. Cross-origin strategies
4. HTTP concurrency
5. Differences between GET and POST, and common status codes
6. Some open questions
7. Finally, the references: the shoulders of giants this article stands on.
1、 The evolution of HTTP
1. HTTP 0.9
- Only the GET request is supported, and requests carry no headers
- Only one type of content is supported: plain text. HTML can be returned, but images cannot be embedded
2. HTTP 1.0
- Both requests and responses support header fields
- Responses begin with a status line
- Responses are no longer limited to hypertext
- Clients can submit data to the web server with POST; GET, HEAD and POST are supported
- Supports persistent connections (though short-lived connections are the default), caching, and authentication
3. HTTP 1.1
- Persistent connections by default
HTTP 1.1 supports persistent connections and request pipelining: multiple HTTP requests and responses can travel over a single TCP connection, which reduces the cost and latency of opening and closing connections. Persistent connections (Connection: keep-alive) are on by default in HTTP 1.1, fixing HTTP 1.0's need to open a new connection for every request.
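A quick way to observe persistent connections is a local experiment with Python's standard library. This is a minimal sketch, not a benchmark, and the handler name `EchoPortHandler` and helper `ports_seen` are hypothetical. The test server speaks HTTP/1.1 and echoes the client's TCP source port. If several requests sent through one `http.client.HTTPConnection` all report the same port, they shared a single TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List


class EchoPortHandler(BaseHTTPRequestHandler):
    # Answering as HTTP/1.1 makes the server keep the connection open between requests
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port so the caller can tell which TCP connection served it
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the response
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def ports_seen(n_requests: int) -> List[str]:
    """Send n_requests GETs over one client connection; return the port each one arrived from."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        ports = []
        for _ in range(n_requests):
            conn.request("GET", "/")
            resp = conn.getresponse()
            ports.append(resp.read().decode())  # read the body fully before reusing the connection
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```

`ports_seen(3)` should return three identical port strings. The `Content-Length` header matters here. Without it, the client could only find the end of the body by waiting for the connection to close, so the connection could not be reused.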
- Range requests (bandwidth optimization)
HTTP 1.0 wastes bandwidth in some cases. For example, the client may need only part of an object, but the server sends the whole thing, and interrupted downloads cannot be resumed. HTTP 1.1 adds the Range request header, which lets the client request just part of a resource; the server answers with status 206 (Partial Content). This lets developers make full use of bandwidth and connections, and it is the basis of resumable downloads.
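The server side of a range request can be sketched as follows. This sketch assumes a single byte range per request and a resource held in memory. Multi-range `multipart/byteranges` responses are left out. `serve_range` is a hypothetical helper, not part of any framework.

```python
import re
from typing import Dict, Optional, Tuple

RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def serve_range(resource: bytes, range_header: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry a single-range Range header."""
    total = len(resource)
    full = (200, {"Content-Length": str(total)}, resource)
    if range_header is None:
        return full
    m = RANGE_RE.fullmatch(range_header.strip())
    if not m or m.group(1) == m.group(2) == "":
        return full  # unsupported or malformed Range: ignore it and send everything
    first, last = m.groups()
    if first and last and int(last) < int(first):
        return full  # syntactically invalid range: ignore it
    if first == "":
        # suffix form "bytes=-N": the last N bytes
        start, end = max(total - int(last), 0), total - 1
    else:
        start = int(first)
        end = min(int(last), total - 1) if last else total - 1
    if start >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""  # Range Not Satisfiable
    body = resource[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{total}", "Content-Length": str(len(body))}
    return 206, headers, body
```

For a 10-byte resource, `Range: bytes=0-3` yields 206 with `Content-Range: bytes 0-3/10`. A start offset past the end yields 416.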
- Virtual hosting (the Host header)
HTTP 1.0 assumed that each server was bound to a unique IP address, so the request line did not carry a hostname. With virtual hosting, however, one physical server can host multiple virtual hosts (multi-homed web servers) that share a single IP address. In HTTP 1.1 every request must carry a Host header; a request without one is rejected with 400 Bad Request.
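Name-based virtual hosting comes down to the server reading the Host header to pick a site. The sketch below parses a raw request head by hand and rejects HTTP/1.1 requests that lack Host. The `VIRTUAL_HOSTS` table, the document roots and `route` are all hypothetical, and real servers validate much more.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sites sharing one IP address, keyed by hostname
VIRTUAL_HOSTS = {"a.example.com": "/srv/a", "b.example.com": "/srv/b"}
DEFAULT_ROOT = "/srv/default"


def route(raw_request: str) -> Tuple[int, Optional[str]]:
    """Return (status, document root) for a raw request head."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers.get("host", "").split(":")[0].lower()  # drop an explicit port
    if version == "HTTP/1.1" and not host:
        return 400, None  # Bad Request: HTTP/1.1 requires Host
    return 200, VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```

The same IP address serves `/srv/b` for `Host: b.example.com`. An HTTP/1.0 request with no Host falls back to the default site.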
- More cache-related fields
HTTP/1.1 adds new caching features on top of 1.0: entity tags (ETags) and the more powerful Cache-Control header.
- Better error reporting
HTTP 1.1 adds 24 new error status codes. For example, 409 (Conflict) means the request conflicts with the current state of the resource, and 410 (Gone) means the resource has been permanently removed from the server.
Problems with HTTP 1.1:
- High latency: head-of-line blocking
- Statelessness: hinders interaction
The protocol keeps no memory of connection state, so the server cannot tell how a request relates to earlier ones. For example, login state is not retained on its own.
- Plaintext transmission: insecurity
Content is not encrypted, so it can be tampered with or hijacked in transit.
Solutions:
- For head-of-line blocking:
1. Spread a page's resources across different domain names to raise the per-host connection limit. A TCP connection can be reused, but it handles only one request at a time, so other requests on it are blocked until the current one finishes.
2. Reduce the number of requests
3. Inline some resources: CSS, Base64-encoded images, etc.
4. Merge small files to reduce the number of resources
- For insecurity:
2. Token signature verification
3. Custom or agreed-upon data encryption schemes
4. HTTP 2.0
- Binary framing:
1) All HTTP 2.0 frames are binary-encoded.
2) A frame is the smallest unit of data, and each frame identifies the stream it belongs to. A stream is a flow of data made up of multiple frames, and multiplexing means one TCP connection can carry multiple streams.
3) Frame: client and server communicate by exchanging frames, the smallest unit of communication in the new protocol.
4) Message: a logical HTTP message, such as a request or a response, made up of one or more frames.
5) Stream: a virtual channel within a connection that carries messages in both directions. Each stream has a unique integer identifier (1, 2, ..., N).
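Binary framing can be made concrete. Every HTTP/2 frame starts with a fixed 9-byte header, as defined in the HTTP/2 specification. The header holds a 24-bit payload length, an 8-bit type, 8 bits of flags, one reserved bit and a 31-bit stream identifier. This minimal sketch packs and unpacks that header with `struct`. It does not validate frame types or enforce the peer's maximum frame size, and the helper names are hypothetical.

```python
import struct
from typing import Tuple

# 9 bytes: length (1 high byte + 2 low bytes), type, flags, reserved bit + stream id
HEADER = struct.Struct(">BHBBI")


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload does not fit in the 24-bit length field")
    header = HEADER.pack(length >> 16, length & 0xFFFF, frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def unpack_frame(data: bytes) -> Tuple[int, int, int, bytes, bytes]:
    """Split one frame off the front of data: (type, flags, stream_id, payload, remaining bytes)."""
    hi, lo, frame_type, flags, sid = HEADER.unpack_from(data)
    length = (hi << 16) | lo
    end = HEADER.size + length
    return frame_type, flags, sid & 0x7FFFFFFF, data[HEADER.size:end], data[end:]  # drop reserved bit
```

Type 0x0 is a DATA frame and flag 0x1 on a DATA frame is END_STREAM. So `pack_frame(0x0, 0x1, 3, b"hello")` encodes the last DATA frame on stream 3.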
- Multiplexing to solve head-of-line blocking
Multiplexing allows multiple request and response messages to be in flight at the same time over a single HTTP/2.0 connection. With the new framing mechanism, HTTP/2.0 no longer relies on multiple TCP connections to handle concurrent requests. Each stream is split into many independent frames, which can be interleaved (sent out of order) and prioritized. The other end reassembles them using the stream identifier in each frame's header. HTTP 2.0 connections are persistent, and the client needs only one connection per domain name to the server.
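The interleave-and-reassemble idea can be simulated with simplified frames that carry only a stream id and a chunk of payload. Real frames also carry a type, flags and the 9-byte header. `split_into_frames`, `interleave` and `reassemble` are hypothetical helpers for this illustration, and round-robin is just one possible sending order.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Tuple

Frame = Tuple[int, bytes]  # (stream_id, payload chunk): a simplified frame


def split_into_frames(stream_id: int, message: bytes, chunk: int) -> List[Frame]:
    return [(stream_id, message[i:i + chunk]) for i in range(0, len(message), chunk)]


def interleave(*streams: List[Frame]) -> List[Frame]:
    """Sender side: round-robin frames from several streams onto one connection."""
    out: List[Frame] = []
    for group in zip_longest(*streams):
        out.extend(f for f in group if f is not None)
    return out


def reassemble(frames: List[Frame]) -> Dict[int, bytes]:
    """Receiver side: regroup frames by stream id, keeping each stream's order."""
    buffers: Dict[int, bytearray] = defaultdict(bytearray)
    for stream_id, chunk in frames:
        buffers[stream_id] += chunk
    return {sid: bytes(buf) for sid, buf in buffers.items()}
```

The HTML on stream 1 and the CSS on stream 3 alternate on the wire. Each message still comes out intact on the receiving side.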
- Header compression for bulky HTTP headers
HTTP/1.1 headers carry a lot of information and are resent in full with every request. HTTP/2.0 has both sides of the connection maintain a table of header fields, so repeated fields need not be retransmitted.
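The shared-table idea can be shown with a toy encoder and decoder that each keep an identical list of header fields already seen. This is a deliberately simplified illustration of the principle behind HPACK, not HPACK itself. The real format also has a predefined static table, a bounded dynamic table with eviction, and Huffman-coded literals. The class name `HeaderTable` is hypothetical.

```python
from typing import List, Tuple, Union

Header = Tuple[str, str]
Encoded = List[Union[int, Header]]  # an int means "index into the shared table"


class HeaderTable:
    """Toy shared header table: each peer keeps one, and both stay in sync."""

    def __init__(self) -> None:
        self.entries: List[Header] = []

    def encode(self, headers: List[Header]) -> Encoded:
        out: Encoded = []
        for field in headers:
            if field in self.entries:
                out.append(self.entries.index(field))  # already known: send only its index
            else:
                self.entries.append(field)  # new: send the literal and remember it
                out.append(field)
        return out

    def decode(self, encoded: Encoded) -> List[Header]:
        out: List[Header] = []
        for item in encoded:
            if isinstance(item, int):
                out.append(self.entries[item])
            else:
                self.entries.append(item)  # mirror the encoder's table update
                out.append(item)
        return out
```

On the second identical request the encoder sends only `[0, 1, 2]` instead of every header name and value.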
- Request priority: fetch important data first
The browser can dispatch requests as soon as it discovers resources, assign each stream a priority, and let the server decide the best response order. Requests no longer have to queue, which saves time and makes the most of each connection.
- Server push: fill idle time and improve request efficiency
When index.html is requested, the server can push the JS and CSS it depends on along with it. In nginx, for example, this is set up in the server configuration.
- Improved security
Built on HTTPS (browsers only support HTTP/2 over TLS)
Problems with HTTP 2.0:
- Connection latency of TCP and TCP + TLS: the TCP three-way handshake alone costs 1.5 RTTs (round-trip times)
TCP-level head-of-line blocking is not fully solved:
- To guarantee reliable delivery, TCP uses timeout retransmission, so a lost packet must be retransmitted and acknowledged before the data after it can be delivered.
- When a packet is lost under HTTP/2, the whole TCP connection waits for the retransmission, and every request on that connection is blocked.
- Multiplexing increases server load
- Multiplexed requests are prone to timeouts
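TCP-level and stream-level head-of-line blocking can be compared with a small simulation. It assumes packets from three streams are interleaved on the wire and one packet is lost. It then counts what each stream can hand to the application before the retransmission arrives. `deliverable` is a hypothetical helper. With `per_stream_ordering=False` it models HTTP/2 over TCP; with `True` it models the per-stream ordering used by QUIC.

```python
from typing import Dict, List


def deliverable(packets: List[int], lost_index: int, per_stream_ordering: bool) -> Dict[int, int]:
    """packets lists the stream id of each packet in send order.
    Count, per stream, the packets deliverable while packets[lost_index] awaits retransmission."""
    lost_stream = packets[lost_index]
    counts = {sid: 0 for sid in packets}
    for i, sid in enumerate(packets):
        if i == lost_index:
            continue
        if per_stream_ordering:
            # QUIC-style: only later packets of the lost packet's own stream must wait
            blocked = sid == lost_stream and i > lost_index
        else:
            # TCP-style: one ordered byte stream, so everything after the gap waits
            blocked = i > lost_index
        if not blocked:
            counts[sid] += 1
    return counts
```

If the very first packet is lost, TCP stalls all three streams. With per-stream ordering, only stream 1 has to wait.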
5. HTTP 3 (based on QUIC, Google's UDP-based protocol)
- Improved congestion control and reliable transmission
- Fast handshake
- Integrated TLS 1.3 encryption
- Connection migration
In some NAT environments (such as some campus networks), routers and other middleboxes block UDP. In that case the client falls back directly to an alternative such as HTTPS over TCP, so that requests still succeed.
2、 HTTP caching policy
1. Cache types
- 200 from memory cache: the server is not contacted. The resource was already loaded and cached in memory, so it is read straight from memory. The data is released when the browser closes, so "from memory cache" will not appear the next time the page is opened.
- 200 from disk cache: the server is not contacted. The resource was loaded at some earlier time and is read straight from disk. The data survives closing the page or browser, so the next time it is opened it can still be served from disk cache.
- Lookup order: memory cache first, then disk cache, and finally the network
2. Strong caching and negotiated caching
1) Strong cache: Expires / Cache-Control: max-age=600
- Expires: an expiration time. Until that time, the browser reads the resource straight from cache and sends no request
- Cache-Control: with max-age=300, the strong cache is hit if the resource is requested again within 5 minutes of when the previous response was received (a time the browser records).
- Summary: private means the response may be cached only for a single user and not by a shared cache (a proxy server cannot cache it). public means the response may be cached by anyone, including the requesting client and proxy servers. no-cache means the response may be cached, but its validity must be confirmed with the server before each use. no-store means no part of the request or response may be cached. max-age=[s] is the maximum time, in seconds, that the response is considered fresh.
- (1) max-age: how long the resource (representation) may be cached, in seconds;
- (2) s-maxage: like max-age, but applies only to shared (proxy) caches;
- (3) public: takes no value; the response may be cached by any cache, including the client that sent the request and proxy servers;
- (4) private: takes no value; the response is for a single user and must not be cached by proxy servers;
- (5) no-cache: takes no value. It forces the client to check with the server on every request. The server receives the request and determines whether the resource has changed. If it has, the server returns the new content; otherwise it returns 304 Not Modified. This is easy to misread as "the response is not cached". In fact, a response with Cache-Control: no-cache is cached, but every time the cache serves it to the client (browser), it must first revalidate it with the server.
- (6) no-store: takes no value; the response must not be cached at all.
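The directive semantics above fit in a few lines. This is a minimal sketch with hypothetical helper names. It splits the header on commas, so it does not handle quoted arguments that themselves contain commas (for example a quoted field list after `no-cache`).

```python
from typing import Dict, Optional

Directives = Dict[str, Optional[str]]


def parse_cache_control(value: str) -> Directives:
    """Split a Cache-Control value into {directive: argument, or None if it has none}."""
    directives: Directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def can_store(directives: Directives, shared_cache: bool) -> bool:
    if "no-store" in directives:
        return False  # nothing may be cached at all
    if shared_cache and "private" in directives:
        return False  # private responses are for a single user's browser only
    return True


def needs_revalidation(directives: Directives) -> bool:
    # no-cache: may be stored, but must be checked with the server before each use
    return "no-cache" in directives


def max_age_for(directives: Directives, shared_cache: bool) -> Optional[int]:
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # s-maxage overrides max-age for proxies
    if "max-age" in directives:
        return int(directives["max-age"])
    return None
```

For `public, max-age=600, s-maxage=60`, a browser may reuse the response for 600 seconds and a proxy for 60.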
2) Negotiation cache
Last-Modified + If-Modified-Since (HTTP 1.0)
- Last-Modified: the time the resource was last modified, sent by the server along with the resource
- When the cached copy expires (the browser determines that Cache-Control max-age has elapsed) and the cached response carried Last-Modified, the browser sends the request again with an If-Modified-Since header set to that time. The server compares it with the resource's last modification time. If the resource is newer, it has changed since, so the server returns the latest resource with HTTP 200 OK. Otherwise it has not changed, and the server responds with HTTP 304 Not Modified so the cached copy is used.
ETag + If-None-Match (HTTP 1.1)
- ETag is an HTTP 1.1 feature. The server (Apache or other software) generates it and returns it to the client to support cache validation. By default, Apache derives the ETag from the file's inode, size and mtime.
- When the cached copy has expired and the cached response carried an ETag, the browser sends the request again with an If-None-Match header set to that ETag value. The server compares the two and decides whether to return 200 or 304.
3) Pragma: no-cache / Cache-Control: no-store
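Both validators can be handled in one server-side function. This sketch assumes the whole body is in memory, and it invents its own ETag recipe: a truncated SHA-256 of the content rather than Apache's inode/size/mtime. `make_etag` and `conditional_get` are hypothetical. Following HTTP/1.1 semantics, If-None-Match takes precedence, and If-Modified-Since is consulted only when no If-None-Match is present. Weak ETags (`W/"..."`) are not handled.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a strong ETag derived from a hash of the content
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, last_modified: datetime,
                    req_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 if the client's cached copy is still valid, else 200 with the body.
    last_modified must be a timezone-aware UTC datetime."""
    etag = make_etag(body)
    resp_headers = {"ETag": etag, "Last-Modified": format_datetime(last_modified, usegmt=True)}
    inm = req_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match wins; If-Modified-Since is ignored when it is present
        tags = [t.strip() for t in inm.split(",")]
        if etag in tags or "*" in tags:
            return 304, resp_headers, b""
        return 200, resp_headers, body
    ims = req_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat as if the header were absent
        # HTTP dates have one-second resolution, so compare whole seconds
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, resp_headers, b""
    return 200, resp_headers, body
```

A 304 carries no body, so the browser reuses its cached copy. A changed body produces a new ETag, and the same If-None-Match then gets a 200.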
4) Cache priority
- The checks run in this order: HTTP 1.0's Pragma: no-cache first, then Cache-Control, then Expires, then ETag, and finally Last-Modified. If all are satisfied the result is 304; if any one is not, it is 200.
- Expires mainly exists to support HTTP/1.0-era browsers. It is an absolute point in time, so clock differences between client and server, or network delay, can make it inaccurate
- Cache-Control: max-age is a relative number of seconds. If both max-age and Expires are present, max-age takes precedence
- ETag is rarely used in distributed environments (such as CDNs) because it depends on how the web server computes it. Different servers, versions and configurations can produce different ETags for the same file. If you can keep all of these consistent, ETag is still usable; it is not an absolute rule.
- Last-Modified has one-second precision, so a change made within the same second does not update it. ETag is computed with a digest algorithm and reflects changes immediately
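The precedence rule between max-age and Expires can be sketched as a freshness calculation. This sketch considers only these two headers and ignores s-maxage, no-cache and heuristic freshness. It measures the Expires lifetime against the server's `Date` header, which avoids comparing server time with a possibly skewed client clock. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[timedelta]:
    """max-age (relative seconds) wins over Expires (absolute date) when both are present."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return timedelta(0)  # an unparseable date is treated as already expired
        # Both times come from the server, so client clock skew does not matter here
        return max(expires - date, timedelta(0))
    return None


def is_fresh(headers: Dict[str, str], received_at: datetime, now: datetime) -> bool:
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and now - received_at < lifetime
```

With `max-age=300` and an Expires date long in the past, the response is still fresh for 5 minutes after it was received.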
3. Cache behavior under different user actions
Typing the URL in the address bar and pressing Enter / opening a bookmark
- If the browser finds the resource cached and not yet expired (per the Expires or Cache-Control header), it uses the cached copy directly without checking with the server. The response is identical to the earlier one; for example, its Date header still shows the earlier time. This is why the resource's size is shown as "from cache".
F5 / the refresh button in the toolbar / Reload in the right-click menu
- F5 makes the browser send an HTTP request to the server no matter what, even if the earlier response had an Expires header
- Ctrl + F5 aims to fetch a completely fresh copy from the server. The request is sent without If-Modified-Since / If-None-Match, so the server cannot answer 304 and must return the whole resource. A Ctrl + F5 reload therefore takes longer to transfer, and the page naturally refreshes more slowly. The response is 200, and the related Cache-Control freshness time is refreshed.
- To be sure the copy comes from the server, Ctrl + F5 not only drops If-Modified-Since / If-None-Match but also adds some HTTP headers. Under HTTP/1.1, caches exist not only in the browser but also at intermediate nodes (such as proxies) between browser and server. To avoid getting one of those intermediate copies, the request tells them not to answer from their own cache and to fetch the latest copy from upstream.
- Chrome 51 sends these two headers, which make intermediate caches ignore their stored copy for this request, so the returned resource is guaranteed fresh:
Cache-Control: no-cache
Pragma: no-cache
3、 Cross-origin problems
A request is cross-origin if its protocol, domain name or port differs from the page's own.
Resources from a different origin cannot be freely shared. For example, a page cannot read localStorage data that belongs to a different origin.
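The same-origin rule boils down to comparing a (scheme, host, port) triple, with an omitted port replaced by the scheme's default. Below is a minimal sketch using `urllib.parse.urlsplit`. `origin` and `is_same_origin` are hypothetical helpers, and only the http and https default ports are filled in.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """An origin is (scheme, host, port); a missing port means the scheme's default."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def is_same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)
```

Path and query do not matter: `https://example.com/a` and `https://example.com:443/b?x=1` share an origin. Changing only the scheme, subdomain or port makes the two URLs cross-origin.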
1) JSONP: supports only GET requests. Its advantage is that it works in old browsers and can fetch data from sites that do not support CORS.
2) CORS (Cross-Origin Resource Sharing): supports all types of HTTP requests
3) iframe nesting
4) postMessage: the postMessage(message, targetOrigin) method lets scripts from different origins communicate asynchronously, enabling cross-document, multi-window and cross-origin messaging
- Safari note: a parent page cannot pass information to a cross-origin page inside an iframe. Cross-origin storage can instead be implemented by passing values through page URL parameters (Safari supports URLs longer than 64K characters).
5) A newer cross-origin strategy: cross-origin isolation with COOP and COEP
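JSONP and CORS both need cooperation from the server, and the sketch below shows the server side of each. `jsonp_response` wraps JSON in a call to the callback named by the page. It validates the name so the callback parameter cannot inject arbitrary script. `cors_headers` returns response headers for an allowlisted origin and treats `OPTIONS` as the preflight. The allowlist, method list and 600-second preflight cache are hypothetical values. Echoing back the requested headers is a permissive simplification.

```python
import json
import re
from typing import Dict, Optional

CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")  # e.g. "cb" or "app.onData"
ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical allowlist


def jsonp_response(data, callback: str) -> str:
    """JSONP: the page loads this via <script src>, so it must be a call to its callback."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    return f"{callback}({json.dumps(data)});"


def cors_headers(origin: Optional[str], method: str,
                 requested_headers: Optional[str] = None) -> Dict[str, str]:
    """Response headers for a CORS request; OPTIONS is treated as the preflight."""
    if origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser will not let the page read the response
    headers = {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    if method == "OPTIONS":
        headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE"
        if requested_headers:
            headers["Access-Control-Allow-Headers"] = requested_headers
        headers["Access-Control-Max-Age"] = "600"  # browser may cache this preflight for 10 minutes
    return headers
```

The `Vary: Origin` header matters because the response differs per origin. It tells caches not to reuse one origin's response for another.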
4、 Concurrency
1. After a modern browser establishes a TCP connection with a server, does it disconnect once an HTTP request completes? When is the connection closed?
In HTTP/1.0, the server closes the TCP connection after sending an HTTP response.
- Although it is not part of the standard, some servers support the Connection: keep-alive header.
- HTTP/1.1 writes the Connection header into the standard and enables persistent connections by default. Unless Connection: close is specified in the request, the TCP connection between browser and server stays open for a period of time
2. A single TCP connection can carry multiple HTTP requests.
3. Can multiple HTTP requests be sent at once over one TCP connection (for example, send three requests together and receive three responses together)?
- In HTTP/1.1 this is a problem: a single TCP connection processes only one request at a time, so the lifetimes of any two requests on the same connection cannot overlap.
HTTP/1.1 does specify pipelining:
- A client with a persistent connection can send multiple requests on it without waiting for any response, and the server must send the responses in the order it received the requests.
- However, browsers turn this off by default. HTTP/1.1 is a text protocol, and a response carries nothing that identifies which request it answers, so responses must come back in request order.
- Modern browsers do not enable HTTP pipelining by default because:
- Some proxy servers cannot handle HTTP pipelining correctly.
- A correct pipelining implementation is complex.
- Head-of-line blocking: one slow response holds up every response queued behind it.
Instead, HTTP/1.1 browsers either:
- Keep a TCP connection to the server open and process multiple requests on it one after another, or
- Open multiple TCP connections to the server.
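A small timing model shows why in-order responses hurt. It assumes all requests are sent at time 0, the server works on them in parallel, and `service_times` gives when each response is ready. Under HTTP/1.1 pipelining a response cannot be sent before every earlier one; HTTP/2 multiplexing lifts that constraint. `completion_times` is a hypothetical helper.

```python
from typing import List


def completion_times(service_times: List[float], in_order: bool) -> List[float]:
    """When each response reaches the client. in_order=True models HTTP/1.1 pipelining,
    where a response must wait for all earlier ones; False models HTTP/2 multiplexing."""
    done: List[float] = []
    latest = 0.0
    for t in service_times:
        if in_order:
            latest = max(latest, t)  # blocked until every earlier response has been sent
            done.append(latest)
        else:
            done.append(t)  # sent as soon as it is ready
    return done
```

A slow first request delays every response: `completion_times([5.0, 1.0, 1.0], True)` gives `[5.0, 5.0, 5.0]`, while the unordered case gives `[5.0, 1.0, 1.0]`.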
HTTP/2 provides multiplexing
- Multiple HTTP requests can be in progress at the same time on one TCP connection
4. Why does refreshing a page sometimes not re-establish the SSL connection?
- The browser and server sometimes keep the TCP connection open for a while. If TCP does not need to be re-established, the existing SSL connection is naturally reused.
5. Does the browser limit the number of TCP connections to the same host?
- Chrome allows at most six TCP connections to the same host
If the images are all served over HTTPS from the same domain, the browser negotiates with the server after the SSL handshake whether to use HTTP/2
- If it can, the requests are multiplexed over that single connection
If HTTP/2 or HTTPS is not available:
- The browser opens multiple TCP connections to the host; the maximum depends on browser settings
- The browser sends new requests over whichever of these connections is idle
- What if all connections are busy sending requests? The remaining requests have to wait.
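The queueing behind a per-host connection limit can be modeled by giving each request to whichever connection becomes idle first. This sketch assumes HTTP/1.1 without pipelining and known request durations. `finish_times` is a hypothetical helper, and its default of 6 mirrors the Chrome limit mentioned above.

```python
import heapq
from typing import List


def finish_times(durations: List[float], max_connections: int = 6) -> List[float]:
    """Each request takes the first connection to become idle; extra requests wait."""
    free_at = [0.0] * min(max_connections, len(durations))  # when each connection is next idle
    heapq.heapify(free_at)
    done: List[float] = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest idle connection
        end = start + d
        done.append(end)
        heapq.heappush(free_at, end)
    return done
```

Twenty one-second requests finish at 4.0 seconds, spread over four rounds of 6 + 6 + 6 + 2 requests.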
5、 Request-related topics
1. GET vs. POST
- GET requests a resource from the server, such as static text, a page, an image or a video.
- POST submits data to the resource identified by the URI; the data goes in the message body.
- GET is safe and idempotent; POST is neither.
- In HTTP, a "safe" method is one that does not modify ("destroy") resources on the server.
- An "idempotent" method produces the same result no matter how many times the same operation is performed.
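Idempotency matters in practice when a client retries a request, for example after a timeout. The toy model below is hypothetical (`ToyResource` and `repeat` are illustrative names). Repeating a GET leaves server state unchanged, while repeating the same POST creates duplicates.

```python
from typing import Callable, List


class ToyResource:
    """Hypothetical server-side collection, used only to contrast GET and POST."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def get(self) -> List[str]:
        return list(self.items)  # safe: reading does not change server state

    def post(self, item: str) -> None:
        self.items.append(item)  # not idempotent: every repeat adds another entry


def repeat(action: Callable[[], object], times: int) -> None:
    """Simulate a client sending the same request several times."""
    for _ in range(times):
        action()
```

After three identical POSTs the collection holds three copies of the same order, which is why browsers warn before resubmitting a form.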
2. Common status codes
- "204 No Content" is another common success code. It is basically the same as 200 OK, but the response has no body.
- "206 Partial Content" is used for chunked downloads and resumable transfers. The response body contains only part of the resource, not all of it, and the code still indicates that the server processed the request successfully.
- "301 Moved Permanently" means the requested resource no longer lives at this URL and must be accessed through a new one.
- "302 Found" is a temporary redirect: the resource still exists but must temporarily be accessed through another URL.
- Both 301 and 302 use the Location response header to give the URL to go to next, and the browser follows it automatically.
- "304 Not Modified" does not actually redirect. It means the resource has not been modified and the cached copy can be used; it is sometimes called a cache redirect and is used for cache control.
- "403 Forbidden" means the server refuses access to the resource; the client's request itself is not malformed.
- "500 Internal Server Error" is a generic server-side error code, just as 400 is a generic client-side one; it does not tell us what went wrong on the server.
- "501 Not Implemented" means the functionality the client requested is not supported yet, a bit like "coming soon, stay tuned".
- "502 Bad Gateway" is usually returned by a server acting as a gateway or proxy. The server itself is working, but an error occurred when it contacted the back-end server.
- "503 Service Unavailable" means the server is currently busy and temporarily cannot respond, like "the service is busy, please try again later".
6、 Open questions
- After machines A and B connect normally, B suddenly restarts. What state is the TCP connection on A in? [https://github.com/Advanced-Frontend/Daily-Interview-Question/issues/21](https://github.com/Advanced-Frontend/Daily-Interview-Question/issues/21)
- How many HTTP requests can one TCP connection send? [https://maimai.cn/article/detail?fid=1565594485&efid=2XGge6_3eNs_d2tiQsBWRw&use_rn=1](https://maimai.cn/article/detail?fid=1565594485&efid=2XGge6_3eNs_d2tiQsBWRw&use_rn=1)
7、 Reference links
- [Illustrated, hard-core! 30 diagrams on common HTTP interview questions](https://www.cnblogs.com/xiaolincoding/p/12442435.html)
- [Development and evolution of HTTP](https://www.cnblogs.com/xiaolincoding/p/12442435.html)
- [How many HTTP requests can a TCP connection send](https://maimai.cn/article/detail?fid=1565594485&efid=2XGge6_3eNs_d2tiQsBWRw&use_rn=1)
- [Browser HTTP caching mechanism](https://juejin.cn/post/6844903554587574285)
- [Forced cache and negotiated cache](https://juejin.cn/post/6844903838768431118)
- [Cache priority](https://segmentfault.com/q/1010000022541364)
- [HTTP differences](https://zhuanlan.zhihu.com/p/102561034)
- [Differences between HTTP 0.9, HTTP 1.0, HTTP 1.1 and HTTP 2.0](https://www.cnblogs.com/wupeixuan/p/8642100.html)
- [HTTP Cache-Control summary](https://imweb.io/topic/5795dcb6fb312541492eda8c)