HTTP explained in the plainest terms, so you don't have to worry about forgetting it

Time:2020-9-26

Introduction

This is a long read that aims for a deeper understanding of the HTTP protocol. If you are fairly new to front-end development, you probably started with HTML, CSS and JS and then picked up a framework. Your knowledge of HTTP may be limited to a few interview questions, or to theory that you rarely connect to code. I hope that after reading this article you will understand HTTP in more depth and that it helps you in your development work.

Getting into HTTP

HTTP classic

The complete HTTP request and response process after the browser enters a URL

Network protocol layering

Classic five-layer model

The following sections focus on the application layer and the transport layer. The three lower layers are, in brief:

  • Physical layer: defines how physical devices transmit data
  • Data link layer: establishes data link connections between communicating entities
  • Network layer: creates logical links for transmitting data between nodes

Transport layer

The transport layer aims to provide reliable end-to-end services. During transmission, data may be split into segments and packets, and the pieces must be reassembled on arrival. Developers do not need to handle any of this: the transport layer hides the details of data communication in the lower layers from the layers above. For the same reason, understanding the details of the transport layer helps us build higher-performance HTTP implementations.

Application layer

The application layer provides services to applications, and it is where the HTTP protocol is implemented. HTTP is built on top of TCP, which shields it from the details of network transmission.

The three-way handshake under HTTP

HTTP only has the concepts of request and response. Creating a connection is TCP's job, and HTTP requests and responses travel over an established TCP connection. This is a point that beginners easily confuse. In HTTP/1.1 the connection can be kept alive, which avoids paying the cost of the TCP three-way handshake for every request. In HTTP/2, requests can run concurrently on the same TCP connection, which saves even more of the cost of establishing connections. The details come later. For now, back to the three-way handshake, which works as follows.

First, the client sends the server a packet requesting a connection, with the flag SYN = 1 and seq = x.
Then the server opens a TCP socket and returns a packet with the flags SYN = 1 and ACK = 1, ack = x + 1 and seq = y.
Finally, the client sends the server a packet with the flag ACK = 1, ack = y + 1 and seq = x + 1.

That is the whole three-way handshake. Note that it belongs to TCP, not to HTTP. Three steps are needed so that the server does not open useless connections, because network delivery can be delayed. Suppose there were no third step: if a connection request were delayed and the client had already given up on it, the server would still open the connection and keep waiting for a request that never comes, which wastes resources. With the third step, both sides have confirmed that they can send and receive before any request is sent.
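The bookkeeping of sequence and acknowledgement numbers can be sketched in a few lines. This is only an illustration of the arithmetic described above, not real networking code; the function names and the sample numbers 100 and 300 are made up for the example.

```javascript
// Sketch of the three segments exchanged in the TCP three-way handshake.
// x and y stand for the initial sequence numbers chosen by client and server.
function threeWayHandshake(x, y) {
  return [
    { from: 'client', flags: ['SYN'], seq: x },
    { from: 'server', flags: ['SYN', 'ACK'], seq: y, ack: x + 1 },
    { from: 'client', flags: ['ACK'], seq: x + 1, ack: y + 1 },
  ]
}

// Each acknowledgement number must equal the peer's sequence number plus one.
function isValidHandshake(segments) {
  const [syn, synAck, ack] = segments
  return (
    synAck.ack === syn.seq + 1 &&
    ack.ack === synAck.seq + 1 &&
    ack.seq === syn.seq + 1
  )
}

console.log(threeWayHandshake(100, 300))
console.log(isValidHandshake(threeWayHandshake(100, 300)))
```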

HTTP message

The first line of a request message contains the request method, the resource address and the HTTP protocol version.
The first line of a response message contains the protocol version, the HTTP status code and the reason phrase that describes the status code.
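To make the two first lines concrete, here is a minimal sketch that splits them into their parts. The function names are hypothetical, and a real parser would validate its input far more strictly.

```javascript
// Split a request line such as "GET /index.html HTTP/1.1" into its parts.
function parseRequestLine(line) {
  const [method, target, version] = line.trim().split(' ')
  return { method, target, version }
}

// Split a status line such as "HTTP/1.1 404 Not Found" into its parts.
// The reason phrase may contain spaces, so everything after the code is joined.
function parseStatusLine(line) {
  const [version, code, ...reason] = line.trim().split(' ')
  return { version, statusCode: Number(code), reason: reason.join(' ') }
}

console.log(parseRequestLine('GET /index.html HTTP/1.1'))
console.log(parseStatusLine('HTTP/1.1 404 Not Found'))
```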

HTTP methods

Methods define the action to perform on a resource

  • HTTP methods: GET, POST, HEAD, OPTIONS, PUT, DELETE, PATCH, TRACE and CONNECT
  • GET: usually used to ask the server to send a resource
  • HEAD: requests only the headers of a resource; these headers are the same as those a GET request would return. One use case is to check the size of a large file before deciding whether to download it, which saves bandwidth
  • OPTIONS: used to get the communication options supported by the target resource
  • POST: sends data to the server
  • PUT: creates a new resource, or replaces the representation of the target resource with the payload of the request
  • DELETE: deletes the specified resource
  • PATCH: applies a partial modification to a resource
  • CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel
  • TRACE: echoes the request as received by the server, mainly for testing or diagnostics

Reference: interviewer (9): may be the most comprehensive HTTP interview answer in the whole network

HTTP status codes

2xx success

  • 200 OK: the request from the client was processed correctly on the server
  • 201 Created: the request succeeded and a new resource was created as a result
  • 202 Accepted: the request has been accepted but not yet processed; completion is not guaranteed
  • 204 No Content: the request succeeded, but the response message has no body
  • 206 Partial Content: the response to a range request

3xx redirection

  • 301 Moved Permanently: permanent redirection; the resource has been assigned a new URL
  • 302 Found: temporary redirection; the resource has temporarily been assigned a new URL
  • 303 See Other: the resource is available at another URL, which should be fetched with the GET method
  • 304 Not Modified: the resource has not changed since the version identified by the conditional request headers, so the client can keep using its cached copy
  • 307 Temporary Redirect: temporary redirection like 302, except that the request method must not change when the redirect is followed

4xx client error

  • 400 Bad Request: the request message has a syntax error
  • 401 Unauthorized: the request requires HTTP authentication
  • 403 Forbidden: the server refuses access to the requested resource
  • 404 Not Found: the requested resource was not found on the server
  • 408 Request Timeout: the server timed out waiting for the client's request
  • 409 Conflict: the request conflicts with the current state of the resource

5xx server error

  • 500 Internal Server Error: an error occurred on the server while it was handling the request
  • 501 Not Implemented: the request is beyond the server's capability. For example, the server does not support a feature the request needs, or the request uses a method the server does not support
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process the request
  • 505 HTTP Version Not Supported: the server does not support, or refuses to support, the HTTP version used in the request
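The grouping by first digit can be expressed directly in code. The sketch below uses a hypothetical helper, statusClass, together with Node's built-in http.STATUS_CODES table, which maps status codes to their standard reason phrases.

```javascript
const http = require('http')

// Map a status code to the class it belongs to, based on its first digit.
function statusClass(code) {
  if (code >= 100 && code < 200) return 'informational'
  if (code >= 200 && code < 300) return 'success'
  if (code >= 300 && code < 400) return 'redirection'
  if (code >= 400 && code < 500) return 'client error'
  if (code >= 500 && code < 600) return 'server error'
  return 'unknown'
}

// Print a few of the codes from the lists above with their reason phrases.
for (const code of [200, 304, 403, 409, 503]) {
  console.log(code, http.STATUS_CODES[code], '->', statusClass(code))
}
```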

Create a simple service with Node

server.js

const http = require('http')

// Every incoming request is handled by this callback
http.createServer(function(request, response) {
    console.log('request come', request.url)

    // Send the body and finish the response
    response.end('hello world')
}).listen(8888)

console.log('server listening on 8888')

In a terminal, go to the directory that contains server.js and run node server.js. Then open localhost:8888 in the browser and you will see 'hello world'

HTTP features overview

The browser is the most common client. To keep data safe, browsers enforce the same-origin policy. Two URLs have the same origin when their protocol, domain name and port are all the same

The same-origin policy comes in two kinds:

  • DOM same-origin policy: scripts may not operate on the DOM of pages from a different origin. The main scenario is cross-origin iframes: iframes from different domains are restricted from accessing each other
  • XMLHttpRequest same-origin policy: scripts may not use XHR objects to make HTTP requests to a different origin

Given the same-origin policy, a request to a different origin is a cross-origin (cross-domain) request. In real development, however, we often need to get around this restriction. There are several ways to do so (some are practised later in this article)

  • JSONP: the src attribute of a script tag is not restricted by the same-origin policy, so a script tag is created dynamically
  • CORS: the server sets the Access-Control-Allow-Origin header
  • Through window.name
  • Through document.domain
  • Through HTML5's postMessage

For details on cross-domain techniques, please refer to the front-end cross domain collation
Let's see what this looks like in code

CORS cross domain

Create server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
    console.log('request come', request.url)

    const html = fs.readFileSync('test.html','utf8')
    response.writeHead(200, {
        'Content-Type': 'text/html'
    })
    response.end(html)

}).listen(8888)

In the same directory as server.js, create hello.html (server.js above reads test.html, so use the same file name in both places). Its JS code is as follows (change the address to the IP address of your computer)

var xhr = new XMLHttpRequest()

xhr.open('GET','http://0.0.0.0:8887')

xhr.send()

Create server2.js in the same directory

const http = require('http')

http.createServer(function (request, response) {
    console.log('request come',request.url)
    response.end('hello world')
}).listen(8887)

console.log('server listening on 8887')

Start server.js and server2.js separately, then visit localhost:8888. The browser reports a cross-origin error for the request to port 8887

Solution: add the following to server2.js

response.writeHead(200, {
    'Access-Control-Allow-Origin': '*'
})

The cross-origin request now succeeds
Note: even without the CORS response header, the server side (the terminal running server2.js) still receives the request; it is the browser that does not hand the returned content to the page. In other words, the cross-origin request is actually sent, and it is the response that the browser intercepts
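Allowing every origin with '*' is convenient for a demo. A slightly stricter variant, sketched below, echoes the origin back only when it is on an allow-list. The list contents and the function name corsHeaders are assumptions made for this example, not part of the original demo.

```javascript
// Hypothetical allow-list; the demo above simply allows every origin with '*'.
const ALLOWED_ORIGINS = ['http://localhost:8888']

// Return the CORS headers to send for a request with the given Origin header.
function corsHeaders(requestOrigin) {
  if (ALLOWED_ORIGINS.includes(requestOrigin)) {
    // "Vary: Origin" tells caches that the response depends on the Origin header.
    return { 'Access-Control-Allow-Origin': requestOrigin, 'Vary': 'Origin' }
  }
  // No CORS header: the browser will block the page from reading the response.
  return {}
}

console.log(corsHeaders('http://localhost:8888'))
console.log(corsHeaders('http://other.example'))
```

Inside the request handler of server2.js this could be used as response.writeHead(200, corsHeaders(request.headers.origin)).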

CORS restrictions and preflight requests

Modify hello.html and change its JS to

fetch('http://192.168.0.106:8887/', {
    method: 'POST',
    headers: {
        'X-Test-Cors': '123'
    }
})

Visit localhost:8888 in the browser and the request is blocked with a CORS error

What's the reason? Let me explain
For cross-origin requests, the methods the browser allows by default are
GET, HEAD and POST. Any other method requires a preflight request first

The content types allowed by default are

  • text/plain
  • multipart/form-data
  • application/x-www-form-urlencoded

Any other content type also requires a preflight request
Further restrictions: the request headers must be among the default allowed headers (the CORS-safelisted request headers), no event listener may be registered on the XMLHttpRequestUpload object, and no ReadableStream object may be used in the request. We rarely run into the latter two, so we won't go into them
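The rules above can be summarised as a small check. This is a simplified sketch with a hypothetical function name: real browsers apply further conditions (for example on header values, plus the upload listener and stream rules just mentioned), so treat it as an illustration rather than the exact algorithm.

```javascript
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST']
const SIMPLE_CONTENT_TYPES = [
  'text/plain',
  'multipart/form-data',
  'application/x-www-form-urlencoded',
]
// Request headers a page may set without triggering a preflight (simplified).
const SAFELISTED_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type']

// Decide whether a cross-origin request needs a preflight request first.
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return true
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase()
    if (!SAFELISTED_HEADERS.includes(lower)) return true
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8" when comparing.
      const mediaType = String(value).split(';')[0].trim().toLowerCase()
      if (!SIMPLE_CONTENT_TYPES.includes(mediaType)) return true
    }
  }
  return false
}

console.log(needsPreflight('GET'))
console.log(needsPreflight('POST', { 'X-Test-Cors': '123' }))
```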
Back to the preflight request: before the real request, the browser sends an OPTIONS request to ask the server whether the real request is allowed

Note: newer versions of Chrome have changed and no longer show the preflight request in the Network panel. Use another browser to see it

If we need this request header, add the following to response.writeHead in server2.js
'Access-Control-Allow-Headers': 'X-Test-Cors'
Similarly, to allow additional methods, add
'Access-Control-Allow-Methods': 'DELETE, PUT'
To let the browser skip the preflight request for cross-origin requests within a certain period (in seconds), set the following in response.writeHead
'Access-Control-Max-Age': '100'
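Put together, a handler for server2.js might look like the sketch below. It answers the OPTIONS preflight with the permission headers and no body, and serves the real request otherwise. The function names, the 204 status for the preflight and the exact header values are choices made for this example.

```javascript
// Headers that tell the browser what the cross-origin page is allowed to do.
function corsResponseHeaders() {
  return {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Headers': 'X-Test-Cors',
    'Access-Control-Allow-Methods': 'POST, PUT, DELETE',
    'Access-Control-Max-Age': '100', // seconds the preflight result may be reused
  }
}

function handle(request, response) {
  if (request.method === 'OPTIONS') {
    // Preflight: reply with the permissions only, no body.
    response.writeHead(204, corsResponseHeaders())
    response.end()
    return
  }
  response.writeHead(200, corsResponseHeaders())
  response.end('hello world')
}

// To try it: require('http').createServer(handle).listen(8887)
```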

JSONP cross domain

Remove server.js and change the JS in hello.html to
<script></script>
This is a simple cross-domain JSONP request. For details, please refer to the cross-domain article mentioned above
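For readers who have not seen JSONP before, the server side of the idea can be sketched as follows: the server wraps JSON data in a call to a function whose name the page supplies, and the page loads that URL through a script tag. The helper name jsonpBody, the callback name handleData and the callback query parameter are all assumptions for illustration; they are not taken from the original demo.

```javascript
// Build the JavaScript text a JSONP endpoint returns: callbackName(<json>).
function jsonpBody(callbackName, data) {
  // Only allow plain identifiers so that arbitrary script cannot be injected.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('invalid callback name')
  }
  return `${callbackName}(${JSON.stringify(data)})`
}

// Browser side (sketch): define window.handleData = function (data) { ... },
// then load a script whose src points at the other origin, for example
// http://127.0.0.1:8887/?callback=handleData (hypothetical URL).
console.log(jsonpBody('handleData', { msg: 'hello world' }))
```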

Browser cache

To reduce requests and speed up page loads, developers can cache resources as needed. Caching is divided into strong caching and negotiated caching, both controlled by HTTP header fields

Strong cache

Expires is an absolute time taken from the server's clock. The browser checks the current time and uses the cache directly if the expiration time has not been reached
The problem with this approach is that the server time and the client time may not agree, so this field is rarely used now

The max-age directive in Cache-Control holds a relative time. For example, Cache-Control: max-age=484200 means the cache is valid for 484200 seconds after the browser receives the file. If both Cache-Control and Expires are present, the browser always uses Cache-Control first
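The priority between the two headers can be sketched as a small function. It is a simplification of what browsers do (it ignores other directives and heuristics), and the function names and the lower-cased header keys are assumptions of this example.

```javascript
// How long (in seconds) a response may be used without contacting the server.
// max-age in Cache-Control wins; Expires is only used when max-age is absent.
function freshnessLifetime(headers) {
  const cacheControl = headers['cache-control'] || ''
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl)
  if (match) return Number(match[1])
  if (headers.expires && headers.date) {
    const seconds = (Date.parse(headers.expires) - Date.parse(headers.date)) / 1000
    return Number.isFinite(seconds) ? Math.max(0, seconds) : 0
  }
  return 0
}

// A cached copy is fresh while its age is below the freshness lifetime.
function isFresh(headers, ageInSeconds) {
  return ageInSeconds < freshnessLifetime(headers)
}

console.log(freshnessLifetime({ 'cache-control': 'max-age=484200' }))
```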

Negotiation cache

Last-Modified is the field the server returns when a resource is first requested; it gives the time the resource was last updated. The next time the browser requests the resource, it sends the If-Modified-Since field. The server compares its own last-modified time with the If-Modified-Since time: if they differ, the cache is considered stale and the new resource is returned to the browser; if they match, the server sends a 304 status code and the browser continues using its cache

ETag is an identifier for the resource entity (a hash string). When the resource content is updated, the ETag changes. The browser sends the value it stored in If-None-Match, and the server checks whether the ETag has changed
If it has changed, the new resource is returned; otherwise 304 is returned
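The Last-Modified comparison can be sketched as below. The function name is hypothetical; the sketch treats the cached copy as still valid when the resource has not changed after the If-Modified-Since time, and it truncates to whole seconds because HTTP dates carry no milliseconds.

```javascript
// Decide whether the server may answer 304 for an If-Modified-Since request.
// lastModified is a Date; ifModifiedSince is the raw request header value.
function isNotModified(lastModified, ifModifiedSince) {
  const since = Date.parse(ifModifiedSince)
  if (Number.isNaN(since)) return false // missing or unparsable header: send the resource
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000
  return modified <= since
}

const lastModified = new Date('2020-09-26T08:00:00Z')
// The value the server would have sent in the Last-Modified response header.
const headerValue = lastModified.toUTCString()

console.log(isNotModified(lastModified, headerValue))
console.log(isNotModified(new Date(lastModified.getTime() + 5000), headerValue))
```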

Let's take a closer look at Cache-Control

  1. Cacheability

public, private, no-cache, no-store

  • public: the response may be cached anywhere along the path (including proxy servers and the client browser)
  • private: only the browser that made the request may cache it
  • no-cache: the response may be stored locally or on a proxy server, but the cache can only be used after the server has validated it
  • no-store: caching is disabled completely; neither the browser nor proxy servers store the response, and it is fetched from the server every time

  2. Expiration

The cache lifetime. The most commonly used directive is max-age, in seconds, which states how long the cache is valid
s-maxage is the cache lifetime for proxy servers and only takes effect there

  3. Revalidation

must-revalidate: once the cache has expired, it must be revalidated with the origin server before it is used again
proxy-revalidate: the same, but it applies to proxy server caches
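To see how these directives combine, here is a minimal sketch that parses a Cache-Control value and answers two questions about it. The function names are made up for the example, and the parsing is deliberately naive (it does not handle quoted values that contain commas).

```javascript
// Turn "max-age=2020, no-cache" into { 'max-age': '2020', 'no-cache': true }.
function parseCacheControl(header) {
  const directives = {}
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=')
    if (!name) continue
    directives[name.toLowerCase()] = value === undefined ? true : value.replace(/"/g, '')
  }
  return directives
}

// May this cache keep a copy at all? Shared caches (proxies) must skip private responses.
function mayStore(directives, isSharedCache) {
  if (directives['no-store']) return false
  if (isSharedCache && directives.private) return false
  return true
}

// With no-cache, a stored copy must be validated with the server before every use.
function mustValidateBeforeUse(directives) {
  return Boolean(directives['no-cache'])
}

console.log(parseCacheControl('max-age=2020, no-cache'))
```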
Now that the theory is covered, let's try it in practice
Modify test.html and change the JS part to
<script></script>

Modify server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
    console.log('request come',request.url)
    if (request.url == '/') {
        const html = fs.readFileSync('test.html', 'utf8')
        response.writeHead(200, {
            'Content-Type': 'text/html'
        })
        response.end(html)
    }

    if (request.url == '/script.js') {
        response.writeHead(200, {
            'Content-Type': 'text/javascript',
            'Cache-Control': 'max-age=2020',
            // 'Last-Modified': '2020',
            // 'Etag': '20200217'
        })
        response.end('console.log("script loaded")')
    }
    
}).listen(8888)

console.log('server start on the 8888')

Open the developer tools. After script.js has been loaded for the first time, later requests for it are served from the cache. Make sure caching is not disabled in the developer tools (the box ticked in red in the figure must be unticked)

Let’s look at the response header

If no cache headers are set, every request is fetched from the server. You can verify this yourself

Cache hits can be seen here

Last-Modified (ETag)

For now we are not actually validating the resource; we only want to check whether the browser sends the validation headers. So we can set Last-Modified and ETag to arbitrary values. Change response.writeHead in server.js to

 response.writeHead(200, {
    'Content-Type': 'text/javascript',
    'Cache-Control':'max-age=2020, no-cache',
    'Last-Modified': '2020',
    'Etag': '20200217'
})

Start the service. On the first request, the response headers contain Last-Modified and ETag

Send the request again, and If-Modified-Since and If-None-Match appear in the request headers

We are not done yet. When the cache is validated and the resource has not changed, we want the browser to use its cache directly, but let's look at our response

The response still contains the resource, and the status code is 200. Why? The reason is simple: the server does not handle If-Modified-Since and If-None-Match yet. Change http.createServer in server.js to

http.createServer(function (request, response) {
    console.log('request come',request.url)
    if (request.url == '/') {
        const html = fs.readFileSync('test.html', 'utf8')
        response.writeHead(200, {
            'Content-Type': 'text/html'
        })
        response.end(html)
    }

    if (request.url == '/script.js') {
        const etag = request.headers['if-none-match']
        if (etag === '20200217') {
            response.writeHead(304, {
                'Content-Type': 'text/javascript',
                'Cache-Control': 'max-age=2020,no-cache',
                'Last-Modified': '2020',
                'Etag': '20200217'
            })
            response.end('')
        } else {
            response.writeHead(200, {
                'Content-Type': 'text/javascript',
                'Cache-Control': 'max-age=2020,no-cache',
                'Last-Modified': '2020',
                'Etag': '20200217'
            })
            response.end('console.log("script loaded twice")')
        }
    }
    
}).listen(8888)

Whether or not we send the resource, we must call response.end at the end; otherwise the request never finishes. After the change, the status code becomes 304 and the request takes less time, yet the browser's Response tab still shows content. Why? The cache validation succeeded and the browser is using its cached copy; the developer tools simply display the cached resource, which was not fetched from the server again. To check this, put some other content in the first response.end (the 304 branch) and look at the response shown in the browser: it does not change

We made the browser use the negotiated cache by setting no-cache. If we delete no-cache, the browser should use its cache directly (because we set max-age=2020). Before checking, clear the browser's cache on the page we just opened, then delete no-cache in the code and refresh a few times. You can see that script.js comes from the memory cache. You can verify no-store yourself in the same way

Finally, a word on Last-Modified and ETag. Last-Modified is a time: typically the time the data was last updated, which we can read from the database together with the data. ETag is a signature of the data: when the data is fetched, compute a signature over it and put it in ETag
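One way to produce such a signature is to hash the response body. The sketch below uses Node's crypto module; the choice of SHA-1 and the function names are assumptions for illustration, and any stable hash of the content would serve the same purpose.

```javascript
const crypto = require('crypto')

// Derive an ETag from the content itself: same body, same tag.
function makeEtag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('hex')
  return `"${hash}"` // ETag values are quoted strings
}

// Compare the tag sent by the browser (If-None-Match) with the current one.
function respondWithEtag(ifNoneMatch, body) {
  const etag = makeEtag(body)
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag }, body: '' }
  }
  return { status: 200, headers: { ETag: etag }, body }
}

const first = respondWithEtag(undefined, 'console.log("script loaded")')
const second = respondWithEtag(first.headers.ETag, 'console.log("script loaded")')
console.log(first.status, second.status)
```

Unlike the fixed '20200217' value in the demo, a tag computed this way changes automatically whenever the content changes.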

Cookies and sessions

HTTP is a stateless protocol, so we need some form of identity to tell the server who is making a request. This is what cookies and sessions are for

  • Cookie features: set through the Set-Cookie response header; sent back automatically by the browser on subsequent requests; stored as key-value pairs, and more than one can be set
  • Cookie attributes: max-age and expires set the expiration time; HttpOnly makes the cookie inaccessible through document.cookie

Next, let's look at cookies in code. Change response.writeHead in server.js to

{
    'Content-Type': 'text/html',
    'Set-Cookie': ['id=123;max-age=2','time=2020']
}

After starting the service, you can see the two cookies under Cookies in the Application panel, or on the request in the Network panel. The cookie id=123 has an expiration time (max-age=2, that is, 2 seconds), so if you refresh after that, the cookie id=123 is gone
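On later requests the browser sends the stored cookies back in a single Cookie request header, such as id=123; time=2020. A minimal sketch of reading that header on the server follows; the function name is hypothetical, and the sketch does not decode or validate values.

```javascript
// Parse a Cookie request header ("id=123; time=2020") into an object.
function parseCookieHeader(header = '') {
  const cookies = {}
  for (const pair of header.split(';')) {
    const index = pair.indexOf('=')
    if (index === -1) continue // skip empty or malformed parts
    const name = pair.slice(0, index).trim()
    const value = pair.slice(index + 1).trim()
    if (name) cookies[name] = value
  }
  return cookies
}

// In the request handler this would be: parseCookieHeader(request.headers.cookie)
console.log(parseCookieHeader('id=123; time=2020'))
```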

Cookies are not shared across domains. However, to share a cookie among the subdomains of a top-level domain, set the cookie's domain attribute, as follows


{
    'Content-Type': 'text/html',
    'Set-Cookie': ['id=123;max-age=2','time=2020;domain=test.com']
}

After this change, you can add hosts entries and verify it yourself

HTTP persistent connections

A persistent (long) connection is about whether the TCP connection is closed after a request completes. Keeping the TCP connection open costs some resources, but if more requests follow, they can be sent over the same connection without another handshake, which saves time. In practice, sites with a lot of concurrent traffic keep connections alive. A persistent connection can have a timeout: if no request is sent within that time, the connection is closed
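In Node, the idle timeout mentioned above is exposed on the server object as keepAliveTimeout, in milliseconds. The sketch below only configures a server and leaves the listen call commented out; the 5-second value is an arbitrary choice for the example.

```javascript
const http = require('http')

const server = http.createServer((request, response) => {
  response.end('hello world')
})

// Close a kept-alive connection after it has been idle for 5 seconds.
server.keepAliveTimeout = 5000

// server.listen(8888) // uncomment to run the server
console.log('keep-alive timeout (ms):', server.keepAliveTimeout)
```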

Next, let's look at a real case. Take the Baidu home page as an example: open the developer tools, go to the Network panel, right-click the Name column header and tick Connection ID
We can see that most connections are reused. With HTTP/1.1, Chrome allows at most 6 TCP connections per domain name. So six connections are created at the start, and later requests reuse them.

To verify this part of the content through the code, first create a test.html

<body>
    <img>
    <img>
    <img>
    <img>
    <img>
    <img>
    <img>
    <img>
</body>  

Create a new server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
    console.log('request come',request.url)
    const html = fs.readFileSync('test.html', 'utf8')
    const img = fs.readFileSync('timg.jpg')
    if (request.url === '/') {
        response.writeHead(200, {
            'Content-Type': 'text/html',
            // 'Connection': 'close'
        })
        response.end(html)
    } else {
        response.writeHead(200, {
            'Content-Type': 'image/jpeg',
            // 'Connection': 'close'
        })
        response.end(img)
    }
    
}).listen(8888)

console.log('server start on the 8888')  

Start service

Take a look at the Waterfall column, which shows the timing of each network request. To turn off persistent connections, set the value of the Connection header to close
A brief word on HTTP/2: it uses multiplexing, so only one TCP connection needs to be created, and all requests to the same domain can run concurrently over it. To use HTTP/2, requests must go over HTTPS, and the back end needs significant changes, so HTTP/2 is not yet widely used

Redirect

When we access a resource through a URL and the resource is no longer at that location, the server should tell the client where the resource is now, and the browser then requests it from there.
Let's look at the code. Create a new server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
    console.log('request come', request.url)
    if (request.url === '/') {
        response.writeHead(302, {
            'Location': '/new'
        })
        response.end()
    }
    if (request.url === '/new') {
        response.writeHead(200, {
            'Content-Type': 'text/html'
        })
        response.end('<div>hello world</div>')
    }
}).listen(8888)

console.log('server listening on 8888')

The test here stays within the same domain, so only the path is written. For a different domain, replace /new with the real address. Start the service and enter localhost:8888; the browser jumps straight to the real location of the resource. You can also see this in the Network panel: apart from the favicon, there are two requests.

The status code we wrote is 302. If we change it to 200, no redirect happens. As mentioned before, 302 is a temporary redirect and 301 is a permanent redirect. If we change the 302 above to 301, then except for the first time, no matter how many times we enter localhost:8888, the terminal only logs the request for the redirected path, because the browser remembers that the original address has been permanently redirected and no longer sends requests to it. In real development, use permanent redirects with care: once one is in place, the browser keeps the redirect for as long as it can and will not request the original path again
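The difference between the two redirect types can be kept in one place with a small routing table, sketched below. The table contents, the /old-home path and the route function are hypothetical. The sketch also answers unknown paths (such as the favicon request) with 404, so that every request gets a response.

```javascript
// Hypothetical redirect table: old path -> status code and new location.
const REDIRECTS = new Map([
  ['/', { status: 302, location: '/new' }], // temporary: browser asks again next time
  ['/old-home', { status: 301, location: '/new' }], // permanent: browser remembers it
])

// Decide the response for a URL as plain data, which keeps it easy to test.
function route(url) {
  const redirect = REDIRECTS.get(url)
  if (redirect) {
    return { status: redirect.status, headers: { Location: redirect.location }, body: '' }
  }
  if (url === '/new') {
    return { status: 200, headers: { 'Content-Type': 'text/html' }, body: '<div>hello world</div>' }
  }
  return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'not found' }
}

// Wiring it into a server would look like:
// require('http').createServer((request, response) => {
//   const result = route(request.url)
//   response.writeHead(result.status, result.headers)
//   response.end(result.body)
// }).listen(8888)
console.log(route('/'))
```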

Conclusion

The purpose of this article is to use code to deepen our understanding of things we already know, and to sort out HTTP knowledge as a whole. I hope it helps some of you. If you like my writing style, I will follow up with some hands-on work with the nginx web server. In real development we use nginx as a proxy and for some caching, so as an HTTP server it is also well worth mastering