What happens in the browser from entering a URL to displaying the page

Time:2021-11-25

Repository for the complete high-frequency question bank: https://github.com/hzfe/aweso…

Online reading address for the complete high-frequency question bank: https://febook.hzfe.org/

Answer key points

URL DNS TCP Render

From entering a URL to rendering the page, the browser goes through the following main stages:

  • URL input
  • DNS resolution
  • Establish the TCP connection
  • Send the HTTP/HTTPS request (HTTPS first establishes a TLS connection)
  • The server responds to the request
  • The browser parses and renders the page
  • The HTTP request ends and the TCP connection is closed

In-depth knowledge

1. URL input

URL address

A URL (Uniform Resource Locator) is used to locate resources on the Internet and is commonly known as a web address.

Suppose we enter hzfe's official website, hzfe.org, in the address bar and press Enter. The browser evaluates the input as follows:

  1. Check whether the input is a valid URL.
  2. If it is, check whether the URL is complete. If it is incomplete, the browser may guess the domain and fill in the missing prefix or suffix.
  3. If it is not, treat the input as a search query and search with the default search engine set by the user.

While we type, most browsers also look up the input in the history, bookmarks and similar places and offer suggestions.
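The three checks above can be sketched roughly as follows. This is only an illustration under simple assumptions: real browsers use far more elaborate heuristics, and both the function name `interpret_address_bar` and the search endpoint `search.example` are hypothetical.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical search endpoint, used only for illustration.
SEARCH_TEMPLATE = "https://search.example/?q={}"


def interpret_address_bar(text: str) -> str:
    """Return the URL a simplified browser would navigate to."""
    text = text.strip()
    parts = urlsplit(text)
    # 1. Already a complete http(s) URL: use it as-is.
    if parts.scheme in ("http", "https") and parts.netloc:
        return text
    # 2. Looks like a bare domain: complete the missing prefix.
    #    (Assumption: no spaces and at least one dot means "domain".)
    if text and " " not in text and "." in text:
        return "https://" + text
    # 3. Otherwise treat the input as a search query.
    return SEARCH_TEMPLATE.format(quote_plus(text))
```

For example, `interpret_address_bar("hzfe.org")` completes the scheme, while a phrase containing spaces is sent to the search engine.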

2. DNS (domain name system) resolution

Because the browser cannot reach a server by domain name alone, DNS resolution is needed to find the IP address that corresponds to the domain name.

The DNS resolution process is as follows:

  1. Enter the domain name hzfe.org in the browser. The browser cache and the operating system's local hosts file are checked for a record of this domain. If one exists, the IP address is taken from the record and resolution is complete.
  2. Check whether the local DNS resolver cache has a record of this domain. If it does, the IP address is taken from the record and resolution is complete.
  3. Query the DNS server configured in the TCP/IP settings (the local DNS server). If the queried domain belongs to a zone that this server is configured to serve, it returns the result and resolution is complete.
  4. Check whether the local DNS server has cached a record of this domain. If it has, it returns the result and resolution is complete.
  5. The local DNS server sends a query to a root DNS server. The root DNS server responds with the address of the top-level domain (TLD) DNS server.
  6. The local DNS server sends a query to the TLD DNS server. The TLD DNS server responds with the address of the authoritative DNS server.
  7. The local DNS server sends a query to the authoritative DNS server. The authoritative DNS server responds with the IP address of hzfe.org, and resolution is complete.

Queries usually follow the process above. The query from the requesting host to the local DNS server is recursive, while the queries the local DNS server makes to obtain the mapping are iterative.
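The cache-first lookup followed by the root, TLD and authoritative queries can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: the dictionaries stand in for real DNS servers, the server names are hypothetical, and `192.0.2.10` is a documentation-only address, not the real address of hzfe.org.

```python
# Toy stand-ins for the three levels of DNS servers (all hypothetical).
ROOT_SERVER = {"org": "tld-org"}                       # TLD -> TLD server
TLD_SERVERS = {"tld-org": {"hzfe.org": "ns-hzfe"}}     # domain -> authoritative server
AUTH_SERVERS = {"ns-hzfe": {"hzfe.org": "192.0.2.10"}} # domain -> IP address


def resolve(domain: str, cache: dict) -> tuple:
    """Return (ip, servers_consulted), checking the cache first."""
    if domain in cache:
        return cache[domain], ["cache"]
    # Iterative part: the local DNS server asks each level in turn.
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT_SERVER[tld]                 # root answers with the TLD server
    auth_server = TLD_SERVERS[tld_server][domain] # TLD answers with the authoritative server
    ip = AUTH_SERVERS[auth_server][domain]        # authoritative server answers with the IP
    cache[domain] = ip                            # remember the answer for next time
    return ip, ["root", "tld", "authoritative"]
```

Calling `resolve("hzfe.org", cache)` twice with the same `cache` dictionary walks all three levels the first time and is answered from the cache the second time.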

3. Establish TCP connection

Almost all HTTP communication in the world is carried over TCP/IP, a widely used layered suite of packet-switched network protocols spoken by computers and network devices all over the world. An HTTP connection is really just a TCP connection plus some rules for using it. (HTTP: The Definitive Guide)

After the browser obtains the server's IP address, it uses a random port (1024 < port < 65535) to send a TCP connection request to port 80 of the server (note: the default port for HTTP is 80 and for HTTPS it is 443). Once the connection request reaches the server, the TCP connection is established through the TCP three-way handshake.
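The "random client port" can be observed with a small sketch that connects to a throwaway server on the loopback interface instead of a real web server on port 80. The function name `connect_over_loopback` is hypothetical, and the operating system (not the script) picks both ports and performs the handshake inside `create_connection`.

```python
import socket


def connect_over_loopback() -> tuple:
    """Open a TCP connection to a local listener and report the ports used."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
    server.listen(1)
    server_port = server.getsockname()[1]

    # connect() triggers the three-way handshake in the kernel.
    client = socket.create_connection(("127.0.0.1", server_port))
    conn, peer = server.accept()       # peer is the client's (ip, port)
    client_port = client.getsockname()[1]

    conn.close()
    client.close()
    server.close()
    return client_port, peer[1], server_port
```

The first two returned values are the same number: the client's ephemeral port as seen from each end of the connection.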

3.1 Layered model

           [OSI]             |      [TCP/IP]       |  [Protocols]
    ----------------------------------------------------------------
    7 |  Application layer   |                     |
    6 |  Presentation layer  |  Application layer  |  HTTP
    5 |  Session layer       |                     |
    ----------------------------------------------------------------
    4 |  Transport layer     |  Transport layer    |  TCP, TLS
    ----------------------------------------------------------------
    3 |  Network layer       |  Network layer      |  IP
    ----------------------------------------------------------------
    2 |  Data link layer     |                     |
    1 |  Physical layer      |  Link layer         |
    ----------------------------------------------------------------

3.2 TCP three-way handshake

# SYN is the handshake signal used when establishing a connection. In TCP, the side that sends the first SYN packet is the client, and the side that receives it is the server.
# In TCP, when the sender's data reaches the receiver, the receiver returns a notification that the data was received. This message is called an acknowledgement (ACK).

  Suppose there is a client A and a server B that need to establish reliable data transmission.

       SYN(=j)             // SYN: A requests to establish a connection
  A ------------> B
                  |
       ACK(=j+1)  |        // ACK: B acknowledges A's SYN
       SYN(=k)    |        // SYN: B sends its own SYN
  A <-------------
  |
  |    ACK(=k+1)
   -------------> B        // ACK: A acknowledges B's SYN

  1. The client sends a SYN packet (SEQ = j) to the server and enters the SYN_SENT state, waiting for the server's confirmation.
  2. When the server receives the SYN packet, it must acknowledge the client's SYN (ACK = j + 1) and also send its own SYN packet (SEQ = k), that is, a SYN + ACK packet. The server then enters the SYN_RECV state.
  3. The client receives the SYN + ACK packet from the server and sends an acknowledgement packet (ACK = k + 1) to the server. Once this packet is sent, the client and the server enter the ESTABLISHED state and the three-way handshake is complete.
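The sequence and acknowledgement numbers exchanged in these three steps can be written down as a small model. This is only a sketch of the numbering, not a network implementation; the function name `three_way_handshake` and the dictionary layout are hypothetical, and `j` and `k` are the initial sequence numbers chosen by the client and the server.

```python
def three_way_handshake(j: int, k: int) -> list:
    """Return the three segments of the handshake as dictionaries."""
    # Step 1: the client sends SYN with its initial sequence number j.
    syn = {"from": "client", "flags": "SYN", "seq": j}
    # Step 2: the server acknowledges j + 1 and sends its own SYN with k.
    syn_ack = {"from": "server", "flags": "SYN+ACK", "seq": k, "ack": syn["seq"] + 1}
    # Step 3: the client acknowledges k + 1; the SYN consumed one sequence number.
    ack = {"from": "client", "flags": "ACK", "seq": j + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]
```

With `j = 100` and `k = 300`, the server's acknowledgement number is 101 and the client's final acknowledgement number is 301.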

4. TLS negotiation

TLS negotiation

Once the connection is established, data can be transmitted over HTTP. If HTTPS is used, an additional protocol layer is added between TCP and HTTP to provide encryption and authentication. HTTPS uses the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols to keep information secure; TLS is the successor to SSL.

  • SSL
      • Authenticates users and servers to ensure that data is sent to the correct client and server.
      • Encrypts data to prevent it from being stolen in transit.
      • Maintains data integrity to ensure that data is not changed during transmission.
  • TLS
      • Provides confidentiality and data integrity between two communicating applications. The protocol consists of two layers: the TLS Record Protocol and the TLS Handshake Protocol. The lower layer is the TLS Record Protocol, which sits on top of a reliable transport protocol (such as TCP).

4.1 TLS handshake protocol

  1. The client sends a ClientHello message carrying: the list of supported SSL/TLS versions; the supported encryption algorithms; the supported data compression methods; and random number A.
  2. The server responds with a ServerHello message carrying: the SSL/TLS version chosen in the negotiation; the session ID; random number B; and the server's digital certificate (serverCA). If two-way authentication is required, the server also needs to authenticate the client, so it sends a client certificate request at the same time.
  3. The client verifies the server's digital certificate. After verification passes, it generates random number C, called the pre-master key, encrypts it with the public key from the server's certificate, and sends it. Because the server issued a client certificate request, the client also encrypts a random number (clientRandom) with its private key and sends it together with its own certificate (clientCA).
  4. The server verifies the client's certificate and decrypts the clientRandom value that the client encrypted. It then generates the master key from random number A, random number B and random number C (the pre-master key), encrypts a Finished message with it, and sends the message to the client.
  5. The client generates the master key from the same random numbers with the same algorithm, encrypts a Finished message, and sends it to the server.
  6. The server and the client each decrypt the other's message successfully. The handshake is now complete, and subsequent data is encrypted with the master key.
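The key point of steps 4 and 5 is that both sides feed the same three random values into the same algorithm and therefore arrive at the same master key without ever sending it. The sketch below illustrates that idea only: it uses a single HMAC-SHA256 call as a stand-in, which is an assumption of this example and not the key derivation function that real TLS versions specify. The function name `derive_master_key` is hypothetical.

```python
import hashlib
import hmac
import os


def derive_master_key(pre_master: bytes, random_a: bytes, random_b: bytes) -> bytes:
    """Simplified stand-in for TLS key derivation (NOT the real TLS algorithm)."""
    return hmac.new(pre_master, b"master key" + random_a + random_b, hashlib.sha256).digest()


random_a = os.urandom(32)    # random number A, sent in the ClientHello
random_b = os.urandom(32)    # random number B, sent in the ServerHello
pre_master = os.urandom(48)  # random number C, sent encrypted to the server

client_key = derive_master_key(pre_master, random_a, random_b)
server_key = derive_master_key(pre_master, random_a, random_b)
```

Here `client_key` and `server_key` are identical, while anyone who lacks the pre-master key cannot reproduce them from A and B alone.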

5. Server response

Once the connection between the browser and the web server is established, the browser sends an initial HTTP GET request, usually for an HTML file. After receiving the request, the server sends back an HTTP response message containing the relevant response headers and the HTML body.

<html>
 <head>
  <meta charset="UTF-8"/>
  <title>My blog</title>
  <link rel="stylesheet" href="styles.css"/>
  <script src="index.js"></script>
 </head>
 <body>
  <h1 class="heading">Home Page</h1>
  <p>A paragraph with a <a href="https://hzfe.org/">link</a></p>
  <script src="index.js"></script>
 </body>
</html>

5.1 status code

The status code consists of three digits. The first digit defines the category of the response, and it has five possible values:

  • 1xx: informational – the request has been received and processing continues
  • 2xx: success – the request has been successfully received, understood and accepted
  • 3xx: redirection – further action is required to complete the request
  • 4xx: client error – the request contains a syntax error or cannot be fulfilled
  • 5xx: server error – the server failed to fulfil a valid request
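Since the category is determined by the first digit alone, it can be computed with an integer division. A minimal sketch, in which the function name `status_class` and the label strings are choices made for this example:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its category."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]  # the first digit selects the category
```

For instance, 200 falls into "success", 301 into "redirection" and 404 into "client error".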

5.2 Common response headers and fields

  • Cache-Control: must-revalidate, no-cache, private (whether and how the resource may be cached)
  • Connection: keep-alive
  • Content-Encoding: gzip (the compression encoding the web server applied to the returned content)
  • Content-Type: text/html; charset=UTF-8 (file type and character encoding)
  • Date: Tue, 21 Sep 2021 06:18:21 GMT
  • Transfer-Encoding: chunked (the server sends the resource in chunks)

5.3 HTTP response message

The response message consists of four parts (status line + response headers + blank line + response body):

  • Status line: HTTP version + space + status code + space + status description + carriage return (CR) + line feed (LF)
  • Response header: field name + colon + value + carriage return + line feed
  • Blank line: carriage return + line feed
  • Response body: the content returned by the server, such as the HTML document
6. Browser parsing and drawing

Different browser engines render pages in different ways. Here we take Chrome's rendering process as an example.

  1. Process the HTML markup and build the DOM tree.
  2. Process the CSS and build the CSSOM tree.
  3. Merge the DOM and the CSSOM into a render tree.
  4. Run layout on the render tree to calculate the geometry of each node.
  5. Paint each node onto the screen.
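Step 1 can be illustrated with a toy tree builder based on Python's standard `html.parser` module. This is only a sketch of the idea of turning markup into a tree of nodes; it is not how Chrome's parser works, and the class name `DomBuilder`, the dictionary node layout and the short `VOID_TAGS` list are assumptions of this example.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag (partial list).
VOID_TAGS = {"meta", "link", "br", "img", "input", "hr"}


class DomBuilder(HTMLParser):
    """Build a nested dictionary tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            text = {"tag": "#text", "text": data.strip(), "children": []}
            self.stack[-1]["children"].append(text)


builder = DomBuilder()
builder.feed('<html><head><meta charset="UTF-8"/></head>'
             '<body><h1 class="heading">Home Page</h1></body></html>')
builder.close()
```

After feeding the markup, `builder.root` holds a `#document` node whose only child is `html`, which in turn contains `head` and `body`.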

7. TCP disconnect

To reduce the time spent on requests, the current page uses a keep-alive connection by default, so the TCP connection is typically closed when the tab is closed. This closing process is the four-way wave. Closing is a full-duplex process, and the order in which the packets are sent is not fixed. Generally speaking, the close is initiated by the client. The process is as follows:

  1. The active closing party sends a FIN to close data transmission from itself to the passive closing party. In other words, it tells the passive closing party: "I will not send you any more data." (If data sent before the FIN has not yet been acknowledged, the active closing party will still retransmit it.) At this point the active closing party can still receive data.
  2. After receiving the FIN, the passive closing party sends an ACK whose acknowledgement number is the received sequence number + 1 (like a SYN, a FIN occupies one sequence number).
  3. The passive closing party sends a FIN to close data transmission from itself to the active closing party. In other words, it tells the active closing party: "I have finished sending my data and will not send you any more."
  4. After receiving the FIN, the active closing party sends an ACK to the passive closing party whose acknowledgement number is the received sequence number + 1. The four-way wave is now complete.
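The numbering in these four steps can be modelled in the same spirit as the handshake. This is a sketch of the sequence and acknowledgement numbers only, under the assumption that no further data is sent between the steps; the function name `four_way_close` is hypothetical, and `u` and `v` are the current sequence numbers of the active and passive closing parties.

```python
def four_way_close(u: int, v: int) -> list:
    """Return the four segments exchanged when closing a TCP connection."""
    # Step 1: the active party announces it has no more data to send.
    fin_1 = {"from": "active", "flags": "FIN", "seq": u}
    # Step 2: the passive party acknowledges; the FIN consumed one sequence number.
    ack_1 = {"from": "passive", "flags": "ACK", "ack": fin_1["seq"] + 1}
    # Step 3: the passive party finishes sending and closes its direction.
    fin_2 = {"from": "passive", "flags": "FIN", "seq": v}
    # Step 4: the active party acknowledges the passive party's FIN.
    ack_2 = {"from": "active", "flags": "ACK", "ack": fin_2["seq"] + 1}
    return [fin_1, ack_1, fin_2, ack_2]
```

With `u = 500` and `v = 900`, the two acknowledgement numbers are 501 and 901.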

