Talking about Chrome: from entering a URL to page presentation

Time:2021-8-13

3000 word long text warning~

Level 1: Chrome’s multi process architecture:

Distinction between concurrency and parallelism

  • Concurrency: the ability to handle multiple tasks (not necessarily at the same instant)
  • Parallelism: the ability to handle multiple tasks at the same time

Distinction between process and thread:

As a science student, let me first share the most memorable statement I heard in my OS class: a process is the smallest unit of resource allocation; a thread is the smallest unit of task scheduling.

  • Threads must depend on processes to exist. Threads within the same process share process resources.
  • If a thread in a process crashes, the whole process crashes, but the operating system is not affected.
  • Processes communicate with each other through IPC mechanisms: shared memory, sockets, and pipes (through the kernel)
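To make the "threads share process resources" point concrete, here is a minimal Python sketch. The names `counter`, `worker`, and `run_workers` are made up for illustration. Several threads update one dictionary that lives in the process, and a lock keeps the updates consistent.

```python
import threading

counter = {"value": 0}   # lives in the process, so every thread sees the same object
lock = threading.Lock()  # guards the shared counter against interleaved updates


def worker(times: int) -> None:
    """Increment the shared counter `times` times."""
    for _ in range(times):
        with lock:
            counter["value"] += 1


def run_workers(n_threads: int = 4, times: int = 1000) -> int:
    """Start n_threads threads and return the final value of the shared counter."""
    counter["value"] = 0
    threads = [threading.Thread(target=worker, args=(times,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

Separate processes would each get their own copy of `counter`, which is why they need an IPC mechanism to exchange data.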

Multi-process architecture implementation of Chrome:

  • Main (browser) process × 1: user interaction, child process management, storage management.
  • Network process × 1: resource loading.
  • GPU process × 1: 3D rendering.
  • Rendering process × n: responsible for document parsing and subresource loading (by default, one is created for each page).
  • Plug-in process × n: because plug-ins are unstable, they need to be isolated from other processes.

Level 2: the four-layer network model of the TCP/IP protocol

  • Comparison between the OSI seven-layer model and the TCP/IP four-layer model

    [Figure: the OSI seven-layer model compared with the TCP/IP four-layer model]

    My understanding is that OSI leans toward being a normative model, while TCP/IP is the best practice in the current network environment.

    In fact, undergraduate textbooks abstract it into a five-layer model:

    Application layer (application layer + presentation layer + session layer) => transport layer => network layer => link layer => physical layer

    It can be seen that the specific layering scheme is not the key; what matters is whether the important functions in the middle column of the figure above are implemented.

  • Packet size:

    This is a story that every undergraduate teacher is very good at telling. At the time, the story did not help me grasp the basic operation of computer networks quickly, but to this day the example has guided my understanding of them well.

    [Figure: packet size]

Level 3: Domain Name System (DNS)

Background

  • DNS task: given the domain name in a URL, look up the corresponding IP address.
  • DNS can be queried in two ways: iterative / recursive;

    1. The browser queries the local DNS server recursively;
    2. The local DNS server queries other servers iteratively;
  • DNS queries normally use UDP, which is fast and efficient (falling back to TCP for large responses).

Query process:

  1. The browser sends the domain name from the URL to the local DNS server (LDNS);
  2. The local DNS server looks in its cache. If there is a record for the domain name and it has not expired, the IP address is returned and the DNS query ends;
  3. If the local DNS server has no record for the domain name, it looks up the IP address iteratively;
  4. The local DNS server first sends a DNS message to the root name server, which returns the address of the matching top-level domain name server for the domain name in the message.
  5. The local DNS server sends a DNS message to the top-level domain name server, which returns the address of the matching authoritative server for the domain name in the message.
  6. Repeat the above process until the IP address for the domain name is obtained.
  7. The local DNS server writes the [IP, domain name] record to its cache and returns the IP.
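The query process above can be sketched as a small in-memory simulation. Everything here is hypothetical: the `SERVERS` table, the `ns:` referral marker, the `LocalDNS` class, and the documentation-range address 192.0.2.10. No real DNS traffic is sent; the sketch only mirrors the cache check, the iterative referrals, and the final cache write.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical name servers. A value starting with "ns:" is a referral to
# another server; any other value is the final IP address.
SERVERS: Dict[str, Dict[str, str]] = {
    "root": {"com": "ns:tld-com"},
    "tld-com": {"example.com": "ns:auth-example"},
    "auth-example": {"www.example.com": "192.0.2.10"},
}


def ask(server: str, name: str) -> Optional[str]:
    """Return what `server` knows about the longest matching suffix of `name`."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SERVERS[server]:
            return SERVERS[server][suffix]
    return None


class LocalDNS:
    """Toy local DNS server: cache first, then iterate from the root."""

    def __init__(self, ttl: float = 60.0) -> None:
        self.ttl = ttl
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)
        self.queries_sent = 0

    def resolve(self, name: str) -> Optional[str]:
        cached = self.cache.get(name)
        if cached and cached[1] > time.monotonic():
            return cached[0]                      # step 2: unexpired cache hit
        server = "root"                           # step 4: start at the root
        while True:
            self.queries_sent += 1
            answer = ask(server, name)
            if answer is None:
                return None                       # nobody knows this name
            if answer.startswith("ns:"):
                server = answer[3:]               # steps 5-6: follow the referral
                continue
            self.cache[name] = (answer, time.monotonic() + self.ttl)  # step 7
            return answer
```

Resolving `www.example.com` twice on the same `LocalDNS` instance sends queries only the first time; the second call is served from the cache.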

Level 4: SSL protocol - HTTPS

What is HTTPS?

HTTPS (HTTP Secure) = HTTP + hybrid encryption + authentication by a certificate authority + integrity protection

Since detailed introductions to HTTPS are plentiful online, I will not go into detail here and will go straight to the conclusion.

SSL protocol operation mechanism

Two descriptions circulate online (the key is generated from 3 random numbers, or from 2), but they are similar. Both combine symmetric and asymmetric encryption:

  1. Client sends to server: cipher suite list + random number client_random
  2. Server returns to client: selected cipher suite + random number server_random + digital certificate (containing the public key)
  3. The client verifies the digital certificate. If it is valid, the client generates a random number pre-master, encrypts it with the public key in the certificate, and sends the encrypted packet to the server
  4. The server decrypts the packet with its private key to obtain the pre-master
  5. At this point, both parties use the agreed cipher suite to derive a symmetric key from client_random + server_random + pre-master. After that, the symmetric key is used for encrypted communication
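Step 5 can be illustrated with a toy key derivation. This is only a sketch: `derive_session_key` is a made-up helper that uses a single HMAC-SHA256 instead of the real TLS key derivation function, and the public-key encryption of the pre-master in steps 3 and 4 is skipped. It only shows that both sides, holding the same three values, arrive at the same symmetric key.

```python
import hashlib
import hmac
import os


def derive_session_key(client_random: bytes, server_random: bytes,
                       pre_master: bytes, length: int = 32) -> bytes:
    """Toy derivation: HMAC-SHA256 keyed by pre_master over both random numbers.

    NOT the real TLS PRF; it only shows that equal inputs give equal keys.
    """
    digest = hmac.new(pre_master, client_random + server_random, hashlib.sha256).digest()
    return digest[:length]


# Values exchanged during the handshake (random bytes stand in for real messages).
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master = os.urandom(48)  # in the real protocol this travels encrypted

client_key = derive_session_key(client_random, server_random, pre_master)
server_key = derive_session_key(client_random, server_random, pre_master)
```

An eavesdropper sees both random numbers in the clear, but without the pre-master it cannot compute the same key.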

Level 5: TCP protocol

TCP protocol features

A connection-oriented, reliable transport layer protocol.

Three-way handshake

  • The fundamental purpose of the three-way handshake: to confirm that the sending and receiving capabilities of both our side and the other side are normal
  • The three-way handshake process:

    At the start, the client and the server each know that their own sending and receiving capabilities are normal; what remains is to confirm those of the other party.

    1. First handshake: the client sends a message to the server: SYN = 1, seq = x;
    2. Second handshake: the server receives the message and sends a message to the client: SYN = 1, ACK = 1, seq = y, ack = x + 1;

      The server confirms that the client's sending ability is normal: it received the SYN message the client sent.

    3. Third handshake: the client receives the message and sends a message to the server: ACK = 1, seq = x + 1, ack = y + 1.

      The client confirms that the server's sending and receiving capabilities are normal: the server correctly received its message and responded.

      The server confirms that the client's receiving ability is normal: the server has received the client's ACK.

  • Why not use two handshakes:

    If the last handshake is dropped, the server cannot determine whether the client's receiving ability is normal.
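The sequence and acknowledgment numbers of the three handshakes can be written out as a small sketch. The function names and the dictionary layout are made up for illustration, and sequence-number wraparound at 2^32 is ignored.

```python
from typing import Dict, List


def three_way_handshake(x: int, y: int) -> List[Dict[str, object]]:
    """Return the three segments for client initial seq x and server initial seq y."""
    return [
        {"from": "client", "flags": {"SYN"}, "seq": x, "ack": None},
        {"from": "server", "flags": {"SYN", "ACK"}, "seq": y, "ack": x + 1},
        {"from": "client", "flags": {"ACK"}, "seq": x + 1, "ack": y + 1},
    ]


def is_consistent(segments: List[Dict[str, object]]) -> bool:
    """Check that each acknowledgment number is the peer's seq plus one."""
    syn, syn_ack, ack = segments
    return (
        syn_ack["ack"] == syn["seq"] + 1
        and ack["seq"] == syn["seq"] + 1
        and ack["ack"] == syn_ack["seq"] + 1
    )
```

For example, `three_way_handshake(100, 300)` produces acknowledgment numbers 101 and 301, and `is_consistent` accepts the result.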

Four waves

  • The purpose of the four waves is to make sure both parties have finished sending their data before the connection is closed.
  • The four-wave process:

    1. First wave: the client sends a FIN message to the server, indicating that the client will no longer send data to the server: FIN=1, seq=x
    2. Second wave: the server sends an ACK message to the client in response: ACK=1, seq=y, ack=x+1

      During this period, the server can continue to send data to the client. The TCP connection is now in the half-closed state;

      The client also needs to keep listening for messages sent by the server.

    3. Third wave: the server sends a FIN message to the client, indicating that the server will no longer send data to the client: FIN=1, seq=z, ack=x+1
    4. Fourth wave: the client sends an ACK message to the server in response: ACK=1, seq=x+1, ack=z+1. The client then waits for 2MSL (MSL: maximum segment lifetime). If no message is received during this period, the TCP connection is closed cleanly.
  • Why does closing take four waves when establishing a connection takes only three handshakes?

    When the server, in the LISTEN state, receives the SYN message requesting a connection, it can put its ACK and SYN in one message and send that to the client. When closing the connection, receiving the other party's FIN only means that the other party will no longer send data; it can still receive data. Whether to close its own sending channel right away is up to the upper-layer application, so the ACK and FIN are generally sent separately.

  • Why wait for 2MSL?

    1. In the fourth wave, the client sends an ACK but receives no response to it, so it cannot confirm whether the ACK arrived. It therefore waits for 2MSL to give the server a chance to resend the FIN message.
    2. Within 1 MSL after the client sends the ACK, the server's timer will expire if the ACK has not arrived for some reason, and the server retransmits the FIN message.
    3. The retransmitted FIN reaches the client within another 1 MSL. At that point the client's timer has not yet expired; after receiving the FIN, the client restarts the 2MSL timer and sends the ACK again.
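The four waves can be laid out next to the standard TCP connection states in a short sketch. The `Wave` class and both function names are made up for illustration; the state names (FIN_WAIT_1, CLOSE_WAIT, and so on) are the usual TCP ones, and the MSL value is left as a parameter because it depends on the operating system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Wave:
    sender: str
    flags: str
    seq: int
    ack: Optional[int]
    client_state: str  # client state once this segment has been delivered
    server_state: str  # server state once this segment has been delivered


def four_waves(x: int, y: int, z: int) -> List[Wave]:
    """Return the four segments of a client-initiated close."""
    return [
        Wave("client", "FIN", x, None, "FIN_WAIT_1", "CLOSE_WAIT"),
        Wave("server", "ACK", y, x + 1, "FIN_WAIT_2", "CLOSE_WAIT"),
        Wave("server", "FIN", z, x + 1, "TIME_WAIT", "LAST_ACK"),
        Wave("client", "ACK", x + 1, z + 1, "TIME_WAIT", "CLOSED"),
    ]


def time_wait_seconds(msl_seconds: float) -> float:
    """The client stays in TIME_WAIT for 2 * MSL before it is CLOSED."""
    return 2 * msl_seconds
```

Between the second and third wave the server sits in CLOSE_WAIT and may still send data, which is the half-closed period described above.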

Level 6: ARP protocol

Definition

  • The Address Resolution Protocol (ARP) resolves an IP address to find the corresponding MAC address. It is a very important transmission protocol in the network protocol suite. ARP is a link layer protocol.

Working process

  • Same network segment: broadcast query -> unicast response
  • Different network segments: broadcast query -> unicast response -> gateway forwarding -> repeat
  • Reference: https://juejin.cn/post/6890167829984149518
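The difference between the two cases comes down to which IP address the sender asks about in its ARP broadcast. A minimal sketch using Python's standard `ipaddress` module is shown below. The function name `arp_target` and the addresses in the usage note are made up for illustration.

```python
import ipaddress


def arp_target(src_ip: str, dst_ip: str, prefix_len: int, gateway_ip: str) -> str:
    """Return the IP whose MAC address the sender must resolve with ARP.

    Same network segment: ask for the destination itself.
    Different network segment: ask for the gateway, which forwards the packet.
    """
    network = ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(dst_ip) in network:
        return dst_ip
    return gateway_ip
```

With a sender at 192.168.1.10/24 and gateway 192.168.1.1, a destination of 192.168.1.20 is resolved directly, while a destination of 10.0.0.5 leads to an ARP query for the gateway.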

What happens from entering the URL to the page presentation?

Browser side:

  1. User input URL

    The browser checks whether the string in the address bar conforms to URL rules. If not, it submits the string to the search engine; if it does, it proceeds to step 2.

  2. Build HTTP message

    GET url HTTP/1.1

    The main process passes the message to the network process through IPC for processing.

  3. Look up the browser's file cache

    If the browser has requested the resource before and the cached copy has not expired, the network process returns the cached file and intercepts the HTTP request. If the cache lookup misses, proceed to step 4.

  4. DNS query

    For details, see the third level (DNS);

    The resulting IP is eventually returned to the browser's network process.

  5. After the HTTP message is processed at the application layer, it is ready to be handed down to the transport layer (TCP).

    Chrome limits TCP connections (at most 6 TCP connections can be kept per domain name). Requests beyond 6 are put into a TCP queue and wait.

  6. If HTTPS is used, an SSL layer is added between application-layer HTTP and transport-layer TCP to secure the connection

    See HTTPS above for details.

  7. TCP three-way handshake:

    For details, see the fifth level (TCP protocol)

  8. The transport layer (TCP) processes the message
    • The TCP layer splits the HTTP message into segments (each no larger than the maximum segment size) and adds a TCP header to each
    • A sequence number is added to the header: this guarantees in-order delivery and reassembly at the destination.
    • The source port number and destination port number are added to the header: they determine which application (port) the packet should be delivered to.
    • The segments are handed down to the network layer
  9. The network layer (IP) processes the segments
    • An IP header is added to each segment, carrying the source IP address and destination IP address.
    • Static or dynamic routing algorithms route the packet at the network layer
  10. The link layer transmits the message
    • An Ethernet header is added to each IP packet, carrying the MAC addresses
    • ARP is used for link-layer data transmission: see the ARP level above for details
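Steps 8 to 10 amount to wrapping the HTTP message in three nested headers. The sketch below models each header as a plain dictionary rather than real binary formats. The function names, the default ports, and the IP and MAC addresses are made up for illustration, and `mss=1460` is just a commonly seen value, not something taken from this article.

```python
from typing import Dict, List


def encapsulate(message: bytes, mss: int = 1460, isn: int = 0,
                src_port: int = 50000, dst_port: int = 80,
                src_ip: str = "192.0.2.1", dst_ip: str = "198.51.100.7",
                src_mac: str = "02:00:00:00:00:01",
                dst_mac: str = "02:00:00:00:00:02") -> List[Dict]:
    """Split an HTTP message into segments and wrap each in TCP, IP and Ethernet headers."""
    frames = []
    for offset in range(0, len(message), mss):
        tcp_segment = {                      # step 8: ports + sequence number
            "src_port": src_port, "dst_port": dst_port,
            "seq": isn + offset, "data": message[offset:offset + mss],
        }
        ip_packet = {                        # step 9: source and destination IP
            "src_ip": src_ip, "dst_ip": dst_ip, "payload": tcp_segment,
        }
        frame = {                            # step 10: MAC addresses
            "src_mac": src_mac, "dst_mac": dst_mac, "payload": ip_packet,
        }
        frames.append(frame)
    return frames


def reassemble(frames: List[Dict]) -> bytes:
    """Strip the Ethernet and IP headers, then order the segments by sequence number."""
    segments = [frame["payload"]["payload"] for frame in frames]
    segments.sort(key=lambda seg: seg["seq"])
    return b"".join(seg["data"] for seg in segments)
```

Because each segment carries its own sequence number, `reassemble` recovers the original message even if the frames arrive out of order, which is what the server side does when it receives them.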

Server side:

  1. Link layer and network layer:

    Remove the Ethernet header and the IP header in turn and hand the result to the transport layer

  2. The transport layer reassembles the message and delivers it to the specified application
    • TCP uses the sequence numbers in the segment headers to ensure the data is correct and complete, and rebuilds the HTTP message
    • The HTTP message is delivered to the program listening on the destination port specified in the TCP header, that is, handed to the application layer
  3. The server analyzes the HTTP request and constructs the response message
    1. For the URL specified in the request line, check whether redirection is required.

      If so, return a 301 (permanent redirect) or 302 (temporary redirect) status code and write the redirect address in the Location header of the response message.

    2. Check whether the resource has changed since the version identified by the If-None-Match header of the request message

      If it has not changed, return a 304 status code, indicating that the resource has not been updated.

    3. Check whether the browser should be told to cache the resource

      If the browser should cache it, set Cache-Control: max-age=2000 (2000 seconds, as an example)

  4. The server application layer sends the HTTP message back to the client

    The process is the same as above.

  5. Close the HTTP connection

    Check whether a long (persistent) connection is in use

    • In HTTP 1.0, a long connection is declared with Connection: keep-alive; otherwise a short connection is used by default. In HTTP 1.1, long connections are the default.
    • If a long connection is used, the connection is not closed for now; if a short connection is used, perform the TCP four waves (see the fifth level for details)
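The decisions in step 3 (redirect, 304, or a fresh response with caching headers) can be sketched as one function. `build_response`, the `redirects` and `etags` tables, and the 404 fallback for unknown paths are assumptions made for illustration. A real server would also produce a response body.

```python
from typing import Dict, Tuple


def build_response(path: str,
                   request_headers: Dict[str, str],
                   redirects: Dict[str, Tuple[int, str]],
                   etags: Dict[str, str],
                   max_age: int = 2000) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a GET request."""
    # 1. Redirect needed? Answer 301/302 with the new address in Location.
    if path in redirects:
        status, location = redirects[path]
        return status, {"Location": location}

    current_etag = etags.get(path)
    if current_etag is None:
        return 404, {}  # unknown resource (assumption, not covered in the text)

    # 2. Browser's cached copy still current? Answer 304 without a body.
    if request_headers.get("If-None-Match") == current_etag:
        return 304, {"ETag": current_etag}

    # 3. Fresh response: tell the browser how long it may cache the resource.
    return 200, {"ETag": current_etag, "Cache-Control": f"max-age={max_age}"}
```

For instance, a request whose `If-None-Match` header matches the stored ETag gets 304, while the same request without that header gets 200 together with `Cache-Control: max-age=2000`.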

Browser side:

  1. The network process receives the HTTP response message and analyzes it
    1. If the response status code is 301 or 302, rebuild the HTTP request message, using the URL given in the Location header of the response message
    2. If the status code is 304, the browser's cached resource is used directly
    3. If the status code is 200, the request succeeded. Process the response according to the resource type.
  2. Process the data according to the Content-Type
    • If the Content-Type is text/html, it is an HTML page. The browser's main process is notified through IPC to prepare to render the page.
    • If the Content-Type is a byte stream type (for example application/octet-stream), the download manager downloads the resource.
  3. Render the page and display it

    This part is very complex (it includes the browser's rendering process and the execution mechanism of JavaScript)

    Please see my next blog post~