HTTP-related notes

Time:2019-9-12

1. DNS query to get IP

  • Why an IP address is needed: TCP/IP identifies the communication peer by its IP address.
  • Why domain names and IP addresses are used together:

    • An IP address takes less memory. An IPv4 address is 32 bits (4 bytes) long, while a domain name needs tens of bytes, up to 255.
    • IP addresses are hard to remember.
    • People use names; routers use IP addresses.
  • The IP address resolved by DNS is only that of the physical host, not necessarily that of the web application server being accessed (for example, with virtual hosting, the physical host forwards the connection to the matching virtual host according to the domain name).
  • The structure of a TCP/IP network:

    • A TCP/IP network is a large network made up of small subnets connected by routers.
    • Every device in the network is assigned an address, the IP address. The IP address determines which server a message should be sent to.
    • Message sent by the sender -> (through) the hub in the subnet -> (forwarded to) the nearest router -> the router locates the next router according to the message's destination -> the message is sent to the next router (repeated until it reaches the destination)
  • Actual IP address:

    • It is a 32-bit number, divided into four groups of 8 bits (1 byte) each, written in decimal and separated by dots.
    • By the rules of IP addressing, the network number and the host number together make up 32 bits, but the boundary between the two parts is not fixed.
    • This is why the subnet mask is needed. A subnet mask is a 32-bit value, the same length as the IP address, whose left part is all 1s and whose right part is all 0s. The bits where the mask is 1 mark the network number, and the bits where it is 0 mark the host number. A host number of all 0s denotes the subnet as a whole; a host number of all 1s means sending to every device on the subnet, i.e. broadcast.
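The subnet mask rule can be tried out with Python's standard `ipaddress` module. This is a minimal sketch; the address `10.11.12.13` and the mask `255.255.255.0` are made-up example values, and `describe` is a hypothetical helper name.

```python
import ipaddress


def describe(address: str, mask: str) -> dict:
    """Split an IPv4 address into its network and host parts using a subnet mask."""
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    network = iface.network
    mask_int = int(network.netmask)
    # Host number: the address bits where the mask is 0.
    host_number = int(iface.ip) & ~mask_int & 0xFFFFFFFF
    return {
        "network": str(network.network_address),      # host bits all 0
        "broadcast": str(network.broadcast_address),  # host bits all 1
        "host_number": host_number,
        "prefix_length": network.prefixlen,           # number of 1 bits in the mask
    }


info = describe("10.11.12.13", "255.255.255.0")
print(info)
```

With this mask the first 24 bits are the network number, so the subnet is 10.11.12.0, the broadcast address is 10.11.12.255, and the host number is 13.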
  • Query order:

    • Browser Cache
    • Local cache
    • Local hosts file
    • Query the DNS server
  • Optimization: use dns-prefetch
  • The process of querying the DNS servers (for example, for www.a.b.com):

    • Root name server (there are only a limited number of these; if it has no record, it returns the IP address of the .com server)
    • Top-level domain name server (if it has no record, it returns the IP address of the b.com server)
    • Second-level domain name server (if it has no record, it returns the IP address of the a.b.com server)
    • Third-level domain name server (returns the IP address of www.a.b.com)
  • Speeding up DNS responses through caching:

    • On the real Internet, one DNS server can manage information for multiple domains.
    • DNS servers cache results: a server can answer directly from its cache without searching from the root, and a later query can continue downward from the cached position. Compared with starting from the root every time, caching shortens the query time.
    • After information has been cached, the original registration may change, so the cached information may become incorrect. For this reason cached information carries a validity period; once an entry exceeds it, the entry is deleted from the cache.
  • Use commands to view the entire DNS request process: http://www.ruanyifeng.com/blo…
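The level-by-level lookup and the cache with a validity period can be imitated with a toy model. This is only a sketch: the `ZONES` table, the server names, the address `192.0.2.10` and the 60-second validity period are all invented for illustration, and no real DNS traffic is involved. For simplicity it caches only final answers, not intermediate referrals.

```python
import time

# Hypothetical zone data: each server either answers or refers to the next one.
ZONES = {
    "root": {"refer": {"com": "tld-com"}},
    "tld-com": {"refer": {"b.com": "ns-b"}},
    "ns-b": {"refer": {"a.b.com": "ns-a-b"}},
    "ns-a-b": {"answer": {"www.a.b.com": ("192.0.2.10", 60)}},  # (ip, ttl seconds)
}


class CachingResolver:
    def __init__(self, zones, clock=time.monotonic):
        self.zones = zones
        self.clock = clock
        self.cache = {}    # name -> (ip, expiry time)
        self.queries = 0   # number of server queries made so far

    def resolve(self, name):
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]                   # still valid: answer from the cache
        self.cache.pop(name, None)          # drop an expired entry, if any
        server = "root"
        while True:
            self.queries += 1
            zone = self.zones[server]
            if name in zone.get("answer", {}):
                ip, ttl = zone["answer"][name]
                self.cache[name] = (ip, self.clock() + ttl)
                return ip
            # Follow the referral whose domain suffix matches the name.
            for suffix, next_server in zone.get("refer", {}).items():
                if name == suffix or name.endswith("." + suffix):
                    server = next_server
                    break
            else:
                raise LookupError(name)


resolver = CachingResolver(ZONES)
print(resolver.resolve("www.a.b.com"), resolver.queries)
print(resolver.resolve("www.a.b.com"), resolver.queries)  # second call hits the cache
```

The first call walks all four servers; the second returns from the cache without any further queries until the entry expires.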

2. HTTP hijacking:

  • Divided into DNS hijacking and content hijacking
  • DNS hijacking:

    • The DNS server is attacked and either returns a false IP address or leaves the request unprocessed so that it fails. The end result is that users on a given network cannot reach the site, or reach a fake site instead.
    • Solution (DNS hijacking works by attacking the carrier’s resolver):

      • Use your own resolver, or
      • Resolve the domain name yourself and have your app send requests to the IP address directly
  • Content hijacking:

    • Background:

      • To speed up user access and reduce their own traffic costs, carriers run a caching mechanism.
      • When a user requests data that is already in the cache pool, it is returned directly.
      • If it is not, the carrier requests it from the server, intercepts the returned data, stores it in the cache pool, and then returns it to the user.
    • Cause: the content in the carrier’s cache is maliciously tampered with
    • Solution: this is relatively uncommon, and there is no good solution at present.

3. TCP/IP requests

  • Three-way handshake (establishing a connection)

    • SYN
    • SYN, ACK
    • ACK
  • Four-way wave (closing a connection)

    • FIN, ACK
    • ACK
    • FIN, ACK
    • ACK
  • TCP congestion control:

    • Slow start

      • After the TCP three-way handshake, sending a large number of segments right away can easily exhaust router buffer space and congest the network. Instead, the amount of data sent is increased gradually according to the size of the congestion window. The window is initialized to one maximum segment size, and it grows each time a segment is acknowledged, so the window grows exponentially.
    • Congestion avoidance

      • The window (cwnd) cannot keep growing forever. Once it reaches TCP’s slow start threshold (ssthresh), the slow start phase ends and congestion avoidance begins, in which the window grows linearly.
      • How is congestion detected? If the sender does not receive the receiver’s ACK within the timeout interval, the network is considered congested.
      • After congestion occurs: the slow start threshold is set to half of the current window, the current window is reset to 1, and slow start begins again.
    • Fast retransmit

      • The receiver successfully receives M1 and M2 from the sender and acknowledges each. It then fails to receive M3 but does receive M4. The receiver cannot acknowledge M4, because M4 is an out-of-order segment. Under plain reliable-transmission rules the receiver would do nothing; under the fast retransmit algorithm, each time it receives M4, M5 and later segments it sends the ACK for M2 again. If the sender receives three duplicate ACKs in a row, it retransmits the unacknowledged segment immediately instead of waiting for the retransmission timer to expire.
      • Set the slow start threshold to half of the current window; set the current window to the slow start threshold; re-enter the congestion avoidance phase.
    • Fast recovery

      • Fast recovery rests on the principle of packet conservation: the number of packets in the network at any one time is constant, and a new packet can be sent only after an old one has left. When the sender receives a duplicate ACK, TCP’s ACK mechanism implies that a packet has left the network, so cwnd is increased by 1.
      • On receiving three duplicate ACKs, set ssthresh to half of cwnd, set cwnd to ssthresh plus 3, and then retransmit the missing segment. The 3 is added because three duplicate ACKs mean three “old” packets have left the network.
      • For each further duplicate ACK received, the congestion window is increased by 1.
      • When an ACK for new data arrives, set cwnd to the ssthresh value from the first step. This ACK confirms new data, which means the loss signalled by the duplicate ACKs has been repaired; the recovery process is over and the sender returns to the congestion avoidance state.
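To make the window rules concrete, here is a small round-by-round simulation. It is a simplified sketch, not a real TCP implementation: the window is counted in whole segments, each event stands for one round trip, the function name `simulate` and the event labels are made up, and the three-duplicate-ACK case uses the simpler rule (window set to the new threshold) rather than the "plus 3" inflation of fast recovery.

```python
def simulate(events, ssthresh=8):
    """Trace cwnd (in segments) over a list of per-round events.

    events: "ack" (everything acknowledged), "timeout", or "3dup"
    (three duplicate ACKs). Returns the cwnd value after each event.
    """
    cwnd = 1
    trace = []
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)  # slow start: doubles each round
            else:
                cwnd += 1                       # congestion avoidance: linear growth
        elif event == "timeout":
            ssthresh = max(cwnd // 2, 2)        # threshold becomes half the window
            cwnd = 1                            # back to slow start
        elif event == "3dup":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                     # go straight to congestion avoidance
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append(cwnd)
    return trace


events = ["ack"] * 5 + ["3dup"] + ["ack"] * 2 + ["timeout"] + ["ack"] * 3
trace = simulate(events)
print(trace)  # [2, 4, 8, 9, 10, 5, 6, 7, 1, 2, 3, 4]
```

The trace shows exponential growth up to the threshold, linear growth after it, a halving on three duplicate ACKs, and a reset to 1 on timeout.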
  • UDP:

    • UDP (User Datagram Protocol) belongs to the transport layer.
    • UDP provides a transaction-oriented, simple and unreliable message delivery service.

      • “Transaction” here means a simple request/response exchange. It does not imply the database transaction properties of atomicity (all or nothing), durability (once committed, the effect is permanent), isolation (transactions are independent of each other) and consistency (consistent state before and after); UDP guarantees none of these.
    • Because it is connectionless, consumes few resources, and is fast and flexible, it is often used where losing one or two packets has no significant impact, such as audio and video.
    • DNS is built on it.
    • Factors that limit the size of a UDP datagram:

      • The UDP protocol itself: the UDP length field is 16 bits, so a UDP message cannot exceed 2^16 - 1 = 65535 bytes;
      • The length of the Ethernet data frame, i.e. the maximum transmission unit (MTU) of the data link layer;
      • The size of the socket’s UDP send buffer;
      • On the Internet, the maximum payload per datagram is 576 (the MTU value taken as standard on the Internet) - 20 (IP header) - 8 (UDP header) = 548 bytes.
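The size arithmetic above can be checked in a few lines. This sketch assumes an IPv4 header without options (20 bytes); `max_payload` and `udp_header` are hypothetical helper names, and the port numbers are arbitrary examples.

```python
import struct

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def max_payload(packet_limit: int) -> int:
    """Largest UDP payload that fits in an IP packet of the given size."""
    return packet_limit - IP_HEADER - UDP_HEADER


def udp_header(src_port: int, dst_port: int, payload_len: int, checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    # The length field covers the header plus the payload.
    return struct.pack("!HHHH", src_port, dst_port, UDP_HEADER + payload_len, checksum)


print(max_payload(576))                  # 548
print(max_payload(65535))                # 65507
print(len(udp_header(5000, 53, 548)))    # 8
```

Because the length field is a 16-bit value, `struct.pack` would raise an error for any total above 65535, which is the protocol-level limit.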
  • Differences between TCP and UDP:

    • Connections are established differently:

      • UDP is connectionless; TCP is connection-oriented, and every session runs over a connection.
      • TCP is a connection-oriented, reliable, ordered transport-layer protocol. UDP is a datagram-oriented, unreliable, unordered transport protocol.
      • UDP is like sending a text message: the sender only needs to know the other party’s IP address and sends the datagrams out one by one, without caring about anything else.
      • With UDP, one socket can communicate with multiple UDP peers; with TCP, one socket can communicate with only one TCP peer. Every TCP connection is point-to-point, whereas UDP supports one-to-one, one-to-many, many-to-one and many-to-many communication.
    • Data is sent differently:

      • TCP is built on a connection between two endpoints, so in theory there is no limit on the size of the data stream. A large piece of data sent over TCP may be split into several segments, which the receiver gets in order.
      • UDP sends individual datagrams, so there is an upper limit on size.
    • Ordering differs:

      • TCP guarantees ordering; UDP does not.
      • TCP has a series of complex mechanisms, such as retransmission on timeout and retransmission on error, that keep its data in order. If you send data 1, 2, 3, then as long as sender and receiver stay connected, the receiver always gets 1, 2, 3.
      • UDP is much looser. No matter how large the server’s buffer is, messages sent by the client are always received one datagram at a time. And because UDP is unreliable and unordered, if the client sends three datagrams 1, 2 and 3, the server may receive any combination of them, in any order and in any number.
    • Reliability differs:

      • TCP is a reliable protocol; UDP is not.
      • Many mechanisms in TCP keep the connection reliable, for example timeout retransmission, error retransmission, flow control, congestion control, slow start, congestion avoidance and fast recovery. TCP is therefore complex internally but relatively simple to use.
      • UDP is connectionless. Each datagram it sends carries the sender’s own IP address and the receiver’s IP address, and UDP does not care whether the datagram arrives intact; it only sends it.
    • TCP is byte-stream oriented: it treats data as an unstructured stream of bytes. UDP is message oriented. UDP has no congestion control, so network congestion does not reduce the source host’s sending rate (useful for real-time applications such as IP telephony and real-time video conferencing).
    • The TCP header costs at least 20 bytes; the UDP header costs only 8 bytes.
    • Usage scenarios:

      • Scenarios for UDP: strict real-time requirements, multipoint communication.
      • Real-time requirements: for example real-time meetings and real-time video. If TCP is used and a poor network forces retransmissions, the picture is bound to lag, and the delay may even keep piling up. With UDP, occasionally losing a few packets has little effect, so UDP is the better choice here.
      • Multipoint communication: TCP has to maintain a long-lived connection, so for multipoint communication a node must establish a two-way connection with each of its peers. In a NAT environment it is sometimes hard for two nodes to establish a direct TCP connection. UDP needs no connection and can send directly, so the cost is low and NAT traversal works well. UDP is the right choice in this case too.
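The "no connection, one datagram at a time" behaviour of UDP can be seen on the loopback interface. This is a rough sketch assuming a machine where `127.0.0.1` is usable; `udp_roundtrip` is a made-up helper, and on loopback loss and reordering are unlikely but still not guaranteed by UDP itself.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as its own datagram and read them back one by one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(2.0)             # do not wait forever if a datagram is lost
    address = receiver.getsockname()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for message in messages:
            sender.sendto(message, address)   # no handshake, no connection
        # Each recvfrom returns exactly one datagram, never a merged stream.
        return [receiver.recvfrom(65535)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()


messages = [b"1", b"2", b"3"]
received = udp_roundtrip(messages)
print(received)
```

A TCP socket would instead require `connect`/`accept` first, and the receiver could see the three messages merged into a single byte stream.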

4. Five-layer Internet protocol stack:

  • Application layer (DNS, HTTPS)
  • Transport layer (TCP, UDP)
  • Network layer (IP, ARP): IP addressing [for background, see https://blog.csdn.net/wenqian…]
  • Link layer (PPP): data is encapsulated into frames
  • Physical layer

5. HTTP requests:

  • Composition:

    • A request message consists of a request header and a request body. The request header contains the request line (method, URL, protocol) and the request header fields.
    • A response message consists of a response header and a response body. The response header contains the status line (protocol, status code, reason phrase) and the response header fields.
    • The header is separated from the body by a blank line (two consecutive line breaks);
  • What are the common request headers and response headers?

    • Request headers:

      • Accept-Encoding/Accept-Language/Accept-Charset
      • Connection: keep-alive/close
      • Referer: the URL of the page the request came from
      • Origin: protocol and domain name of the requester
      • Host: domain name of the destination
      • Cookie: cookie value
      • If-Modified-Since: corresponds to the server’s Last-Modified
      • If-None-Match: corresponds to the server’s ETag
      • Cache-Control: controls how long caches stay valid
      • Access-Control-Request-Method
      • Access-Control-Request-Headers
      • User-Agent: client identifier. Most browsers send a complex value with subtle differences between them.
    • Response headers:

      • Content-Type/Content-Encoding/Content-Language
      • Cache-Control
      • max-age: a Cache-Control directive giving how many seconds the client’s local copy is cached and remains valid
      • Last-Modified
      • ETag
      • Connection: keep-alive/close
      • Keep-Alive: timeout=50, max=100. Parameters needed to keep the connection open.
      • Set-Cookie: sets a cookie.
      • Access-Control-Allow-Origin
      • Server: some information about the server
  • Request/response entities:

    • Request entity: the serialized form of the parameters (a=1&b=2), or a form object (FormData object)
    • Response entity: the content the server needs to pass to the client
  • HTTP request methods?

    • GET (get data)
    • POST (transfer data)
    • PUT (transfer a file, insecure)
    • DELETE (delete a resource)
    • HEAD (get only the message header, with no entity; used to confirm that a URI is valid and when the resource was last updated)
    • PATCH (partially modify a resource)
    • OPTIONS (ask which methods are supported)
    • CONNECT (establish a tunnel through a proxy)
  • What’s the difference between GET and POST?

    • GET carries its data in the URL, so the size is limited.
    • A GET URL can be shared;
    • GET responses are cached by default;
  • Status codes?

    • 1**: Informational; 101 – switching protocols;
    • 2**: Success; 200 – request succeeded; 204 – the server processed the request successfully but has no entity content to return;
    • 3**: Redirection; 301 – permanent redirect, 302 – temporary redirect, 304 – not modified (the cached copy is still valid);
    • 4**: Client error; 400 – malformed request; 401 – authentication required or authentication failed; 403 – access to the resource is forbidden; 404 – requested resource not found;
    • 5**: Server error; 500 – internal server error; 503 – service unavailable
  • What does the # in a URL mean?

    • It marks a location within the web page;
    • It is not included in the HTTP request;
    • When the content after # changes, the browser scrolls to the corresponding location and does not reload the page;
    • Changing the content after # adds an entry to the browser’s history.
    • ? and & are the delimiters for query parameters;
  • Request body format: set Content-Type to one of application/json; application/x-www-form-urlencoded; multipart/form-data; text/xml;
    xhr.setRequestHeader('content-type', 'application/x-www-form-urlencoded');
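As an illustration of the message layout, the following sketch assembles a raw request by hand: request line, header fields, a blank line, then a form-encoded body. The helper name `build_request` and the URL `http://example.com/items?page=2#section-3` are invented for the example; it also shows that the part after # never enters the request.

```python
from urllib.parse import urlencode, urlsplit


def build_request(method: str, url: str, fields=None) -> bytes:
    """Assemble a minimal HTTP/1.1 request message as bytes."""
    parts = urlsplit(url)                 # the fragment (#...) is split off here
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    body = urlencode(fields or {}).encode("ascii")   # e.g. a=1&b=2
    headers = [("Host", parts.netloc)]
    if body:
        headers.append(("Content-Type", "application/x-www-form-urlencoded"))
        headers.append(("Content-Length", str(len(body))))
    head = f"{method} {target} HTTP/1.1\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line (CRLF) separates the header from the body.
    return head.encode("ascii") + b"\r\n" + body


raw = build_request("POST", "http://example.com/items?page=2#section-3", {"a": 1, "b": 2})
print(raw.decode("ascii"))
```

The printed message starts with `POST /items?page=2 HTTP/1.1` and ends with the body `a=1&b=2`; the fragment `section-3` appears nowhere in it.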

6. Long Connection, Short Connection, Long Polling, Short Polling

  • Long Connection and Short Connection

    1. HTTP is based on the request/response model, so once the server has responded, the HTTP exchange is over.
    2. TCP is a two-way channel that can be kept open for a period of time, so it is the TCP connection that is truly long or short.
    3. HTTP is an application-layer protocol and TCP is a transport-layer protocol; only the layer responsible for transmission needs to establish a connection.
    4. HTTP requests and HTTP responses travel back and forth over the TCP connection.
    5. Short connection:
      The connection is kept only while data is being transferred: a request is initiated, the connection is established, the data is returned, and the connection is closed.
      Suitable for some real-time data requests, combined with polling to replace old data with new. (rarely used)
    6. Long connection:
      After the connection is initiated, the client keeps it open with the server until one side closes it. In essence the communication channel is kept and reused, which avoids frequent connection setup and improves efficiency.
    7. How to set up long connections:
      Set Connection: keep-alive in the HTTP request header.
      Long connections are the default only from HTTP 1.1; HTTP 1.0 uses short connections by default.
      Connection can also be set to close.
    8. The benefits of long connections:
      Multiple HTTP requests can reuse the same TCP connection, saving much of the cost of setting up and tearing down TCP connections. (The next request is not sent until the previous response has returned. HTTP2, like pipelining, can send the next request without waiting for the response.)
    9. Long connections are not permanent: if no HTTP request is made within the timeout, the long connection is closed.
  • Long polling and short polling

    1. Polling: a mechanism that requests data repeatedly in a loop. Polling can be implemented wherever requests can be made.
    2. Short polling: requests are issued continuously in a loop and each request returns immediately; the client compares old and new data to decide whether to use the result.
    3. Long polling: if the server has no updated data during a request, the request is held open until the server has new data to return, and then the next cycle begins. Holding long-polling requests open can waste resources.
    4. Neither long polling nor short polling suits very heavy traffic, because the number of TCP connections a server can hold is limited and polling can easily exhaust it.
  • The difference between long/short polling and long/short connections:

    1. How they are determined:
      Long and short connections are set in the HTTP request and response headers, and both sides must set them.
      Whether polling is long or short depends on how the server handles the request, and has nothing to do with the client.
    2. How they are implemented:
      Long and short connections are specified and implemented by the protocol;
      Long and short polling is implemented by the server holding requests open in application code.
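The difference between the two polling styles comes down to whether the server answers at once or holds the request. The sketch below models only the server-side decision with a thread-safe queue; `Mailbox`, `publish`, `short_poll` and `long_poll` are hypothetical names, and no HTTP is involved.

```python
import queue
import threading


class Mailbox:
    """Hypothetical server-side store of pending updates for one client."""

    def __init__(self):
        self._updates = queue.Queue()

    def publish(self, item):
        self._updates.put(item)

    def short_poll(self):
        """Return immediately: an update if one is waiting, otherwise None."""
        try:
            return self._updates.get_nowait()
        except queue.Empty:
            return None

    def long_poll(self, timeout: float):
        """Hold the request until an update arrives or the timeout passes."""
        try:
            return self._updates.get(timeout=timeout)
        except queue.Empty:
            return None


box = Mailbox()
print(box.short_poll())                    # nothing yet, so None comes back at once

# Simulate the server getting new data shortly after the long poll starts.
threading.Timer(0.1, box.publish, args=["price=42"]).start()
result = box.long_poll(timeout=2.0)        # blocks until the update is published
print(result)
```

With short polling the client has to keep asking and mostly gets empty answers; with long polling one held request returns as soon as data exists, at the cost of keeping that request open.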

7. HTTP version:

  • HTTP 1.0:

    • Short connections: the TCP connection is closed after each transfer
  • HTTP 1.1:

    • Long connections by default, which can be controlled with Connection: keep-alive/close
    • Requires support from both server and browser
    • With a long connection, the client keeps the connection to the server open until it asks to close it.
    • If no HTTP request is sent before the long connection’s timeout expires, the connection is closed.
    • Connection: keep-alive with Keep-Alive: timeout=60 means the idle timeout is 60 s.
    • Connection: keep-alive with no timeout set means the connection is kept indefinitely.
    • The default idle time for TCP connections is 2 hours; it is usually set to 30 minutes.
    • How long an HTTP connection is kept is determined by the Connection and Keep-Alive header fields sent by the server.
  • HTTP 2.0:

    • Header compression. HPACK is used to compress headers and save the network traffic they occupy (HTTP 1.1 carries a lot of redundant header information)
    • Binary transmission. Data is transmitted in a binary format (HTTP 1.1 uses a text format), which brings more advantages and possibilities for protocol parsing and optimization.
    • Multiplexing. Multiple requests share a single long connection.
    • Server push. The server actively pushes resources the client may need.
    • Request priority. If a stream is given a priority, it is processed according to that priority, and the server decides how many resources to spend on the request.
    • HTTP2 completely rebuilds the underlying transmission mechanism around frames, each made of a frame header and frame data. Every frame header carries a stream ID, and each request/response uses a different stream ID, which is how multiplexing is achieved.
    • Server push: when the server actively pushes a resource, it sends a frame of type PUSH_PROMISE containing the ID of the stream the push will create. When the client receives it and sees that it is a PUSH_PROMISE, it prepares to receive the pushed data.
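The frame header and the role of the stream ID can be sketched with `struct`. The layout used here is the 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream identifier); the helper names are made up, and the payloads are placeholder bytes rather than real HPACK-encoded headers.

```python
import struct

FRAME_TYPES = {"DATA": 0x0, "HEADERS": 0x1, "PUSH_PROMISE": 0x5}


def pack_frame(frame_type: str, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with a 9-byte frame header."""
    length = len(payload)
    if length >= 2**24 or not 0 <= stream_id < 2**31:
        raise ValueError("length or stream id out of range")
    # The 24-bit length is packed as one byte plus one 16-bit value.
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF,
                         FRAME_TYPES[frame_type], flags, stream_id)
    return header + payload


def unpack_header(data: bytes) -> dict:
    high, low, frame_type, flags, stream = struct.unpack("!BHBBI", data[:9])
    return {"length": (high << 16) | low, "type": frame_type,
            "flags": flags, "stream_id": stream & 0x7FFFFFFF}


def split_by_stream(wire: bytes) -> dict:
    """Regroup interleaved frames by stream ID, as a receiver would."""
    streams, offset = {}, 0
    while offset < len(wire):
        header = unpack_header(wire[offset:offset + 9])
        payload = wire[offset + 9:offset + 9 + header["length"]]
        streams.setdefault(header["stream_id"], []).append(payload)
        offset += 9 + header["length"]
    return streams


# Two requests interleaved on one connection (flag 0x4 is END_HEADERS).
wire = b"".join([
    pack_frame("HEADERS", 0x4, 1, b"req-A"),
    pack_frame("HEADERS", 0x4, 3, b"req-B"),
    pack_frame("DATA", 0x0, 1, b"body-A"),
])
streams = split_by_stream(wire)
print(streams)
```

Frames from streams 1 and 3 share the same byte stream, yet the receiver can reassemble each request independently because every frame names its stream.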
  • HTTP 3.0:

    • https://www.jianshu.com/p/bb3…
    • QUIC (Quick UDP Internet Connections) is a transport-layer protocol built on UDP that provides the same reliability as TCP.
    • In HTTP/2, different streams are logically independent, but the data is still sent and received frame by frame over one TCP connection; once a packet is lost, everything behind it is blocked. QUIC, being based on UDP, lets different streams transmit independently without interfering with each other.
    • Keeping the connection when switching networks. With TCP, the IP address changes after a network switch, so the connection is broken. On top of UDP, a different connection identifier can be built in, so the connection to the server can be restored after the switch.
    • At present, TCP with SSL/TLS (1.0, 1.1, 1.2) requires the TCP three-way handshake plus the security handshake, costing 4-5 RTTs per connection. QUIC achieves 0-RTT connection setup.
    • Connect:

      • Client -> server: sends a hello packet
      • Server -> client: security certificate and a SYN cookie unique to this client
      • Client: decodes and saves the SYN cookie [one RTT used so far]
      • If decoding fails on the client, client -> server: asks for the security certificate to be re-sent and attaches the SYN cookie to the request so that the server can verify that the request is correct and valid. [Establishing the connection now takes two RTTs.]
      • Client -> server: encrypts a hello packet and sends it, then continues sending data packets without waiting for a reply.
      • Server: on receiving the hello packet, decodes it with the key it already holds. If decoding fails, the client is treated as connecting for the first time and the security certificate is re-issued as described above. In that case there are usually two RTTs, and in extreme cases three.
      • Once the server decodes successfully and verifies the client, it can go on to process the packets that follow. The delay is 0 RTT.
      • To guard against packet loss, the hello packet may be retransmitted several times at intervals so that the delay caused by loss is reduced. For example: send a hello packet, then a data packet, then another hello packet.
      • Graceful packet loss handling:

        • FEC (forward error correction):

          • Each packet = its own data + part of the data of other packets.
          • When only a few packets are lost, the redundant data in other packets can be used to reassemble the data without retransmission, which improves transmission speed.
          • The implementation is similar to RAID5: a separate packet carries the checksum (XOR) of N packets, so that if one of the N packets is lost it can be recovered directly. It can also be used to verify that packets are correct.
        • Critical packets are sent multiple times.
        • Quick session restart: supports network switching.
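The XOR checksum idea behind FEC is easy to demonstrate. This is a sketch of the RAID5-like scheme only, not of QUIC's actual wire format; it assumes all packets in a group have the same length, and the function names are invented.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets) -> bytes:
    """Build the extra packet: the XOR of all N data packets."""
    return reduce(xor_bytes, packets)


def recover(received, parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) from the rest plus parity."""
    present = [packet for packet in received if packet is not None]
    if len(present) != len(received) - 1:
        raise ValueError("exactly one packet must be missing")
    return reduce(xor_bytes, present, parity)


packets = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(packets)

# The second packet is lost in transit; no retransmission is needed.
restored = recover([b"AAAA", None, b"CCCC"], parity)
print(restored)  # b'BBBB'
```

XORing the parity packet with every packet that did arrive cancels them out and leaves exactly the missing one, which is why one loss per group can be repaired without a round trip.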
  • Applicable scenarios:

    • Long-distance transmission
    • Mobile networks (switching from Wi-Fi to 4G)
    • Pages that request many resources and open many concurrent connections
    • Cases that require encrypted transmission

8. HTTPS:

    • Characteristic:

      • Before the request, an SSL connection is established so that all subsequent communication is encrypted and cannot easily be intercepted.
      • Needs backend support (the backend must apply for a certificate, etc.)
      • Overhead is greater than with HTTP
    • Encryption algorithms:

      • Symmetric encryption:

        • Features: the same key is used for encryption and decryption.
        • Advantages: fast, suitable for large amounts of data
        • Disadvantage: the key has to be shared between both sides, so security is weaker
      • Asymmetric encryption:

        • Features: public key + private key.
        • Advantages: secure
        • Disadvantage: slow, suitable only for small amounts of data
      • Hash algorithm: maps a binary value of arbitrary length to a binary value of fixed length (the hash value). Commonly used to verify data integrity, i.e. whether data has been tampered with (e.g. MD5)
    • Process:

      • Client -> server: the encryption and hash algorithms the client supports.
      • Server -> client: the encryption and hash algorithms chosen by the server, together with the certificate.
      • Client: verifies the certificate and obtains the public key. Generates a random number R.
      • Client -> server: R encrypted with the public key, the handshake message encrypted with R, and the hash value of the handshake message.
      • Server: decrypts with the private key to get R, decrypts the handshake message with R, computes its hash value, and compares the two hash values.
      • Server -> client: the handshake message encrypted with R, and the hash value of the handshake message.
      • Client: decrypts the handshake message, computes the hash value, and compares. If they match, subsequent messages are encrypted with R.
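The "compare the two hash values" step in the process can be shown on its own. This sketch covers only the integrity check, not the encryption; it uses SHA-256 from `hashlib` instead of the MD5 mentioned above, and `digest`, `verify` and the sample messages are illustrative names and values.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """Fixed-length hash of a message of any length."""
    return hashlib.sha256(message).hexdigest()


def verify(message: bytes, claimed_digest: str) -> bool:
    """Recompute the hash and compare it with the one that was sent along."""
    return hmac.compare_digest(digest(message), claimed_digest)


handshake = b"client hello: supported ciphers ..."
sent_digest = digest(handshake)

print(len(sent_digest))                              # 64 hex characters, whatever the input size
print(verify(handshake, sent_digest))                # True: message unchanged
print(verify(handshake + b" tampered", sent_digest)) # False: content was altered
```

Any change to the message changes its hash, so a mismatch tells the receiver that the handshake message was modified or decrypted with the wrong key.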

9. HTTP caching mechanism:

    • Forced Caching:

      • If the browser decides that the local cache is available, it will use it directly and will not initiate http requests.
      • It only checks whether the cache has expired, not whether the data on the server has been updated, so the latest data may not be obtained.
    • Negotiation Cache:

      • An HTTP request is sent to the server to check whether the cache is still valid. If it is, the server returns 304 and the cache is used; if not, the server returns 200 with the data, which is then cached and used.
      • It is used only when the forced cache is unavailable.
    • Use caching strategies:

      • Frequently changing resources: cache-control: no-cache; etag/last-modified
      • Infrequently changing resources: cache-control: max-age = a large number, with a dynamic string (hash/version number) added to the file name. The string is changed on each update, which makes the previous forced cache obsolete (the old file is simply no longer requested)
    • HTTP 1.0:

      • Forced Caching:

        • Expires: an absolute time, based on the server’s clock.
        • Example: Expires: Fri, 30 Oct 1998 14:19:41
      • Negotiation Cache:

        • If-Modified-Since: request header.
        • Last-Modified: response header. The last modification time of the file, set on the server side.
    • HTTP 1.1:

      • Forced Caching:

        • cache-control:

          • Public: Responses can be cached by clients and proxy servers.
          • Private: Responses can only be cached by clients.
          • max-age=30: the cache expires after 30 s and the request must be re-sent.
          • s-maxage=30: the cache expires after 30 s; overrides max-age in proxy servers.
          • no-store: no caching at all
          • no-cache: forced caching is not used; the cache must be revalidated with the server before use.
          • max-stale=30: the response may still be used within 30 s after it expires.
          • min-fresh=30: the client wants a response that will stay fresh for at least another 30 s.
        • max-age holds a relative time, and the expiry is calculated by the browser.
        • Progress over HTTP 1.0: the time is relative to the browser, so it does not matter if the browser and server clocks do not match.
      • Negotiation Cache:

        • If-None-Match: request header.
        • ETag: response header. A special identifier of a file (usually generated by hashing)
        • Progress over HTTP 1.0:

          • Last-Modified: the smallest unit is one second; load-balanced servers may not generate the same time; a file whose content is unchanged but whose modification date changes will invalidate the cache.
          • ETag: high accuracy; poorer performance (the hash value has to be calculated); higher priority.
    • Caches used for different user operations:

      • Opening a web page: looks for a match in the disk cache.
      • Normal refresh (F5): the tab has not been closed, so the memory cache is available and is used without searching the disk.
      • Forced refresh (Ctrl + F5): the cache is not used.
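The interplay of forced caching and negotiation caching can be walked through with a small in-memory model. This is a sketch under simplifying assumptions: time is a plain number of seconds, `origin` stands in for a real server, validation uses only an ETag, and every fresh response is stored with max-age 30. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    body: str
    etag: str
    stored_at: float
    max_age: float


def origin(if_none_match, current_body, current_etag):
    """Hypothetical server: 304 if the validator matches, else 200 with data."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304, None, current_etag
    return 200, current_body, current_etag


def fetch(cache, now, server_body, server_etag):
    """Return (source, body, cache) following forced then negotiation caching."""
    # Forced cache: still within max-age, so no request is made at all.
    if cache is not None and now - cache.stored_at < cache.max_age:
        return "forced-cache", cache.body, cache
    # Negotiation cache: ask the server, sending the stored ETag if there is one.
    status, body, etag = origin(cache.etag if cache else None, server_body, server_etag)
    if status == 304:
        cache.stored_at = now          # the cached copy is confirmed and refreshed
        return "304", cache.body, cache
    return "200", body, CachedResponse(body, etag, stored_at=now, max_age=30)


source1, body1, cache = fetch(None, 0, "v1", "e1")     # first visit
source2, body2, cache = fetch(cache, 10, "v2", "e2")   # within max-age
source3, body3, cache = fetch(cache, 40, "v1", "e1")   # expired, unchanged on server
source4, body4, cache = fetch(cache, 80, "v2", "e2")   # expired, changed on server
print(source1, source2, source3, source4)
```

The second call shows the weakness of forced caching noted above: the server already has "v2", but the browser still serves "v1" because the cached copy has not expired.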

10. CDN:

    • CDN = Mirror + Cache + Global Load Balancing
    • Function: publishes the site’s content to the “edge” of the network, closer to users, to improve response speed.
    • Cached content: static resources (css, js, pictures, static web pages). Users request dynamic content from the origin server and download static data from the CDN.
    • Workflow:

      • The browser asks the local DNS server to resolve the domain name, and the DNS system ultimately hands resolution over to the CDN’s dedicated DNS server that the CNAME points to.
      • The CDN’s DNS server returns the IP address of the CDN’s global load balancing device to the user.
      • The user sends the content URL request to the CDN’s global load balancing device.
      • Based on the user’s IP address and the URL of the requested content, the global load balancing device selects a regional load balancing device in the user’s area and tells the user to send the request to that device.
      • The regional load balancing device selects a suitable cache server to serve the user. The selection criteria include: which server is nearest to the user, judged from the user’s IP address; which server holds the required content, judged from the content name in the requested URL; and which server still has spare capacity, judged by querying each server’s current load. After weighing these conditions, the regional load balancing device returns the IP address of a cache server to the global load balancing device, which passes it on to the user.
      • The user sends the request to the cache server, which responds and delivers the required content to the user’s terminal. If the requested file does not exist on this node, the node goes back to the origin server to fetch it and then returns it to the user.
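The cache server selection described in the workflow can be sketched as a simple ranking. The order of preference used here (has the content, then distance, then load), the `max_load` cut-off and the server records are all assumptions made for illustration; real CDNs use their own scheduling policies.

```python
def pick_server(servers, path, max_load=0.8):
    """Choose a cache server for a request path.

    servers: list of dicts with "name", "distance_km", "load" (0..1) and
    "cached" (set of paths already stored on that server).
    """
    # Only servers that still have spare capacity are candidates.
    candidates = [s for s in servers if s["load"] < max_load]
    if not candidates:
        raise RuntimeError("no cache server has capacity")
    # Prefer servers that already hold the content, then the nearest,
    # then the least loaded. False sorts before True.
    return min(candidates,
               key=lambda s: (path not in s["cached"], s["distance_km"], s["load"]))


servers = [
    {"name": "edge-a", "distance_km": 20, "load": 0.95, "cached": {"/app.js"}},
    {"name": "edge-b", "distance_km": 50, "load": 0.40, "cached": {"/app.js"}},
    {"name": "edge-c", "distance_km": 30, "load": 0.10, "cached": set()},
]

chosen = pick_server(servers, "/app.js")
print(chosen["name"])  # edge-b
```

Here edge-a is nearest but overloaded, and edge-c is closer than edge-b but would have to go back to the origin, so edge-b is selected.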
    • Characteristic:

      • “Distributed Storage”: Distribute the content of the central platform to edge servers everywhere so that users fetch content from a nearby node, which reduces network latency and improves response speed and hit rate. It relies on indexing and caching techniques.
      • “Load Balancing”: Schedule all incoming requests and determine the actual address that finally serves each user.
      • “Content Management”: Supervise the stored content, perform data analysis, and so on.
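
The caching side of distributed storage, including the hit rate it aims to improve, can be illustrated with a small sketch. The `EdgeNode` class and the in-memory `origin_files` dictionary are hypothetical stand-ins for an edge server and the origin; a real node would also handle expiry and eviction, which are left out here.

```python
class EdgeNode:
    """A CDN edge node that caches files and falls back to the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._cache:
            self.hits += 1
            return self._cache[path]
        # Cache miss: go back to the origin, then keep a copy for later users.
        self.misses += 1
        body = self._fetch_from_origin(path)
        self._cache[path] = body
        return body

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Stand-in for the origin server (hypothetical content).
origin_files = {"/css/site.css": b"body { margin: 0 }"}
edge = EdgeNode(origin_files.__getitem__)

edge.get("/css/site.css")   # miss: fetched from the origin
edge.get("/css/site.css")   # hit: served from the edge cache
print(edge.hits, edge.misses, edge.hit_rate())  # 1 1 0.5
```
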
    • Why use a CDN:

      • The origin server’s bandwidth is limited. Once traffic exceeds it, pages may stop responding for a long time. A CDN can also serve files from different domain names; because browsers limit the number of concurrent connections per domain, this increases the number of files that can download in parallel.
      • For a library file such as jQuery: if the user’s browser has already loaded jQuery from the same CDN while visiting another website, the file is already cached and does not need to be downloaded again. (Browsers that partition their HTTP cache by site do not share this cached copy across websites.)
      • A CDN offers better availability, lower network latency, and a lower packet loss rate.
      • A CDN provides data centers close to users, so users far from the website’s main server can still download files quickly.
      • Many commercial paid CDNs provide usage reports, which can complement your own website analytics.
      • A CDN distributes load, saves bandwidth, improves website performance, and reduces hosting costs; public CDNs for common libraries are usually free.
      • Large-scale web applications cannot rely on browser caching alone for speed, because browser caching only speeds up repeat visits. Speeding up the first visit requires optimization at the network level, and the most common means is a CDN.
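
The effect of loading files through several domain names can be estimated with some simple arithmetic. The sketch below assumes a limit of 6 concurrent connections per domain, a common value for HTTP/1.1 in desktop browsers; it treats all files as equally sized and ignores HTTP/2 multiplexing, so it is only a rough model.

```python
import math


def download_rounds(num_files, num_hosts, per_host_limit=6):
    """Rough number of sequential rounds needed to fetch num_files.

    per_host_limit is an assumed browser limit on parallel connections
    to a single domain.
    """
    parallel = num_hosts * per_host_limit
    return math.ceil(num_files / parallel)


print(download_rounds(36, num_hosts=1))  # 6
print(download_rounds(36, num_hosts=3))  # 2
```
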
    • Disadvantages of CDN:

      • CDN files cannot be loaded during development when there is no network connection.
      • Not flexible enough. For example, if you use only a small part of the jQuery library, you cannot split the file the CDN provides; you must download it at full size instead of loading a smaller, faster subset.
      • Although popular CDN files are more likely to be cached already, this is not guaranteed. Some mobile devices have very small, inefficient caches, so the advantage of a CDN is less obvious, especially when you could serve a smaller file from your own server than the one on the CDN.
      • Because of geographical, legal, policy, or commercial barriers, some popular free CDN domain names or IP addresses may be blocked in your region.
      • When the CDN fails you need a fallback, namely your local copy of the file. This redundancy, added for stability, increases development workload and complexity.
      • If security matters to your website, do not use a public CDN. When you request files from a remote CDN, information about the requesting page is sent to it, and remote JS files may be modified to collect user or system information. If you use HTTPS, the choice of CDNs is even more limited.
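
The fallback mentioned above (use the local copy when the CDN fails) follows a simple pattern: try each source in order and take the first one that works. The sketch below shows the idea under stated assumptions: `from_cdn` and `from_local` are hypothetical stand-ins that simulate a CDN outage and a bundled local file, rather than real network or disk access.

```python
def load_resource(sources):
    """Try each (name, loader) pair in order and return the first that succeeds."""
    errors = []
    for name, loader in sources:
        try:
            return name, loader()
        except OSError as exc:  # network and file errors both derive from OSError
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def from_cdn():
    # Stand-in for an HTTP request to the CDN; here it simulates an outage.
    raise ConnectionError("CDN unreachable")


def from_local():
    # Stand-in for reading the copy bundled with the site.
    return b"/* local copy of the library */"


source, body = load_resource([("cdn", from_cdn), ("local", from_local)])
print(source)  # local
```

The extra loader and the need to keep the local copy in sync with the CDN version are the added workload the list refers to.
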
    • How to use:

      • 1. Deploy static resources to servers on different network lines, so that a CDN node on the corresponding network can fetch from the origin faster when it has no cached copy.
      • 2. Load static resources from a different domain name than the main page. On the one hand, this makes it easy to plug into the CDN’s intelligent DNS resolution service. On the other hand, because the static resources and the main page are on different domains, the HTTP requests that load them do not carry cookies and other data from the main page, which reduces the amount of data transferred and further speeds up network access.
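
Why a separate static domain avoids sending cookies can be shown with a simplified domain-matching check. This is a sketch only: it handles the exact-host and subdomain cases and ignores path, `Secure`, `SameSite`, and public-suffix rules that real browsers apply. The host names and cookie values are hypothetical.

```python
def cookie_is_sent(cookie_domain, request_host):
    """Simplified domain match: exact host or a subdomain of the cookie domain."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def request_cookie_bytes(cookies, request_host):
    """Bytes of 'name=value' pairs that would accompany a request to request_host."""
    return sum(
        len(f"{name}={value}")
        for name, value, domain in cookies
        if cookie_is_sent(domain, request_host)
    )


# Hypothetical cookies set by the main site for the whole example.com domain.
cookies = [
    ("session", "a" * 64, "example.com"),
    ("prefs", "theme=dark", "example.com"),
]

print(request_cookie_bytes(cookies, "static.example.com"))      # 88
print(request_cookie_bytes(cookies, "static.example-cdn.net"))  # 0
```

A static subdomain of the main site would still receive the domain-wide cookies on every request, while a separate domain receives none, which is the saving described in point 2.
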

    11. Cross-domain:

    • https://segmentfault.com/a/11…
    • https://segmentfault.com/n/13…
    • https://segmentfault.com/n/13…

    12. Security:

    • https://segmentfault.com/n/13…
    • https://segmentfault.com/n/13…

