CDN accelerated file access in distributed architecture

Time:2021-12-2

1. Introduction
Most people are familiar with CDN acceleration, or have at least heard of it. In fact, we use it almost every day. Many leading Internet companies, such as Alibaba and Tencent, use it to improve the response speed of their websites. CDN stands for content delivery network (also called a content distribution network). It generally refers to accelerating website access or the downloading of resources by users.

In short, a CDN acts as an intermediary. Without a CDN, a request for a website such as www.baidu.com is sent directly to Baidu’s server. If the requester is in Xinjiang but Baidu’s server is in Beijing, the distance slows down both the request and the response. With a CDN, the request first goes to the CDN server closest to the requester’s IP location. That server caches static files from the www.baidu.com page, such as JS, CSS, HTML and images, so the requester obtains these static resources from a nearby location, which speeds things up. Dynamic resources change from request to request, so they cannot be cached on the CDN server; the CDN still has to request them from the origin server. CDN acceleration is therefore largely limited to static resources. The following figure shows the CDN nodes deployed in China by a third-party CDN provider

In a distributed system, a CDN reduces the I/O pressure on the origin server to some extent and improves response speed. Moreover, once a CDN is in place, user requests go to the CDN servers rather than directly to the source server. Acting like a protective proxy, the CDN thereby improves system security to some degree and reduces the chance of the origin being attacked.
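The caching behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `EdgeNode` class, the `origin` function and the list of static extensions are hypothetical, and cache expiry is ignored. It shows how repeated requests for a static file reach the origin once, while dynamic requests are always forwarded.

```python
from pathlib import PurePosixPath

# Extensions treated as cacheable static content (an assumption for this sketch).
STATIC_EXTENSIONS = {".js", ".css", ".html", ".png", ".jpg", ".gif"}


class EdgeNode:
    """A toy CDN edge node that caches static files fetched from the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: path -> response body
        self._cache = {}                  # path -> cached body
        self.origin_requests = 0          # how many times the origin was contacted

    def get(self, path):
        is_static = PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS
        if is_static and path in self._cache:
            return self._cache[path]      # cache hit: the origin is not contacted
        self.origin_requests += 1
        body = self._fetch(path)
        if is_static:
            self._cache[path] = body      # keep static content for later requests
        return body


def origin(path):
    """Stand-in for the source server."""
    return f"content of {path}"


edge = EdgeNode(origin)
for _ in range(3):
    edge.get("/static/app.js")    # fetched from the origin once, then cached
    edge.get("/api/cart")         # dynamic: always forwarded to the origin

print(edge.origin_requests)       # 4: one for app.js plus three for /api/cart
```

Without the edge node the origin would have served all six requests, which is the reduction in I/O pressure the paragraph refers to.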

However, building a CDN is expensive. It is like the outlets of an express delivery company: to improve service efficiency and quality, the company has to open outlets in every region of the country, and more of them in densely populated areas to relieve the pressure on any single outlet. The cost is very high. For this reason, CDN acceleration is generally provided by large specialized third-party companies such as Alibaba and Qiniu Cloud. For a small company, building its own CDN costs too much; if it needs CDN acceleration, it can simply pay for a third-party service. The price is reasonable and management is convenient, and the provider generally supplies detailed documentation and good service.
 

  2. Working principle of CDN
A CDN adds a cache layer between users and the origin server. It works mainly by taking over DNS resolution so that users’ requests are directed to the cache, which holds data obtained from the source server. The steps are as follows:

The user enters the domain name, and the operating system asks the local DNS server (LocalDNS) for the IP address of the domain name
LocalDNS asks the root DNS for the authoritative server of the domain name (assuming here that the LocalDNS cache has expired)
The root DNS returns the authoritative DNS record for the domain name to LocalDNS
After obtaining the authoritative DNS record, LocalDNS asks the authoritative DNS of the domain name for the IP address of the domain name
The authoritative DNS looks up the record for the domain name (generally a CNAME) and returns it to LocalDNS
After obtaining that record, LocalDNS asks the intelligent scheduling DNS for the IP address of the domain name
The intelligent scheduling DNS returns the IP address of the most suitable CDN node to LocalDNS, chosen according to certain algorithms and strategies (such as static topology, capacity, etc.)
LocalDNS returns the resulting IP address to the client
After getting the IP address, the user sends the request to it; the address belongs to the CDN node rather than to the origin site server
The CDN node server responds to the request and returns the content to the client. (If the content is not cached yet, the cache server fetches it from the source server, saves it locally for future use, and returns the data to the client, which completes the service process)
  The analysis above shows that, for access to be transparent to ordinary users (the user’s client needs no configuration changes once the cache is in use), DNS (domain name resolution) has to be used to guide users to the cache server. Because domain name resolution is the first step whenever a user visits a website, modifying DNS is the simplest and most effective way to redirect users.
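The resolution steps above can be traced with a small simulation. It is a sketch only: the two dictionaries stand in for the authoritative DNS and the intelligent scheduling DNS, and every name and address in them (`www.example.com`, `cdn-provider.example`, `203.0.113.10`) is a made-up placeholder. The `resolve` function plays the part of LocalDNS and follows CNAME records until it reaches an A record.

```python
# Hypothetical records; names and addresses are placeholders for illustration.
AUTHORITATIVE_DNS = {
    "www.example.com": ("CNAME", "www.example.com.cdn-provider.example"),
}
SCHEDULING_DNS = {
    "www.example.com.cdn-provider.example": ("A", "203.0.113.10"),
}


def resolve(name, max_hops=5):
    """Follow CNAME records until an A record is found, recording every hop."""
    hops = []
    for _ in range(max_hops):
        # LocalDNS asks the authoritative DNS first, then the scheduling DNS.
        record = AUTHORITATIVE_DNS.get(name) or SCHEDULING_DNS.get(name)
        if record is None:
            raise LookupError(f"no record for {name}")
        record_type, value = record
        hops.append((name, record_type, value))
        if record_type == "A":
            return value, hops        # the address of the chosen CDN node
        name = value                  # CNAME: continue with the target name
    raise LookupError("CNAME chain too long")


ip, hops = resolve("www.example.com")
for hop in hops:
    print(hop)
print("client connects to", ip)
```

The client never learns that a CDN is involved: it asks for one name and receives one address, which is what makes the acceleration transparent.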

Elements of CDN network
To an ordinary Internet user, each CDN node looks like a web server located nearby. By taking over DNS, the CDN transparently directs the user’s request to the nearest node, and the CDN server in that node answers the request just as the original server of the website would. Because the node is closer to the user, the response time is usually shorter.

The part circled by the dotted line in the figure above is the CDN layer, which sits between the client and the site server

Intelligent scheduling DNS (such as 3-DNS from F5)
The intelligent scheduling DNS is the key system in a CDN service. When users visit a website that has joined the CDN service, the domain name resolution request is ultimately handled by the intelligent scheduling DNS. Using a set of predefined policies, it gives the user the address of the node that is closest at that moment, so that the user gets fast service. At the same time it has to stay in contact with the CDN nodes distributed across all locations, tracking the health status, capacity and other information of each node, to make sure that users’ requests are assigned to nearby nodes that are actually available
Cache service, consisting of:
Load balancing equipment (such as LVS, or BIG-IP from F5)
Content cache server (such as Squid)
Shared storage (whether it is needed depends on the amount of cached data)
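The node selection performed by the intelligent scheduling DNS can be illustrated as follows. This is a minimal sketch, not how any particular product works: the `Node` fields, the region labels and the "least loaded relative to capacity" rule are all assumptions chosen to reflect the health and capacity tracking mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    ip: str
    region: str
    healthy: bool
    load: int       # current number of connections
    capacity: int   # maximum number of connections


def pick_node(nodes, client_region):
    """Prefer a healthy node with spare capacity in the client's region,
    then fall back to any healthy node with spare capacity."""
    available = [n for n in nodes if n.healthy and n.load < n.capacity]
    if not available:
        return None
    local = [n for n in available if n.region == client_region]
    candidates = local or available
    # Among the candidates, choose the node with the lowest relative load.
    return min(candidates, key=lambda n: n.load / n.capacity)


nodes = [
    Node("192.0.2.1", "north", healthy=False, load=0, capacity=100),
    Node("192.0.2.2", "north", healthy=True, load=90, capacity=100),
    Node("192.0.2.3", "south", healthy=True, load=10, capacity=100),
]

print(pick_node(nodes, "north").ip)   # 192.0.2.2: the only healthy node in the north
print(pick_node(nodes, "west").ip)    # 192.0.2.3: no node in the west, least loaded wins
```

The unhealthy node is never returned even though it is idle, which is the reason the scheduler has to keep tracking node status.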

  3. Case analysis of CDN intelligent scheduling DNS

3.1 Analyzing the img.alibaba.com domain name
Run the dig command on the system. The output is as follows:

dig img.alibaba.com

; (partially omitted)

;; QUESTION SECTION:
;img.alibaba.com. IN A

;; ANSWER SECTION:
img.alibaba.com. 600 IN CNAME img.alibaba.com.edgesuite.net.
img.alibaba.com.edgesuite.net. 7191 IN CNAME img.alibaba.com.georedirector.akadns.net.
img.alibaba.com.georedirector.akadns.net. 3592 IN CNAME a1366.g.akamai.net.
a1366.g.akamai.net. 12 IN A 204.203.18.145
a1366.g.akamai.net. 12 IN A 204.203.18.160

; (partially omitted)

From the query results above, we can see that img.alibaba.com is a CNAME for img.alibaba.com.edgesuite.net, and that the CNAME records that follow hand resolution over to the intelligent scheduler run by Akamai (a CDN service provider)
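To make the chain in the answer section explicit, the lines can be parsed and followed programmatically. The sketch below works on the answer text shown above, pasted in as a string; the function names `parse_answer` and `cname_chain` are hypothetical helpers, not part of dig.

```python
DIG_ANSWER = """\
img.alibaba.com. 600 IN CNAME img.alibaba.com.edgesuite.net.
img.alibaba.com.edgesuite.net. 7191 IN CNAME img.alibaba.com.georedirector.akadns.net.
img.alibaba.com.georedirector.akadns.net. 3592 IN CNAME a1366.g.akamai.net.
a1366.g.akamai.net. 12 IN A 204.203.18.145
a1366.g.akamai.net. 12 IN A 204.203.18.160
"""


def parse_answer(text):
    """Split dig answer lines into (name, ttl, record type, value) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comment lines
        name, ttl, _record_class, record_type, value = line.split()
        records.append((name, int(ttl), record_type, value))
    return records


def cname_chain(records, start):
    """Follow CNAME records from start; return the chain and the final A addresses."""
    cnames = {name: value for name, _, rtype, value in records if rtype == "CNAME"}
    chain = [start]
    # The length check guards against a CNAME loop in malformed input.
    while chain[-1] in cnames and len(chain) <= len(cnames):
        chain.append(cnames[chain[-1]])
    addresses = [value for name, _, rtype, value in records
                 if rtype == "A" and name == chain[-1]]
    return chain, addresses


chain, addresses = cname_chain(parse_answer(DIG_ANSWER), "img.alibaba.com.")
print(" -> ".join(chain))
print(addresses)
```

Note the TTL values in the output: the CNAME records are cached for minutes or hours, while the final A records carry a TTL of only 12 seconds, which presumably lets the scheduler change the chosen nodes quickly.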

3.2 Analyzing the www.discovery.com domain name
Run the dig command again. The output is as follows:

dig www.discovery.com

; (partially omitted)

;; QUESTION SECTION:
;www.discovery.com. IN A

;; ANSWER SECTION:
www.discovery.com. 1077 IN CNAME www.discovery.com.edgesuite.net.
www.discovery.com.edgesuite.net. 21477 IN CNAME a212.g.akamai.net.
a212.g.akamai.net. 20 IN A 204.203.18.154
a212.g.akamai.net. 20 IN A 204.203.18.147

; (partially omitted)

From the query results above, we can see that www.discovery.com is a CNAME for www.discovery.com.edgesuite.net, and that the CNAME record that follows hands resolution over to the intelligent scheduler run by Akamai (a CDN service provider)

Summary: when a website wants to use a CDN service, it generally points its domain name, via a CNAME record, at a domain name belonging to the CDN service provider. The caching service and the scheduling function are then handled by the provider.
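This pattern also gives a rough way to tell whether a site is served through a CDN: check whether any CNAME target ends in a domain owned by a provider. The sketch below uses only the three Akamai suffixes that appear in the dig output above; the table and the `cdn_provider` helper are illustrative assumptions, and a real list would need many more entries.

```python
# CNAME suffixes seen in the dig output above; all three belong to Akamai.
CDN_SUFFIXES = {
    "edgesuite.net.": "Akamai",
    "akadns.net.": "Akamai",
    "akamai.net.": "Akamai",
}


def cdn_provider(cname_targets):
    """Return the provider of the first CNAME target with a known suffix, else None."""
    for target in cname_targets:
        for suffix, provider in CDN_SUFFIXES.items():
            # Match whole labels only, so "notakamai.net." does not count.
            if target == suffix or target.endswith("." + suffix):
                return provider
    return None


print(cdn_provider(["www.discovery.com.edgesuite.net.", "a212.g.akamai.net."]))
print(cdn_provider(["www.example.org."]))
```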

  4. Simplified implementation of intelligent scheduling DNS for CDN

4.1. Description of dispatching strategy
When a user requests resolution of the domain name, the intelligent DNS examines the IP address of the user’s LocalDNS and matches it against the IP range table held by the DNS server to decide whether the user belongs to Telecom or Netcom, then returns the corresponding IP address. This is the static topology method: it looks only at the IP address of LocalDNS. For more sophisticated scheduling algorithms, consider commercial products such as 3-DNS from F5.
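The range matching itself is easy to prototype. The sketch below is one possible approach, not what BIND does internally: it stores the ranges sorted by start address and uses binary search, which stays fast when a real operator table holds many thousands of entries. The ranges are the example values used in the configuration in section 4.3, the function name `operator_for` is hypothetical, and the table is assumed to contain no overlapping ranges.

```python
import bisect
import ipaddress

# Example IP table: (network, operator). Real tables are far larger.
IP_TABLE = [
    ("192.168.1.0/24", "CNC"),
    ("192.168.2.0/24", "CNC"),
    ("192.168.3.0/24", "TEL"),
    ("192.168.4.0/24", "TEL"),
    ("192.168.5.0/24", "EDU"),
    ("192.168.6.0/24", "EDU"),
]


def _to_range(cidr, operator):
    """Convert a CIDR block into (first address, last address, operator) integers."""
    network = ipaddress.ip_network(cidr)
    return int(network.network_address), int(network.broadcast_address), operator


# Sorted by first address; assumes the ranges do not overlap.
_RANGES = sorted(_to_range(cidr, operator) for cidr, operator in IP_TABLE)
_STARTS = [start for start, _, _ in _RANGES]


def operator_for(localdns_ip, default="DEFAULT"):
    """Return the operator whose range contains the LocalDNS address."""
    value = int(ipaddress.ip_address(localdns_ip))
    index = bisect.bisect_right(_STARTS, value) - 1   # last range starting <= value
    if index >= 0:
        start, end, operator = _RANGES[index]
        if start <= value <= end:
            return operator
    return default


print(operator_for("192.168.2.200"))   # CNC
print(operator_for("192.168.4.1"))     # TEL
print(operator_for("10.0.0.1"))        # DEFAULT
```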

4.2. Hypothetical CDN node planning
Here we use the view feature of BIND to distinguish operators. Suppose we have one CDN node in each operator’s machine room, as listed below:

Domain name      Operator (view)          Service address
www.cdntest.com  Netcom (CNC)             192.168.0.1
www.cdntest.com  Telecom (TEL)            192.168.0.2
www.cdntest.com  Education network (EDU)  192.168.0.3
www.cdntest.com  Default (any)            192.168.0.4

4.3. BIND view configuration
The following is an excerpt from the named.conf configuration file that covers only the views. For other details, consult the BIND documentation

 
acl "cnc_iprange" {  // define the IP range (Netcom)
    192.168.1.0/24;
    192.168.2.0/24;
    // This is only an example; other ranges are omitted
};

acl "tel_iprange" {  // define the IP range (Telecom)
    192.168.3.0/24;
    192.168.4.0/24;
    // Other ranges omitted
};

acl "edu_iprange" {  // define the IP range (Education Network)
    192.168.5.0/24;
    192.168.6.0/24;
    // Other ranges omitted
};

acl "default_iprange" {  // define the IP range (default)
    192.168.7.0/24;
    192.168.8.0/24;
    // Other ranges omitted
};

 
 
view "CNC" {
    // Netcom clients are answered from cnc_cdntest.zone
    match-clients { cnc_iprange; };

    zone "." IN {
        type hint;
        file "named.root";
    };

    zone "localhost" IN {
        type master;
        file "localhost.zone";
        allow-update { none; };
    };

    zone "cdntest.com" IN {
        type master;
        file "cnc_cdntest.zone";
    };
};

 
view "TEL" {
    // Telecom clients are answered from tel_cdntest.zone
    match-clients { tel_iprange; };

    zone "." IN {
        type hint;
        file "named.root";
    };

    zone "localhost" IN {
        type master;
        file "localhost.zone";
        allow-update { none; };
    };

    zone "cdntest.com" IN {
        type master;
        file "tel_cdntest.zone";
    };
};

 
view "EDU" {
    // Education network clients are answered from edu_cdntest.zone
    match-clients { edu_iprange; };

    zone "." IN {
        type hint;
        file "named.root";
    };

    zone "localhost" IN {
        type master;
        file "localhost.zone";
        allow-update { none; };
    };

    zone "cdntest.com" IN {
        type master;
        file "edu_cdntest.zone";
    };
};

 
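view "DEFAULT" {
    // Matches only the ranges listed in default_iprange. To catch every
    // remaining client, a last view would normally use: match-clients { any; };
    match-clients { default_iprange; };

    zone "." IN {
        type hint;
        file "named.root";
    };

    zone "localhost" IN {
        type master;
        file "localhost.zone";
        allow-update { none; };
    };

    zone "cdntest.com" IN {
        type master;
        file "default_cdntest.zone";
    };
};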
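Because the four views differ only in the ACL name and the zone file name, they could also be generated rather than copied by hand. The following Python script is a hypothetical helper, not part of BIND; it assumes the naming pattern used above, that is, `<operator>_iprange` for the ACL and `<operator>_cdntest.zone` for the zone file.

```python
VIEW_TEMPLATE = '''view "{name}" {{
    match-clients {{ {acl}; }};

    zone "." IN {{
        type hint;
        file "named.root";
    }};

    zone "localhost" IN {{
        type master;
        file "localhost.zone";
        allow-update {{ none; }};
    }};

    zone "cdntest.com" IN {{
        type master;
        file "{prefix}_cdntest.zone";
    }};
}};
'''

# Order matters: BIND uses the first view whose match-clients list matches.
VIEWS = ["CNC", "TEL", "EDU", "DEFAULT"]


def render_views(names):
    """Render one view block per operator name, in the given order."""
    blocks = []
    for name in names:
        prefix = name.lower()
        blocks.append(
            VIEW_TEMPLATE.format(name=name, acl=f"{prefix}_iprange", prefix=prefix)
        )
    return "\n".join(blocks)


config = render_views(VIEWS)
print(config)
```

Generating the blocks keeps the shared zones identical in every view, so a change to the hint or localhost zone only has to be made in the template.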

Zone file configuration

In the four zone files (cnc_cdntest.zone, tel_cdntest.zone, edu_cdntest.zone, default_cdntest.zone), only the A record of www.cdntest.com differs; everything else is the same

Domain name      Zone file             A record address
www.cdntest.com  cnc_cdntest.zone      192.168.0.1
www.cdntest.com  tel_cdntest.zone      192.168.0.2
www.cdntest.com  edu_cdntest.zone      192.168.0.3
www.cdntest.com  default_cdntest.zone  192.168.0.4

Only the A record address of www.cdntest.com is listed above. For the rest of the zone file syntax, consult the BIND documentation

Brief description of the domain name resolution process

The user asks LocalDNS for the domain name www.cdntest.com
LocalDNS queries the authoritative DNS for www.cdntest.com
The authoritative DNS examines the IP address of the LocalDNS used by the user and matches it against the IP ranges configured above. If the address falls in the Netcom range, it returns the IP address assigned to Netcom (192.168.0.1) to LocalDNS (and likewise for the other operators)
LocalDNS returns the resulting IP address to the client (domain name resolution is complete)
Note: to keep the illustration brief, this process leaves out the CNAME step from the primary DNS to the intelligent DNS
The static topology method (based on IP ranges), also known as the regionalization method, is used here; it decides by the IP address of LocalDNS

Problems with this simplified scheme

If the user configures the wrong DNS server, access may be slower than without the CDN (for example, a Netcom user who configures a Telecom DNS server)
The health and capacity of the CDN node servers cannot be determined, so users may be directed to CDN nodes that are unavailable
Because the static topology method is used, the CDN node a user reaches may not be the best or the fastest one
… there may be other unexpected problems
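The first problem follows directly from the fact that the scheduler sees only the address of LocalDNS, never the address of the user. The sketch below illustrates this with the example ranges and node addresses from section 4; the function name `node_for_resolver` and the resolver address `192.168.3.53` are hypothetical, and unmatched resolvers are assumed to fall through to the default node.

```python
import ipaddress

# (operator, LocalDNS ranges, CDN node address), taken from the examples in section 4.
OPERATORS = [
    ("CNC", ["192.168.1.0/24", "192.168.2.0/24"], "192.168.0.1"),
    ("TEL", ["192.168.3.0/24", "192.168.4.0/24"], "192.168.0.2"),
]
DEFAULT_NODE = "192.168.0.4"


def node_for_resolver(localdns_ip):
    """Pick a node from the resolver address alone; the client address is unknown."""
    address = ipaddress.ip_address(localdns_ip)
    for operator, cidrs, node in OPERATORS:
        if any(address in ipaddress.ip_network(cidr) for cidr in cidrs):
            return operator, node
    return "DEFAULT", DEFAULT_NODE


# A Netcom subscriber who has configured a Telecom resolver (hypothetical address).
client_operator = "CNC"
configured_resolver = "192.168.3.53"

chosen_operator, node = node_for_resolver(configured_resolver)
mismatch = chosen_operator != client_operator
print(chosen_operator, node, mismatch)   # TEL 192.168.0.2 True
```

The user is sent to the Telecom node even though the traffic originates on the Netcom network, so every request has to cross between the two operators.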

  5. Summary

When building a CDN, the key component is the intelligent scheduling DNS, which coordinates the whole CDN network. With an efficient scheduling algorithm, users get the best access experience
The second is the management of the CDN nodes, such as the content synchronization mechanism, configuration file updates and so on
Of course, for large websites, the cost and the return on investment of building a CDN system should also be considered