Do you know these knowledge points of browser cache?

Time:2021-3-4

1、 Basic knowledge of browser cache

Browser caching is divided into two kinds: the strong cache and the negotiation cache.

1. When the browser loads a resource, it first checks certain HTTP headers of that resource to decide whether it hits the strong cache. If it does, the browser reads the resource directly from its own cache and sends no request to the server. For example, if a CSS file's cache configuration hits the strong cache when its page loads, the browser loads the CSS file straight from the cache, and no request is even sent to the server hosting the page.

2. When the strong cache misses, the browser sends a request to the server, and the server uses other HTTP headers of the resource to check whether it hits the negotiation cache. If it does, the server responds to the request without returning the resource data. Instead, it tells the client that the resource can be loaded from the cache, and the browser loads it from its own cache.

What the two have in common is that, on a hit, the resource is loaded from the client cache rather than from the server. The difference is that the strong cache sends no request to the server, while the negotiation cache does.

When the negotiation cache also fails to hit, the browser loads the resource data directly from the server.
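The lookup order described above can be modeled as a small decision function. The sketch below is a simplified illustration, not how any browser is actually implemented. The class `CacheFlow`, its `CachedEntry` type, and the single `validator` string (a stand-in for the Last-Modified or ETag values covered later) are hypothetical names chosen for this example.

```java
import java.time.Instant;

// Hypothetical model of the lookup order: strong cache, then negotiation cache, then server.
class CacheFlow {

    enum Source { STRONG_CACHE, NEGOTIATION_CACHE_304, SERVER_200 }

    // A cached copy: the moment it stops being fresh, plus a validator
    // (an assumed stand-in for Last-Modified or an ETag).
    static final class CachedEntry {
        final Instant freshUntil;
        final String validator;

        CachedEntry(Instant freshUntil, String validator) {
            this.freshUntil = freshUntil;
            this.validator = validator;
        }
    }

    static Source load(CachedEntry cached, Instant now, String serverValidator) {
        if (cached == null) {
            return Source.SERVER_200;            // nothing cached yet: full download
        }
        if (now.isBefore(cached.freshUntil)) {
            return Source.STRONG_CACHE;          // strong cache hit: no request is sent
        }
        // Strong cache missed: a request goes to the server to revalidate the copy.
        if (cached.validator != null && cached.validator.equals(serverValidator)) {
            return Source.NEGOTIATION_CACHE_304; // 304: reuse the cached body
        }
        return Source.SERVER_200;                // resource changed: the server sends it again
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2021-03-04T00:00:00Z");
        CachedEntry entry = new CachedEntry(t.plusSeconds(60), "v1");
        System.out.println(load(entry, t, "v1"));                  // STRONG_CACHE
        System.out.println(load(entry, t.plusSeconds(120), "v1")); // NEGOTIATION_CACHE_304
        System.out.println(load(entry, t.plusSeconds(120), "v2")); // SERVER_200
    }
}
```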

2、 The principle of strong cache

2.1 introduction

When a request for a resource hits the strong cache, the HTTP status is 200, and in the Network panel of Chrome's developer tools the Size column shows "from cache". For example, many static resources on JD's home page are configured with a strong cache. Open the page several times in Chrome, then press F12 and look at the Network panel: many requests are loaded from the cache.


Strong caching is implemented with two HTTP response headers, Expires and Cache-Control, which indicate how long a resource remains valid in the client cache.

Expires is a header introduced in HTTP/1.0 to represent a resource's expiration time. It describes an absolute time, returned by the server as a string in GMT format, for example Expires: Thu, 31 Dec 2037 23:55:55 GMT

2.2 principles of expires caching

1. The first time the browser requests a resource, the server adds an Expires header to the response when it returns the resource, in the same format as the example above.


2. After receiving the resource, the browser caches it together with all of its response headers. On a cache hit, therefore, the headers you see come not from the server but from the previously cached copy.

3. When the browser requests the resource again, it first looks in the cache. If it finds the resource, it takes the stored Expires value and compares it with the current request time. If the request time is before the Expires time, the cache is hit; otherwise it is not.

4. If the cache misses, the browser loads the resource from the server again, and the Expires header is updated with the new response.
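Steps 3 and 4 amount to parsing the stored Expires date and comparing it with the request time. Below is a minimal sketch that relies on Java's built-in `DateTimeFormatter.RFC_1123_DATE_TIME` to parse the HTTP date format. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch of the Expires check in step 3 (class and method names are hypothetical).
class ExpiresCheck {

    // Parses an HTTP date such as "Thu, 31 Dec 2037 23:55:55 GMT".
    static Instant parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    // The strong cache hits only if the request time is before the Expires time.
    static boolean hitsStrongCache(String expiresHeader, Instant requestTime) {
        return requestTime.isBefore(parseHttpDate(expiresHeader));
    }

    public static void main(String[] args) {
        String expires = "Thu, 31 Dec 2037 23:55:55 GMT";
        System.out.println(hitsStrongCache(expires, Instant.parse("2021-03-04T00:00:00Z"))); // true
        System.out.println(hitsStrongCache(expires, Instant.parse("2038-01-01T00:00:00Z"))); // false
    }
}
```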

Expires is an older header for managing the strong cache. Because it is an absolute time produced by the server, cache management runs into problems when the server and client clocks differ significantly. For example, simply changing the client's clock can change whether the cache hits. HTTP/1.1 therefore introduced a new header, Cache-Control, which expresses a relative time as a number of seconds, for example Cache-Control: max-age=315360000 (315360000 seconds is 3650 days, roughly ten years).
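The clock problem is easier to see with numbers. The sketch below is a hypothetical scenario in which the client's clock runs two hours fast and the server grants a one-hour lifetime. It deliberately ignores the Date and Age header corrections that real caches apply, and the names are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical demo: the client's clock runs two hours ahead of the server's.
class ClockSkewDemo {

    // Expires: the server writes an absolute deadline using its own clock,
    // and the client compares it with the client's clock.
    static boolean freshWithExpires(Instant serverNow, Duration lifetime, Instant clientNow) {
        Instant expires = serverNow.plus(lifetime);
        return clientNow.isBefore(expires);
    }

    // max-age: the deadline is computed from the client's own receipt time,
    // so a constant offset between the two clocks cancels out.
    static boolean freshWithMaxAge(Instant clientReceivedAt, Duration maxAge, Instant clientNow) {
        return clientNow.isBefore(clientReceivedAt.plus(maxAge));
    }

    public static void main(String[] args) {
        Instant serverNow = Instant.parse("2021-03-04T08:00:00Z");
        Duration skew = Duration.ofHours(2);                  // client clock is 2 h fast
        Duration oneHour = Duration.ofHours(1);
        Instant clientReceivedAt = serverNow.plus(skew);
        Instant clientNow = clientReceivedAt.plusSeconds(60); // one minute after arrival

        System.out.println(freshWithExpires(serverNow, oneHour, clientNow));       // false
        System.out.println(freshWithMaxAge(clientReceivedAt, oneHour, clientNow)); // true
    }
}
```

With Expires, the copy is already stale one minute after it arrives. The max-age calculation is done entirely on the client's clock, so it keeps the copy fresh for the full hour.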

2.3 principles of Cache-Control caching

1. The first time the browser requests a resource, the server adds a Cache-Control header to the response when it returns the resource.


2. After the browser receives the resource, it will cache the resource together with all the response headers

3. When the browser requests the resource again, it first looks in the cache. If it finds the resource, it computes an expiration time by adding the validity period set by Cache-Control to the time of the first request. It then compares that expiration time with the current request time. If the request time is before the expiration time, the cache is hit; otherwise it is not.

4. If the cache misses, the browser loads the resource from the server again, and the Cache-Control header is updated.
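The Cache-Control steps can be sketched the same way: extract max-age from the header value, add it to the time of the first request, and compare. This sketch assumes only the max-age directive matters; other directives such as no-cache are not handled. The names are hypothetical.

```java
import java.time.Instant;
import java.util.Locale;
import java.util.OptionalLong;

// Sketch of step 3 for Cache-Control (class and method names are hypothetical).
class MaxAgeCheck {

    // Extracts max-age in seconds from a value such as "public, max-age=315360000".
    static OptionalLong maxAge(String cacheControl) {
        for (String directive : cacheControl.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                try {
                    return OptionalLong.of(Long.parseLong(d.substring("max-age=".length())));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();   // malformed value: treat as absent
                }
            }
        }
        return OptionalLong.empty();
    }

    // Expiration time = time of the first request + max-age; hit if "now" is earlier.
    static boolean hitsStrongCache(String cacheControl, Instant firstRequestTime, Instant now) {
        OptionalLong seconds = maxAge(cacheControl);
        return seconds.isPresent()
                && now.isBefore(firstRequestTime.plusSeconds(seconds.getAsLong()));
    }
}
```

For example, `MaxAgeCheck.hitsStrongCache("max-age=60", first, now)` is true only while `now` is less than 60 seconds after `first`.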

Cache-Control describes a relative time, and the hit check uses the client's own clock. Compared with Expires, it is therefore more effective and safer for cache management.

You can enable either header on its own or both together. When a response contains both Expires and Cache-Control, Cache-Control (its max-age directive) takes priority over Expires.
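The priority rule can be expressed as a function that looks for max-age first and falls back to Expires only when max-age is absent. This sketch makes two simplifying assumptions. Header names are matched exactly as written. Browsers may apply heuristic lifetimes when neither header is present, and those are not modeled. The names are hypothetical.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: when does a cached response stop being fresh?
class FreshnessPrecedence {

    static Optional<Instant> freshUntil(Map<String, String> headers, Instant receivedAt) {
        String cc = headers.get("Cache-Control");
        if (cc != null) {
            for (String directive : cc.split(",")) {
                String d = directive.trim().toLowerCase(Locale.ROOT);
                if (d.startsWith("max-age=")) {
                    try {
                        long seconds = Long.parseLong(d.substring("max-age=".length()));
                        return Optional.of(receivedAt.plusSeconds(seconds)); // max-age wins
                    } catch (NumberFormatException e) {
                        break; // malformed max-age: fall back to Expires
                    }
                }
            }
        }
        String expires = headers.get("Expires");
        if (expires != null) {
            return Optional.of(
                    ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant());
        }
        return Optional.empty(); // neither header: no explicit strong-cache lifetime
    }
}
```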


3、 Management of strong cache

The principles of strong caching were introduced above. In practice, some scenarios need a strong cache and some do not. There are usually two ways to control whether strong caching is enabled:

1. In code: add Expires and Cache-Control headers to the response returned by the web server.

2. Through configuration: configure the web server to add Expires and Cache-Control headers when it serves resources.

For example, in a Java web application, strong caching can be set in code.

Java code can likewise be used to turn strong caching off.
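In a servlet this is typically done with `HttpServletResponse.setHeader` and `setDateHeader`. To keep the sketch self-contained and free of the Servlet API, a plain `Map` stands in for the response headers below. `StrongCacheHeaders`, `enable`, and `disable` are hypothetical names. Disabling here means `Cache-Control: no-cache`, which forces revalidation so the strong cache never hits. `no-store` is the stronger option if nothing should be stored at all.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helpers; a plain map stands in for the response headers.
class StrongCacheHeaders {

    // HTTP-date format with a two-digit day, e.g. "Thu, 31 Dec 2037 23:55:55 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Enable the strong cache for maxAgeSeconds, sending both headers.
    static void enable(Map<String, String> headers, long maxAgeSeconds, Instant now) {
        headers.put("Cache-Control", "max-age=" + maxAgeSeconds);
        headers.put("Expires", HTTP_DATE.format(now.plusSeconds(maxAgeSeconds)));
    }

    // Disable the strong cache: the browser must revalidate before reusing its copy.
    static void disable(Map<String, String> headers) {
        headers.put("Cache-Control", "no-cache");
        headers.put("Pragma", "no-cache");                       // for HTTP/1.0 caches
        headers.put("Expires", HTTP_DATE.format(Instant.EPOCH)); // a date in the past
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        enable(headers, 3600, Instant.now());
        System.out.println(headers);
    }
}
```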

Nginx and Apache, as professional web servers, both have dedicated configuration files in which Expires and Cache-Control can be configured. If you are interested in the operations side, searching Baidu for "nginx expires cache-control" or "apache expires cache-control" turns up many related articles.

During development, strong caching is usually not configured deliberately, yet browsers cache static resources such as images, CSS, and JS by default. As a result, resources in a development environment often fail to update promptly because of the strong cache, and the latest changes cannot be seen. There are several ways to solve this:

Dealing with cache problems

1. Press Ctrl + F5. This solves the problem for resources referenced directly by the page.

2. Develop in the browser's private (incognito) mode.

3. In Chrome, press F12 and tick Disable cache in the Network panel (a very effective method; it applies while the developer tools are open).


4. During development, add a dynamic parameter to the resource URL, such as css/index.css?v=0.0001. Every change to the resource then requires updating its reference and the parameter value, which is not very convenient. Two exceptions: if you develop with dynamic pages such as JSP, you can use a server variable (v=${sysrnd}), and front-end build tools can handle the parameter changes for you.

5. If the page that references the resource is embedded in an iframe, you can right-click inside the iframe area and reload just that frame (taking Chrome as an example).


6. If the cache problem occurs in an Ajax request, the most effective solution is to append a random number to the Ajax request URL.

7. Similarly, when an iframe's src is set dynamically, the latest content may not appear because of caching. Appending a random number to the src being set also solves the problem.

8. If you use front-end tools such as Grunt, Gulp, or webpack and start a static server through one of their plugins (such as grunt-contrib-connect), you do not have to worry about resource updates during development. That static server always sets Cache-Control to no-cache in the response headers of every resource.
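Items 4, 6, and 7 all rely on changing the URL so that it no longer matches the cached entry. Here is a minimal sketch of that idea. The class name and the parameter names `v` and `_` are assumptions, and values are assumed not to need URL encoding.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical cache-busting helpers for resource, Ajax, and iframe URLs.
class CacheBusting {

    // Appends name=value as a query parameter, keeping any #fragment at the end.
    static String withParam(String url, String name, String value) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String sep = base.contains("?") ? "&" : "?";
        return base + sep + name + "=" + value + fragment;
    }

    // Item 4: a version parameter bumped by hand (or by a build tool) on each change.
    static String versioned(String url, String version) {
        return withParam(url, "v", version);
    }

    // Items 6 and 7: a random parameter so every Ajax URL or iframe src is new.
    static String randomized(String url) {
        long r = ThreadLocalRandom.current().nextLong() & Long.MAX_VALUE; // non-negative
        return withParam(url, "_", Long.toString(r));
    }
}
```

For example, `CacheBusting.versioned("css/index.css", "0.0001")` yields `css/index.css?v=0.0001`.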


4、 Application of strong cache

Strong caching is one of the most powerful tools for front-end performance optimization, and pages with many static resources should use it to improve response speed. The usual approach is to configure Expires or Cache-Control for all of these static resources, so that a user requests them from the server only on the first visit. Afterward, as long as the cache has not expired and the user does not force a refresh, they are loaded from the local cache. For example, the cached resources on the jd.com home page have their expiration time set to 2026.


However, this configuration introduces a new problem: updating resources at release time. Suppose an image was cached on a user's computer when they visited the first version of a site. When the site publishes a new version that replaces the image, that user will not request the latest image from the server by default, because of the cache setting. Unless they clear or disable the cache or force a refresh, they will not see the new image.

Everything above is theory; in practice, many front-end tools can solve this problem. Each tool involves many details, so they cannot all be covered here. If you are interested, look into Grunt, Gulp, webpack, FIS, and EDP, all of which can solve this problem. FIS and EDP in particular are front-end development platforms released by Baidu, with ready-made documentation:

http://fis.baidu.com/fis3/api…

http://ecomfe.github.io/edp/d…
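One common way such build tools handle the release problem is to put a hash of the file's content into its name (for example, webpack's `[contenthash]` placeholder). A changed file then gets a new URL, while unchanged files keep their long-lived strong cache. The sketch below illustrates the idea. The naming scheme (first 8 hex digits of SHA-256) and the class name are assumptions, not how any particular tool formats names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical content-hash naming: same content, same name; new content, new name.
class ContentHashName {

    // Returns e.g. "logo.<8 hex digits>.png" for fileName "logo.png".
    static String hashedName(String fileName, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {              // first 4 bytes = 8 hex digits
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            int dot = fileName.lastIndexOf('.');
            return dot < 0
                    ? fileName + "." + hex
                    : fileName.substring(0, dot) + "." + hex + fileName.substring(dot);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```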

Another point about strong caching: it is usually applied to static resources and should be used on dynamic resources with caution. Besides server-rendered pages, HTML files that reference static resources can also be treated as dynamic resources. If such HTML is cached as well, then after it is updated there may be no mechanism to tell the browser so. This is especially true in applications with a separated front end and back end, where pages are plain HTML files that users may open directly by URL. Such pages are usually not given a strong cache, so that the browser always requests the latest version from the server.
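A server might encode this advice as a per-path policy: entry HTML is always revalidated, while static assets get a long strong-cache lifetime. The rule set below is a hypothetical example. It assumes static assets are renamed whenever their content changes.

```java
import java.util.Locale;

// Hypothetical per-path Cache-Control policy.
class CachePolicy {

    static String cacheControlFor(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.endsWith(".html") || p.endsWith("/")) {
            return "no-cache";            // entry pages: always revalidate with the server
        }
        if (p.matches(".*\\.(css|js|png|jpg|gif|svg|woff2)")) {
            return "max-age=31536000";    // static assets: 365 days of strong caching
        }
        return "no-cache";                // anything else: stay conservative
    }
}
```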


5、 The principle of negotiation cache

5.1 introduction

When a request misses the strong cache, the browser sends a request to the server to check whether it hits the negotiation cache. If it does, the response status is 304 and "Not Modified" is displayed. For example, open the jd.com home page, press F12 to open the developer tools, then press F5 to refresh and look at the Network panel: many requests hit the negotiation cache.


In the response headers of a single request you can also see the 304 status code and the string Not Modified. This means the resource hit the negotiation cache and was loaded from the client cache rather than downloaded again from the server.


5.2 Last-Modified and If-Modified-Since control the negotiation cache

1. The first time the browser requests a resource, the server adds a Last-Modified header to the response, indicating when the resource was last modified on the server.


2. When the browser requests the resource again, it adds an If-Modified-Since header to the request. Its value is the Last-Modified value returned by the previous response.


3. When the server receives the request, it compares the If-Modified-Since value sent by the browser with the resource's last modification time on the server. If the resource has not changed, it returns 304 Not Modified without the resource content; if it has changed, it returns the content normally. When the server returns 304 Not Modified, it does not add a Last-Modified header to the response, because the resource has not changed and so neither has its Last-Modified value.


4. After the browser receives the 304 response, it loads the resource from the cache

5. If the negotiation cache misses, the browser loads the resource from the server, and the Last-Modified header is updated. The next request then sends this new Last-Modified value in If-Modified-Since.
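On the server side, steps 3 to 5 boil down to comparing two timestamps at one-second precision. Below is a minimal sketch, assuming the resource's modification time is already known as an `Instant`. The class name is hypothetical, and only the status code is modeled.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical server-side If-Modified-Since check returning 304 or 200.
class LastModifiedCheck {

    static int status(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200;                        // first request: send the resource
        }
        try {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates have one-second precision, so compare at that precision.
            Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return modified.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200;                        // unparseable header: ignore it
        }
    }
}
```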

[Last-Modified, If-Modified-Since] are both based on the server's clock. Generally, as long as the server time is not adjusted and the client cache is not tampered with, managing the negotiation cache with these two headers is very reliable. Sometimes, however, a resource on the server really changes while its last-modified time does not. One example is two edits within the same second, since HTTP dates have only one-second precision. Such problems are hard to track down, and when they happen they undermine the reliability of the negotiation cache. That is why there is a second pair of headers for managing the negotiation cache, ETag and If-None-Match, which work as follows:

5.3 ETag and If-None-Match control the negotiation cache

1. The first time the browser requests a resource, the server adds an ETag header to the response. Its value is a unique identifier, a string the server generates from the current resource. Whenever the resource changes, the string changes too. Because it does not depend on the modification time, it makes up for the weakness of Last-Modified.


2. When the browser requests the resource again, it adds an If-None-Match header to the request. Its value is the ETag returned by the previous response.


3. When the server receives the request, it generates a new ETag for the resource and compares it with the If-None-Match value sent by the browser. If the two are the same, the resource has not changed; otherwise it has. If it has not changed, the server returns 304 Not Modified without the resource content; if it has changed, it returns the content normally. Unlike with Last-Modified, a 304 Not Modified response still includes the ETag header. This is because the ETag has been regenerated, even if it is identical to the previous one.


4. After receiving the 304 response, the browser loads the resource from the cache.
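An ETag check can be sketched in the same style. How servers actually derive ETags varies; some use file size and modification time rather than a content hash. This sketch assumes a quoted SHA-256 hex digest of the content, and the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical ETag generation and If-None-Match check.
class ETagCheck {

    // Derives a strong ETag: the SHA-256 hex digest of the content, in quotes.
    static String etagFor(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // 304 if any tag in If-None-Match equals the freshly generated ETag, otherwise 200.
    static int status(String ifNoneMatch, byte[] currentContent) {
        if (ifNoneMatch == null) {
            return 200;                         // no validator: send the resource
        }
        String current = etagFor(currentContent);
        for (String tag : ifNoneMatch.split(",")) {
            String t = tag.trim();
            if (t.startsWith("W/")) {
                t = t.substring(2);             // If-None-Match uses weak comparison
            }
            if (t.equals("*") || t.equals(current)) {
                return 304;
            }
        }
        return 200;
    }
}
```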

6、 Management of negotiation cache

The negotiation cache differs from the strong cache. The strong cache sends no request to the server, so the browser sometimes cannot tell that a resource has been updated. The negotiation cache does send a request, so the server can always determine whether the resource has changed. Most web servers enable the negotiation cache by default, with both [Last-Modified, If-Modified-Since] and [ETag, If-None-Match] turned on.


Without the negotiation cache, the server would have to return the full resource content for every request, and its performance would suffer badly.

[Last-Modified, If-Modified-Since] and [ETag, If-None-Match] are generally enabled together to compensate for the cases where Last-Modified is unreliable.
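When a request carries both validators, HTTP gives If-None-Match precedence: If-Modified-Since is evaluated only when If-None-Match is absent. The sketch below shows why that helps with an unreliable Last-Modified. The names are hypothetical, and the ETag comparison is simplified to exact string matching.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical conditional-GET evaluation with both validators.
class ConditionalGet {

    static int status(String ifNoneMatch, String ifModifiedSince,
                      String currentETag, Instant lastModified) {
        if (ifNoneMatch != null) {
            for (String tag : ifNoneMatch.split(",")) {
                if (tag.trim().equals(currentETag)) {
                    return 304;
                }
            }
            return 200;   // ETag differs: If-Modified-Since is ignored entirely
        }
        if (ifModifiedSince != null) {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            if (!lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since)) {
                return 304;
            }
        }
        return 200;
    }
}
```

If the content changed but the modification time did not, the ETag mismatch still produces a 200, even though If-Modified-Since alone would have answered 304.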

One scenario needs attention:

In a distributed system, the Last-Modified times of the files on all machines must be consistent, so that comparisons do not fail when the load balancer routes requests to different machines;

In a distributed system, consider turning ETag off (the ETag generated by each machine may differ);

For resource requests on the JD page, for instance, the response headers contain only Last-Modified and no ETag.


The negotiation cache should be used together with the strong cache. The JD response headers mentioned above also contain strong-cache headers in addition to Last-Modified, because the negotiation cache is meaningless if the strong cache is not enabled.


7、 The impact of related browser behavior on cache

If a resource is already cached by the browser and the cache has not expired, a repeat request by default first checks the strong cache. If the strong cache hits, the browser reads the cache directly. If not, it sends a request to the server to check the negotiation cache. If the negotiation cache hits, the server tells the browser it can still read from the cache; otherwise the server returns the latest resource. This is the default behavior, and certain browser actions can change it:

1. When the page is force-refreshed with Ctrl + F5, resources are loaded directly from the server, skipping both the strong cache and the negotiation cache;

2. When the page is refreshed with F5, the strong cache is skipped, but the negotiation cache is still checked.
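The three cases can be summarized in one function. It follows the rules as stated above and is a simplified model. Actual reload behavior differs between browsers and versions, and the names are hypothetical.

```java
// Simplified model of how the refresh action changes cache handling (names are hypothetical).
class RefreshBehavior {

    enum Mode { NORMAL_LOAD, F5, CTRL_F5 }

    enum Result { STRONG_CACHE_200, NOT_MODIFIED_304, SERVER_200 }

    static Result resolve(Mode mode, boolean strongCacheFresh, boolean validatorMatches) {
        if (mode == Mode.CTRL_F5) {
            return Result.SERVER_200;            // forced refresh: skip both caches
        }
        if (mode == Mode.NORMAL_LOAD && strongCacheFresh) {
            return Result.STRONG_CACHE_200;      // default path: strong cache hit, no request
        }
        // F5, or a normal load whose strong cache has expired: ask the server.
        return validatorMatches ? Result.NOT_MODIFIED_304 : Result.SERVER_200;
    }
}
```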

Source:http://blog.poetries.top/2019…
