Front end 123: how browser caching works


Workflow for browser caching

Fetching content over the network is slow and expensive. Large responses require multiple round trips between the client and the server, which delays the moment the browser can obtain and process the content and increases visitors’ traffic costs. The ability to cache and reuse previously fetched resources is therefore a key aspect of performance optimization.

Let’s take a look at the DevTools Network panel, which you are probably very familiar with:

[Figure: DevTools Network panel, with memory cache, disk cache and network requests circled in cyan, green and orange]

The entries circled in cyan, green and orange in the figure are served from memory (memory cache), from disk (disk cache) and over an HTTP request (no cache), respectively. There is also a request with status code 304, whose data likewise comes from the cache (memory / disk). The difference between a 304 and a plain memory / disk cache hit is this: when the browser judges that a resource has expired, it asks the server whether the resource has been updated. If it has not, the server returns 304. On receiving the 304, the browser updates the resource’s expiration time and reads the resource straight from the existing disk / memory cache. In other words, if the resource has not expired, the browser skips the validation round trip to the server and reads the memory / disk cache directly.


The general process is as follows:

  • 1) First, check the service worker cache. On a miss, or if no service worker cache exists, proceed to the next step.
  • 2) Check whether the resource is in memory, and load it directly if it is (from memory – 200).
  • 3) If it is not in memory, look for it on the hard disk. If it is there and has not expired, load it directly (from disk – 200). If it has expired, send a request to the server. If the resource has not been updated, the server returns 304, and the browser reads the resource from the disk cache and updates its expiration time / ETag / Last-Modified. If the resource has been updated, the server returns the latest version over HTTP, and the browser re-caches it and updates its expiration time / ETag / Last-Modified.
  • 4) If it is not on the hard disk either, send an HTTP network request to the back end.
  • 5) Cache the loaded resource on the hard disk and in memory, and update its expiration time / ETag / Last-Modified.
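The lookup order in the list above can be sketched as a small model. This is a simplified illustration, not how any browser is actually implemented: the names CacheEntry, Layers, ServerReply, resolve and store are hypothetical, plain objects keyed by URL stand in for the real caches, and the server is reduced to a callback that answers either 304 or 200.

```typescript
// Simplified model of the lookup order: service worker -> memory -> disk -> network.
// Every name below is hypothetical; plain objects keyed by URL stand in for real caches.

interface CacheEntry {
  body: string;
  expiresAt: number; // ms since epoch; the entry is stale from this moment on
  etag: string;      // validator sent back to the server when revalidating
}

interface Layers {
  serviceWorker: Record<string, string | undefined>;
  memory: Record<string, CacheEntry | undefined>;
  disk: Record<string, CacheEntry | undefined>;
}

// The modelled server answers 304 (not modified) or 200 (full response).
type ServerReply =
  | { status: 304; expiresAt: number }
  | { status: 200; entry: CacheEntry };

type Source =
  | "service worker"
  | "from memory (200)"
  | "from disk (200)"
  | "disk after 304"
  | "network (200)";

// Step 5: store a loaded resource on disk and in memory.
function store(layers: Layers, url: string, entry: CacheEntry): void {
  layers.disk[url] = entry;
  layers.memory[url] = entry;
}

function resolve(
  url: string,
  now: number,
  layers: Layers,
  requestServer: (url: string, etag?: string) => ServerReply
): { body: string; source: Source } {
  // Step 1: service worker cache.
  const fromWorker = layers.serviceWorker[url];
  if (fromWorker !== undefined) return { body: fromWorker, source: "service worker" };

  // Step 2: memory cache, loaded directly.
  const inMemory = layers.memory[url];
  if (inMemory !== undefined) return { body: inMemory.body, source: "from memory (200)" };

  // Step 3: disk cache; ask the server only if the entry has expired.
  const onDisk = layers.disk[url];
  if (onDisk !== undefined) {
    if (now < onDisk.expiresAt) return { body: onDisk.body, source: "from disk (200)" };
    const revalidated = requestServer(url, onDisk.etag);
    if (revalidated.status === 304) {
      onDisk.expiresAt = revalidated.expiresAt; // refresh the expiration time
      return { body: onDisk.body, source: "disk after 304" };
    }
    store(layers, url, revalidated.entry);
    return { body: revalidated.entry.body, source: "network (200)" };
  }

  // Step 4: nothing cached anywhere, so send a plain HTTP request.
  const reply = requestServer(url);
  if (reply.status === 304) throw new Error("unexpected 304 for an unconditional request");
  store(layers, url, reply.entry);
  return { body: reply.entry.body, source: "network (200)" };
}
```

In this model, a stale disk entry that the server confirms with 304 is served from disk with a new expiration time, so the next request before that time is a plain disk hit again.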

     Service Worker Cache has the highest priority, the most fine-grained control over data and the greatest freedom of operation. Memory Cache is better understood as a storage location governed by the browser’s own memory caching strategy. HTTP Cache, which by contrast with Memory Cache can also be called Disk Cache after its storage medium, relies on the whole HTTP cache validation process (strong cache and negotiation cache) to decide when to read from the cache and when to revalidate and update the resource from the server. Push Cache is sparsely documented and not widely used, so it is only introduced briefly here.

Service worker cache (highest priority)

A service worker is an independent thread running in the background of the browser, and it is commonly used to implement caching. Because a service worker can intercept requests, the page must be served over HTTPS to guarantee security. The service worker cache differs from the browser’s other built-in caching mechanisms: it lets us freely control which files are cached, how the cache is matched and how it is read, and the cache is persistent.

Caching with a service worker generally takes three steps: register the service worker, listen for the install event to cache the required files, and then, on the user’s next visit, intercept requests to check whether a cached copy exists. If it does, the cached file is read directly; otherwise the data is requested.

When the service worker does not hit its cache, we need to call the fetch function to obtain the data. That is, on a miss in the service worker, the lookup continues down the cache priority order. Whether the data then comes from the memory cache or from a network request, the browser reports the content as obtained from the service worker.
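The steps described above can be modelled without any browser APIs. The sketch below is a deliberate simplification under stated assumptions: createServiceWorkerModel, precache and handleFetch are hypothetical names, a plain object stands in for the service worker’s cache storage, and fetchFromNetwork stands in for the real, asynchronous fetch function. In an actual service worker the same logic would live in install and fetch event handlers.

```typescript
// Synchronous model of a cache-first service worker.
// The plain-object cache and the fetchFromNetwork callback are hypothetical
// stand-ins for the browser's cache storage and fetch().

type Body = string;

interface FetchResult {
  body: Body;
  source: "sw-cache" | "fetch";
}

interface ServiceWorkerModel {
  precache(urls: string[]): void;      // what the install step would do
  handleFetch(url: string): FetchResult; // what the request interception would do
}

function createServiceWorkerModel(
  fetchFromNetwork: (url: string) => Body
): ServiceWorkerModel {
  const cache: Record<string, Body | undefined> = {};
  return {
    precache(urls) {
      // Install step: fetch and store every file we want available later.
      for (const url of urls) cache[url] = fetchFromNetwork(url);
    },
    handleFetch(url) {
      // Interception step: answer from the cache when possible.
      const cached = cache[url];
      if (cached !== undefined) return { body: cached, source: "sw-cache" };
      // Miss: fall through to fetch, which continues down the priority order.
      return { body: fetchFromNetwork(url), source: "fetch" };
    },
  };
}
```

A file listed in precache is answered from the worker’s own cache on the next visit, while any other URL falls through to the fetch callback.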

Memory cache (second priority)

Memory cache is the cache held in memory. It mainly contains resources already fetched by the current page, such as downloaded styles, scripts and images. Reading data from memory is faster than reading it from disk, but although the memory cache is efficient, it is short-lived: it is released together with the process, so once we close the tab, its in-memory cache is gone. When storing resources, the memory cache does not look at the Cache-Control value in the HTTP response headers of the returned resource. In other words, it depends heavily on the browser’s own memory management strategy, and each browser handles its memory cache slightly differently.

Memory cache follows these policies:

  • Large files are likely not to be stored in memory, while small files are likely to be.
  • If current system memory usage is high, files are stored on the hard disk first.

HTTP cache (third priority)

     HTTP cache works in two modes: strong cache and negotiation cache. The browser first checks whether the strong cache is hit. If it is not, the browser tries the negotiation cache.

[Figure: HTTP response headers of a resource, with the expires field highlighted in blue]

1) Strong cache

  • HTTP/1.0 era – Expires

     When the browser fetches a remote resource, the server returns an Expires timestamp field in the HTTP response headers (the blue part in the figure above), for example Expires: Wed, 13 Oct 2021 22:15:05 GMT. This means the resource expires at 22:15:05 on Wednesday, October 13, 2021, Greenwich Mean Time (Beijing time = Greenwich Mean Time + 8h). If the browser judges that the current time is before the expiration time, it reads the resource from the cache (if it is there); otherwise it sends the request to the server again.
     The Expires mechanism requires the client clock and the server clock to be closely in sync. Otherwise the cache update policy may not take effect when intended.
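The clock dependency can be made concrete with a small check. This is only a sketch of the comparison described above: isFreshByExpires is a hypothetical helper, and the current time is passed in explicitly so that a skewed client clock can be simulated.

```typescript
// Freshness check based on an Expires header (hypothetical helper).
// clientNowMs is whatever the client's clock reports, in ms since epoch.
function isFreshByExpires(expiresHeader: string, clientNowMs: number): boolean {
  const expiresMs = Date.parse(expiresHeader);
  // In this sketch an unparseable date is treated as already expired.
  if (isNaN(expiresMs)) return false;
  return clientNowMs < expiresMs;
}

const expires = "Wed, 13 Oct 2021 22:15:05 GMT";

// Five minutes after the resource expired, according to an accurate clock.
const accurateNow = Date.UTC(2021, 9, 13, 22, 20, 0);
// The same moment on a client whose clock runs ten minutes slow.
const slowClientNow = accurateNow - 10 * 60 * 1000;

const expiredOnAccurateClock = !isFreshByExpires(expires, accurateNow);
const stillFreshOnSlowClock = isFreshByExpires(expires, slowClientNow);
```

With these values the accurate clock sees the resource as expired, while the slow client still treats it as fresh and keeps serving the old copy.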

  • HTTP/1.1 era – Cache-Control

     Cache-Control: max-age is likewise delivered as a field in the response headers the server sends with the resource. For example, Cache-Control: max-age=31536000 means the resource expires 31536000 seconds (365 days) after the browser receives it. Unlike the absolute timestamp returned by Expires, Cache-Control returns a duration, so the browser only needs to measure elapsed time on its own local clock, and time differences between client and server no longer matter.
     Other Cache-Control directives include:
     i. public / private: in large architectures that rely on various proxies, we also have to consider caching on proxy servers. public and private control whether a proxy cache may store the resource. If a resource is marked public, both the browser and proxy servers can cache it; if it is marked private, only the browser can cache it. Note that private is not an implicit default in the HTTP caching specification: a response marked with neither directive may still be stored by a proxy, for example when only s-maxage is set.
     ii. s-maxage: for proxy servers, this directive specifies how long the resource stays fresh on a cache server (such as a CDN). It only applies to shared (public) caches, for example Cache-Control: max-age=3600, s-maxage=31536000.
     iii. no-cache: once no-cache is set for a resource, the browser no longer decides freshness on its own. Every request asks the server to confirm whether the cached copy is still valid, that is, it goes straight to the negotiation cache.
     iv. no-store: no caching of any kind is used. Every request fetches the resource from the server, and nothing is cached on the browser client.
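How these directives combine can be sketched as a small parser plus a decision function. The code is a simplified model under stated assumptions: parseCacheControl, policyFor, CacheControl and CachePolicy are hypothetical names, only the directives discussed above are recognised, and refinements such as quoted values or other directives are ignored.

```typescript
// Minimal reader for the Cache-Control directives discussed above (hypothetical helper).
interface CacheControl {
  maxAge?: number;  // seconds
  sMaxAge?: number; // seconds, shared caches only
  isPublic: boolean;
  isPrivate: boolean;
  noCache: boolean;
  noStore: boolean;
}

function parseCacheControl(header: string): CacheControl {
  const parsed: CacheControl = { isPublic: false, isPrivate: false, noCache: false, noStore: false };
  for (const directive of header.split(",")) {
    const eq = directive.indexOf("=");
    const name = (eq === -1 ? directive : directive.slice(0, eq)).trim().toLowerCase();
    const value = eq === -1 ? "" : directive.slice(eq + 1).trim();
    const seconds = parseInt(value, 10); // NaN when the directive has no numeric value
    if (name === "public") parsed.isPublic = true;
    else if (name === "private") parsed.isPrivate = true;
    else if (name === "no-cache") parsed.noCache = true;
    else if (name === "no-store") parsed.noStore = true;
    else if (name === "max-age" && !isNaN(seconds)) parsed.maxAge = seconds;
    else if (name === "s-maxage" && !isNaN(seconds)) parsed.sMaxAge = seconds;
  }
  return parsed;
}

interface CachePolicy {
  mayStore: boolean;            // may this cache keep a copy at all?
  revalidateEveryTime: boolean; // must it ask the server before each reuse?
  freshSeconds: number;         // how long the copy counts as fresh
}

function policyFor(header: string, cacheKind: "browser" | "proxy"): CachePolicy {
  const cc = parseCacheControl(header);
  // no-store forbids caching everywhere; private forbids it on proxies only.
  if (cc.noStore || (cacheKind === "proxy" && cc.isPrivate)) {
    return { mayStore: false, revalidateEveryTime: false, freshSeconds: 0 };
  }
  // no-cache allows storing but forces a trip to the server before every reuse.
  if (cc.noCache) {
    return { mayStore: true, revalidateEveryTime: true, freshSeconds: 0 };
  }
  // s-maxage overrides max-age, but only for shared (proxy) caches.
  const lifetime = cacheKind === "proxy" && cc.sMaxAge !== undefined ? cc.sMaxAge : cc.maxAge;
  return { mayStore: true, revalidateEveryTime: false, freshSeconds: lifetime === undefined ? 0 : lifetime };
}
```

For the header max-age=3600, s-maxage=31536000, this model gives the browser a lifetime of 3600 seconds and a proxy a lifetime of 31536000 seconds.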

  • Cache-Control and Expires together

Cache-Control has the higher priority: when Cache-Control (max-age) and Expires appear at the same time, Cache-Control prevails. For backward compatibility, however, you can send both.
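The precedence between the two headers can be sketched as follows. StoredResponse and isFresh are hypothetical names, the age of a response is simplified to the time elapsed on the local clock since it was received, and a response with neither header is treated as stale, although real browsers may apply their own heuristics in that case.

```typescript
// Strong-cache freshness with both headers available (simplified model).
interface StoredResponse {
  receivedAtMs: number;   // local clock reading when the response arrived
  maxAgeSeconds?: number; // from Cache-Control: max-age, if present
  expires?: string;       // raw Expires header, if present
}

function isFresh(stored: StoredResponse, nowMs: number): boolean {
  // Cache-Control: max-age wins whenever it is present; Expires is then ignored.
  if (stored.maxAgeSeconds !== undefined) {
    return nowMs - stored.receivedAtMs < stored.maxAgeSeconds * 1000;
  }
  // Fall back to the absolute Expires timestamp.
  if (stored.expires !== undefined) {
    const expiresMs = Date.parse(stored.expires);
    return !isNaN(expiresMs) && nowMs < expiresMs;
  }
  // Neither header: treated as stale in this sketch.
  return false;
}
```

A response received at 22:00:00 GMT with max-age=60 and an Expires value of 22:15:05 GMT is stale in this model two minutes later, even though its Expires time has not yet passed.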

2) Negotiation cache

The negotiation cache relies on communication between the server and the browser. When a resource is fetched for the first time, the browser stores the Last-Modified / ETag fields from the HTTP response headers. When the strong cache misses, the browser sends those values back to the server as validators so that the server can judge whether the cached copy is out of date. If the server finds that the resource has been updated, it returns the new resource together with updated validators. If the resource has not been updated, it returns status code 304 and the browser reuses its cached copy.

  • Last-Modified and If-Modified-Since

     Last-Modified is a timestamp returned in the server’s HTTP response headers that indicates when the resource was last updated, for example Last-Modified: Wed, 13 Jan 2021 15:34:55 GMT. When the client requests the resource again, it adds the request header If-Modified-Since (with the same value as Last-Modified), which the server uses to check whether the resource has been updated.
     Using Last-Modified has some disadvantages:
     i. Hit error 1: when a file is re-saved on the server without its content changing, its modification timestamp still changes. Judged by the timestamp alone, the resource is downloaded again in full even though the browser’s copy is still identical.
     ii. Hit error 2: If-Modified-Since has a resolution of one second and cannot detect file changes made within the same second, so some cache updates reach the browser late.
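Both hit errors can be reproduced with a sketch of the server side of this exchange. respondByLastModified and ConditionalResult are hypothetical names, and the server is reduced to a function that receives the file’s modification time and the optional If-Modified-Since header.

```typescript
// Server-side view of Last-Modified / If-Modified-Since (simplified model).
interface ConditionalResult {
  status: 200 | 304;
  lastModified: string; // value of the Last-Modified response header
}

function respondByLastModified(fileModifiedMs: number, ifModifiedSince?: string): ConditionalResult {
  const lastModified = new Date(fileModifiedMs).toUTCString();
  if (ifModifiedSince !== undefined) {
    const sinceMs = Date.parse(ifModifiedSince);
    // HTTP dates carry whole seconds only, so the comparison is made per second.
    const modifiedSeconds = Math.floor(fileModifiedMs / 1000);
    if (!isNaN(sinceMs) && modifiedSeconds <= Math.floor(sinceMs / 1000)) {
      return { status: 304, lastModified };
    }
  }
  return { status: 200, lastModified };
}

const savedAt = Date.UTC(2021, 0, 13, 15, 34, 55);
const firstVisit = respondByLastModified(savedAt);

// Unchanged file: the validator matches and the server answers 304.
const unchanged = respondByLastModified(savedAt, firstVisit.lastModified);
// Hit error 1: the file is re-saved 5 s later with identical content, yet 200 is sent.
const resaved = respondByLastModified(savedAt + 5000, firstVisit.lastModified);
// Hit error 2: the file really changes 400 ms later, yet the server answers 304.
const changedWithinSecond = respondByLastModified(savedAt + 400, firstVisit.lastModified);
```

The timestamp carries no information about the content, so the re-saved file triggers a full download while the sub-second change goes unnoticed.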

  • ETag and If-None-Match

     ETag is a newer negotiation cache method designed to make up for the shortcomings of Last-Modified. An ETag is a unique identifier for the resource returned in the server’s HTTP response headers, for example ETag: W/"2a3b-1602480f459". It is generated from the resource content and can accurately detect changes: however many times the file is re-saved, the ETag value stays the same as long as the content is unchanged. The next time the browser requests the resource, it sends the same value in a request header named If-None-Match, for example If-None-Match: W/"2a3b-1602480f459", which the server compares against the current resource.

  • ETag detects file changes more accurately than Last-Modified and has higher priority, but generating ETags costs some server performance. Last-Modified can be used alongside it as a secondary validator. When both ETag and Last-Modified are present, ETag prevails.
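The priority of ETag over Last-Modified can be sketched on the server side as follows. Everything here is illustrative: toyHash, makeEtag, isNotModified, Validators and ConditionalHeaders are hypothetical names, the hash is a toy FNV-1a-style function chosen only to keep the example self-contained (real servers use their own ETag schemes), and If-None-Match is assumed to carry a single value.

```typescript
// Toy FNV-1a-style hash over UTF-16 code units; for illustration only.
function toyHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash.toString(16);
}

// An ETag derived purely from content: same content, same tag.
function makeEtag(content: string): string {
  return '"' + toyHash(content) + '"';
}

interface Validators {
  etag: string;
  lastModified: string;
}

interface ConditionalHeaders {
  ifNoneMatch?: string;
  ifModifiedSince?: string;
}

// Drop the weak-validator prefix so that W/"x" and "x" compare as equal.
function stripWeakPrefix(tag: string): string {
  return tag.indexOf("W/") === 0 ? tag.slice(2) : tag;
}

// Returns true when the server may answer 304 Not Modified.
function isNotModified(current: Validators, request: ConditionalHeaders): boolean {
  // If-None-Match takes precedence; If-Modified-Since is ignored when it is present.
  if (request.ifNoneMatch !== undefined) {
    return stripWeakPrefix(request.ifNoneMatch) === stripWeakPrefix(current.etag);
  }
  if (request.ifModifiedSince !== undefined) {
    return Date.parse(current.lastModified) <= Date.parse(request.ifModifiedSince);
  }
  return false;
}
```

When a file is re-saved with identical content, its Last-Modified value moves forward but its content-based ETag does not, so a request carrying both validators still gets a 304 in this model.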

Push cache (lowest priority)

Push cache refers to the cache used by HTTP/2 in the server push phase:

  • Push cache is the last line of defense: the browser consults it only when the service worker cache, memory cache and HTTP cache have all missed.
  • Push cache lives only for the duration of the session and is released when the session ends.
  • Different pages can share the same push cache as long as they share the same HTTP/2 connection.
