HTTP series: HTTP cache

Time:2021-12-4

Brief introduction

To make a website faster and more efficient, we design caches at several layers. Caches avoid unnecessary data transfer and requests, which shortens response times. The HTTP protocol defines its own caching mechanism, the HTTP cache.

Today, let's look at how caching works in HTTP and how to use it.

Cache types in HTTP

Caching means saving a copy of a requested resource locally so that the copy can be returned directly on the next request, without downloading the resource from the server again. This reduces data transfer and improves efficiency.

Caches in HTTP fall into two categories. A shared cache stores responses that can be reused by multiple clients. A private cache can be accessed only by a single user or client; other users have no access to it.

Private caches are easy to understand. The cache in a browser is basically a private cache: it belongs to that browser and is not shared with other browsers.

Shared caches mainly live on intermediaries such as web proxy servers. A proxy may serve many users, and there is no need to keep a separate copy of a resource for each of them. Keeping one copy on the proxy avoids redundant copies.

Status of cached responses in HTTP

HTTP caches generally store responses to GET requests, because a GET request carries no request body, is identified mainly by its URI, and simply asks the server for a resource.

Different GET requests return different status codes.

If the resource is returned successfully, 200 (OK) is returned.

For a permanent redirect, 301 is returned. If the resource does not exist, 404 is returned. If only part of the resource is returned, 206 (Partial Content) is returned.
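As a rough sketch of the rule above, the check below treats only GET responses with one of the status codes mentioned here as candidates for caching. The function name and the status set are illustrative assumptions; real caches consult a longer list of status codes and also look at the response headers.

```python
# Hypothetical helper. The status codes are the ones named in the text above,
# not a complete list of cacheable statuses.
CACHEABLE_STATUSES = {200, 206, 301, 404}


def is_cache_candidate(method: str, status: int) -> bool:
    """Return True if a response could be considered for caching at all."""
    return method.upper() == "GET" and status in CACHEABLE_STATUSES


if __name__ == "__main__":
    print(is_cache_candidate("GET", 200))
    print(is_cache_candidate("POST", 200))
```

Whether a candidate response is actually stored still depends on the Cache-Control directives described in the next section.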

Cache control in http

Cache control in HTTP is represented by HTTP headers. Cache control is added in HTTP 1.1. We can control the caching of requests and responses through cache control.

If caching is not required, use:

Cache-Control: no-store

If a cache may store the response but must validate it with the server before every reuse, use:

Cache-Control: no-cache

If you want to force revalidation once the response has expired, you can use:

Cache-Control: must-revalidate

In this case, an expired resource must not be used until it has been successfully revalidated with the server.

The server can also declare whether a response may be stored only by a private cache or by any cache, including shared ones:

Cache-Control: private
Cache-Control: public
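To make the directives above more concrete, here is a minimal sketch of how a cache might parse a Cache-Control value and decide whether it is allowed to store the response. The function names `parse_cache_control` and `may_store` are hypothetical, and the parser assumes simple values without quoted commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}.

    Simplification: splits on commas, so quoted arguments that themselves
    contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives: Dict[str, Optional[str]], shared_cache: bool) -> bool:
    """Decide whether a cache may keep the response at all."""
    if "no-store" in directives:
        return False  # nothing may be stored
    if shared_cache and "private" in directives:
        return False  # only a private (per-user) cache may store it
    return True
```

For example, `may_store(parse_cache_control("private, max-age=600"), shared_cache=True)` is false for a proxy, while the same response may be stored by a browser cache.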

Another very important cache control is expiration time:

Cache-Control: max-age=31536000

When max-age is set, it takes precedence over the Expires header. It gives the number of seconds (here 31536000, one year) during which the resource can be regarded as fresh and does not need to be fetched from the server again.

Cache-Control is a header field defined in HTTP/1.1. HTTP/1.0 has a similar field called Pragma. Sending Pragma: no-cache in a request has an effect similar to Cache-Control: no-cache: it forces caches to revalidate the stored response with the server before using it.

However, Pragma is not defined for server responses, so it cannot completely replace Cache-Control.
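The relationship between the two request headers can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes the caller has already lower-cased the header names.

```python
from typing import Mapping


def request_forces_revalidation(headers: Mapping[str, str]) -> bool:
    """True if the request asks caches to revalidate before answering.

    Assumes header names have already been lower-cased by the caller.
    """
    cache_control = headers.get("cache-control")
    if cache_control is not None:
        # When Cache-Control is present, it decides and Pragma is ignored.
        names = [d.strip().split("=")[0].lower() for d in cache_control.split(",")]
        return "no-cache" in names
    # HTTP/1.0 fallback: only consulted when Cache-Control is absent.
    return "no-cache" in headers.get("pragma", "").lower()
```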

Cache refresh

Once a response is stored on the client, it can be used to answer later requests. However, a stored copy cannot be trusted forever, so the cache needs an expiration time. The copy is valid only until that time; after it, the resource has to be fetched or revalidated from the server.

This mechanism keeps the resources the client sees reasonably up to date and ensures that updates on the server reach the client in time.

If the client's copy of the resource has not expired, its state is fresh; otherwise its state is stale.

A stale resource is not removed from the client immediately. Instead, on the next request the client sends a conditional request with an If-None-Match header to ask the server whether the resource has changed. If it has not changed, 304 (Not Modified) is returned, indicating that the cached copy is still valid.

The freshness lifetime is determined by "Cache-Control: max-age=N".

If the response has no such directive, the cache checks for an Expires header. If it exists, the freshness lifetime is Expires - Date.

If there is no Expires header either, how is the freshness lifetime determined?

In this case, the cache looks for a Last-Modified header. If it exists, a common heuristic sets the freshness lifetime to (Date - Last-Modified) / 10.
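The three rules above can be sketched as a single function that tries them in order. The name `freshness_lifetime` is an assumption for illustration, and the sketch expects lower-cased header names and well-formed HTTP dates.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[timedelta]:
    """Return how long a response stays fresh, or None if it cannot be determined.

    Assumes lower-cased header names and well-formed HTTP dates.
    """
    # Rule 1: Cache-Control: max-age=N wins over everything else.
    for directive in headers.get("cache-control", "").split(","):
        name, _, arg = directive.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return timedelta(seconds=int(arg))

    date = headers.get("date")
    if date is None:
        return None
    date_dt = parsedate_to_datetime(date)

    # Rule 2: Expires - Date.
    if "expires" in headers:
        return parsedate_to_datetime(headers["expires"]) - date_dt

    # Rule 3: heuristic, one tenth of the time since the last modification.
    if "last-modified" in headers:
        return (date_dt - parsedate_to_datetime(headers["last-modified"])) / 10

    return None
```

For a response dated one day after its Last-Modified time and carrying no other caching headers, the heuristic gives a freshness lifetime of 2.4 hours.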

Revving

To make HTTP requests as efficient as possible, we would like cache lifetimes to be as long as possible. However, as mentioned earlier, a very long cache lifetime makes it hard to roll out updates to resources on the server. How can this be solved?

For files that are not updated frequently, the request URL can be built from the file name plus a version number. The same version number means the content is fixed, so it can be cached for a very long time.

When the content on the server changes, only the version number in the requested URL needs to be updated.

This does mean that every change to a resource on the server also changes the versioned URL the client has to request, but with modern front-end build tools this is not a big problem.
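One way a build step might produce such versioned names is sketched below. Note the substitution: instead of a hand-maintained version number, this sketch uses a short hash of the file content as the version, so the name changes exactly when the content changes. The function name and the digest length are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    version = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{version}{path.suffix}"))


if __name__ == "__main__":
    print(revved_name("static/app.js", b"console.log('v1');"))
    print(revved_name("static/app.js", b"console.log('v2');"))
```

Files named this way can be served with a very long max-age, because a changed file is always requested under a new URL.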

Cache validation

After a cached resource expires, there are two ways to handle it: request the whole resource from the server again, or revalidate the cached copy.

Revalidation requires support from the server, which has to supply a validator (an ETag or Last-Modified header) with the response. If the response also carries "Cache-Control: must-revalidate", the cache is not allowed to reuse the expired copy without revalidating it.

So how does the client check whether the resource is still valid? Obviously, we cannot send the resource itself from the client to the server for comparison. That would be too costly and would waste bandwidth for large files.

One approach that comes to mind is to hash the resource and send only the hash for comparison.

HTTP provides the ETag header for this purpose. It can be regarded as a unique tag that identifies a version of the resource on both client and server. The client sends the tag in an If-None-Match request header and lets the server judge whether it still matches. This is called strong validation.
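The hashing idea can be sketched like this. Deriving the ETag from a SHA-256 hash is an assumption made for illustration; servers are free to generate ETag values in other ways. Both function names are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build an ETag from a hash of the body. ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """Drop the W/ prefix that marks a weak ETag."""
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_matches(if_none_match: str, current_etag: str) -> bool:
    """True if the client's If-None-Match value covers the current ETag."""
    if if_none_match.strip() == "*":
        return True
    candidates = {_strip_weak(tag.strip()) for tag in if_none_match.split(",")}
    # If-None-Match compares tags while ignoring the W/ prefix.
    return _strip_weak(current_etag) in candidates
```

When `if_none_match_matches` returns true, the server can answer 304 without sending the body again.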

There is also a weak validation method. If the response contains Last-Modified, the client can send an If-Modified-Since request header to ask the server whether the file has changed since then.

The server can choose whether to validate. If it does not, or if the resource has changed, it returns 200 OK together with the resource. If it validates and the resource is unchanged, it returns 304 Not Modified, indicating that the client can keep using the cached copy. The 304 response can also carry other header fields, for example to update the expiration time of the cached copy.
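A server-side decision between 200 and 304 might look roughly like the sketch below. The function name `conditional_status` is hypothetical; it assumes lower-cased request header names, a timezone-aware `last_modified`, and a simple exact comparison of ETag strings.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def conditional_status(request_headers: Mapping[str, str], etag: str,
                       last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200

    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


if __name__ == "__main__":
    modified = datetime(2021, 12, 4, 10, 0, 0, tzinfo=timezone.utc)
    headers = {"if-modified-since": "Sat, 04 Dec 2021 10:00:00 GMT"}
    print(conditional_status(headers, '"v1"', modified))
```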

Vary response

The server can include a Vary header in its response. The value of Vary lists request header names, such as Accept-Encoding, and tells caches that the response depends on the values of those request headers. A cached response is reused only for requests whose values for those headers match.

For example, the client first requests:

GET /resource HTTP/1.1
Accept-Encoding: *

Server side return:

HTTP/1.1 200 OK
Content-Encoding: gzip
Vary: Accept-Encoding

The gzip-encoded resource is cached together with the Accept-Encoding value of the request.

When the client requests again:

GET /resource HTTP/1.1
Accept-Encoding: br

Because the Accept-Encoding value of this request differs from the one the cached response was stored under, the cached copy cannot be used and the resource has to be fetched from the server again:

HTTP/1.1 200 OK
Content-Encoding: br
Vary: Accept-Encoding

At this point, the client also caches a copy of the resource in br format.

The next time the client requests the resource with Accept-Encoding: br, it hits the cache.

To sum up, Vary makes the cache distinguish stored responses by additional request headers, such as the accepted encoding.

However, this also leads to the same resource being stored repeatedly: many copies are cached because requests ask for different encodings. To solve this problem, resource requests need to be normalized.

Normalization means checking the encodings a request accepts before it is sent or looked up in the cache, and selecting only one of them, so that the same resource is not cached multiple times.
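A minimal sketch of such normalization follows, assuming the cache builds its key from the URL plus the request headers named in Vary. The preference order (br before gzip) and both function names are assumptions, and q-values in Accept-Encoding are ignored for brevity.

```python
from typing import Mapping, Tuple


def normalize_accept_encoding(value: str) -> str:
    """Collapse an Accept-Encoding value to one encoding.

    Simplification: q-values are ignored, so "gzip;q=0" still counts as offered.
    """
    offered = {item.split(";")[0].strip().lower() for item in value.split(",")}
    for encoding in ("br", "gzip"):  # assumed preference order
        if encoding in offered or "*" in offered:
            return encoding
    return "identity"


def cache_key(url: str, request_headers: Mapping[str, str], vary: str) -> Tuple[str, ...]:
    """Build a cache key from the URL plus each request header named in Vary.

    Assumes request header names have already been lower-cased.
    """
    parts = [url]
    for name in (n.strip().lower() for n in vary.split(",") if n.strip()):
        value = request_headers.get(name, "")
        if name == "accept-encoding":
            value = normalize_accept_encoding(value)
        parts.append(f"{name}={value}")
    return tuple(parts)
```

With this normalization, requests that send "br, gzip" and "gzip, deflate, br" map to the same cache key, so only one copy of the resource is stored for them.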

Summary

That completes the introduction to HTTP caching. You can deepen your understanding by using it in practical applications.

This article has been included in http://www.flydean.com/04-http-cache/
