Practice Web caching with code

Time: 2021-5-4

A Web cache is an HTTP device that automatically keeps copies of popular documents. When a Web request reaches a cache that holds a local "cached copy" of the document, the document can be served from local storage instead of from the origin server.

The above is the definition of a Web cache given in HTTP: The Definitive Guide. The main advantages of caching are:

  1. Less redundant data is transferred;
  2. Clients make fewer network requests, and the origin server carries less load;
  3. Latency drops, so pages load faster.
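The definition above can be made concrete with a toy in-memory cache placed in front of an "origin server". This is a hypothetical sketch, not how a browser or proxy is actually built: ToyCache and the fetchFromOrigin callback are made-up names, and there is no expiry at all yet, which is exactly the problem the rest of the article deals with.

```javascript
// Toy cache in front of an origin server (hypothetical names, no expiry).
class ToyCache {
  constructor(fetchFromOrigin) {
    this.fetchFromOrigin = fetchFromOrigin; // called only on a cache miss
    this.store = new Map();                 // url -> cached copy
    this.originRequests = 0;                // how many times we had to ask the origin
  }

  get(url) {
    if (this.store.has(url)) {
      return this.store.get(url); // hit: served from local storage
    }
    this.originRequests += 1;     // miss: go to the origin server
    const body = this.fetchFromOrigin(url);
    this.store.set(url, body);
    return body;
  }
}

const cache = new ToyCache((url) => `contents of ${url}`);
cache.get('/test.txt');
cache.get('/test.txt');   // hit, the origin is not contacted
cache.get('/index.html');
console.log(cache.originRequests); // 2
```

Three requests cost only two trips to the origin, which is the "less redundant data, less server load" advantage in miniature.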

In short, caching saves traffic and bandwidth and makes pages faster. So how does caching work, and how do the client and server agree on how long a cached copy stays valid? Next, we use code to reveal how caching works, step by step.

1. Browser cache

When we type localhost:8080/test.txt into the browser's address bar and press Enter, we send a request for the file test.txt to that server.

After receiving the request, the server finds the file and, before returning it to the client, sets two response headers, Cache-Control and Expires, which tell the client: this file should be cached, so don't ask me for it again until it expires.

First, let’s look at the project directory

|-- Cache
    |-- index.js
    |-- assets
        |-- index.html
        |-- test.txt

The specific implementation code is as follows:

<!-- index.html -->
...
<a href="./test.txt">test.txt</a>
...
// index.js
const http = require('http');
const path = require('path');
const fs = require('fs');

http.createServer((req, res) => {
    // Map the request path onto the assets directory
    const requestUrl = path.join(__dirname, '/assets', path.normalize(req.url));
    fs.stat(requestUrl, (err, stats) => {
        // Only regular files are served; anything else is a 404
        if (err || !stats.isFile()) {
            res.writeHead(404, 'Not Found');
            res.end();
        } else {
            const readStream = fs.createReadStream(requestUrl);
            const maxAge = 10; // cache lifetime in seconds
            const expireDate = new Date(
                new Date().getTime() + maxAge * 1000
            ).toUTCString();
            // Strong cache: the client may reuse this copy for maxAge seconds
            res.setHeader('Cache-Control', `max-age=${maxAge}, public`);
            res.setHeader('Expires', expireDate);
            readStream.pipe(res);
        }
    });
}).listen(8080);

So what do the Cache-Control and Expires response headers mean? Cache-Control: max-age=500 means the response may be stored and reused for at most 500 seconds; after that, the cached copy is considered expired. (The public directive used in the code additionally allows shared caches, such as proxies, to store the response.) Expires: Tue, 23 Feb 2021 01:23:48 GMT means the document expires after Tue, 23 Feb 2021 01:23:48 GMT.
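Both headers can be derived from a single lifetime value, just as the server above does. The sketch below factors that into a function; buildCacheHeaders is a hypothetical helper name, and the current time is passed in as a millisecond timestamp so that the result is reproducible. With a 500-second lifetime it yields exactly the pair of example values quoted above.

```javascript
// Build the strong-cache headers for a response cacheable for maxAge seconds.
// Hypothetical helper; `now` is a timestamp in milliseconds.
function buildCacheHeaders(maxAge, now = Date.now()) {
  return {
    // Relative lifetime, understood by HTTP/1.1 caches
    'Cache-Control': `max-age=${maxAge}, public`,
    // Absolute expiry for HTTP/1.0 caches, in HTTP-date (GMT) format
    Expires: new Date(now + maxAge * 1000).toUTCString(),
  };
}

const headers = buildCacheHeaders(500, Date.UTC(2021, 1, 23, 1, 15, 28));
console.log(headers['Cache-Control']); // max-age=500, public
console.log(headers.Expires);          // Tue, 23 Feb 2021 01:23:48 GMT
```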

After starting the server, visit localhost:8080/index.html. This is the first visit and nothing is cached yet, so the server returns the complete resource.

Next, click the hyperlink to visit test.txt.

Since this is the first visit, there is no cache yet. Now click the browser's back button to return to index.html.

Notice the difference? The Size column in the Network panel now shows disk cache, which means the browser cache, also called the strong cache, has been hit. If you now click the hyperlink to test.txt again within 10 seconds, the browser cache is hit; after more than 10 seconds, the resource is fetched from the server again.
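The decision the browser makes here can be approximated by a simple age check. This is a simplified sketch, assuming the time the copy was stored and its max-age are known; real browsers follow the fuller age calculation in the HTTP caching specification (RFC 7234, now RFC 9111), and isStrongCacheHit is a hypothetical name.

```javascript
// Simplified strong-cache check: the stored copy may be reused without
// contacting the server while its age (in seconds) is below max-age.
function isStrongCacheHit(storedAtMs, maxAgeSeconds, nowMs) {
  const ageSeconds = (nowMs - storedAtMs) / 1000;
  return ageSeconds < maxAgeSeconds;
}

const storedAt = Date.UTC(2021, 1, 23, 1, 0, 0);
console.log(isStrongCacheHit(storedAt, 10, storedAt + 4000));  // true: served from disk cache
console.log(isStrongCacheHit(storedAt, 10, storedAt + 12000)); // false: request goes to the server
```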

Note that the browser's back and forward buttons always read resources from the cache and ignore the cache rules that were set. For example, if you go back from localhost:8080/test.txt to localhost:8080/index.html, the Size column shows disk cache no matter how much time has passed; and if you then click the forward button to return to localhost:8080/test.txt, it is still served from disk cache even after the expiration time has passed.

Be careful: Cache-Control takes priority over Expires. Expires is an absolute date, so if the server's clock and the client's clock differ, judging cache validity by Expires becomes inaccurate; max-age is a relative duration and avoids this problem. However, Expires is understood by HTTP/1.0 while Cache-Control is an HTTP/1.1 header, so both are usually set.
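The priority rule can be written down as a small function. This is a simplified sketch modeled on the freshness-lifetime rules of the HTTP caching specification (RFC 7234, now RFC 9111), for a private cache only: it ignores s-maxage and heuristic freshness, treats a missing Date header as stale, and freshnessLifetime is a hypothetical name. When only Expires is available, the lifetime is measured against the response's own Date header, so both times come from the server's clock.

```javascript
// Freshness lifetime, in seconds, of a stored response in a private cache
// (such as the browser's). Simplified: s-maxage and heuristics are ignored.
// Header names are expected in lower case.
function freshnessLifetime(headers) {
  // 1. Cache-Control: max-age takes priority over Expires
  const cacheControl = headers['cache-control'] || '';
  const maxAge = /(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/i.exec(cacheControl);
  if (maxAge) {
    return Number(maxAge[1]);
  }
  // 2. Otherwise use Expires minus the response's own Date header
  if (headers.expires) {
    const expires = Date.parse(headers.expires);
    const date = Date.parse(headers.date || '');
    if (Number.isNaN(expires) || Number.isNaN(date)) {
      return 0; // unparseable or missing date: treat as already stale
    }
    return Math.max(0, (expires - date) / 1000);
  }
  // 3. No explicit lifetime
  return 0;
}

const date = 'Tue, 23 Feb 2021 01:15:28 GMT';
const expires = 'Tue, 23 Feb 2021 01:23:48 GMT';
console.log(freshnessLifetime({ 'cache-control': 'max-age=10, public', date, expires })); // 10
console.log(freshnessLifetime({ date, expires })); // 500
```

With both headers present, the 10-second max-age wins and the Expires date is ignored, matching the priority described above.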

2. Negotiation cache

Once a cache lifetime is set, what should happen when the cached copy expires? The obvious answer is to fetch the resource from the server again. But the cache may have expired while the resource itself has not changed, in which case downloading it again is wasted effort. To handle this case we need another strategy: the negotiation cache, sometimes called the weak cache.

Let's walk through the negotiation caching process.

When the server returns the resource for the first time, in addition to the Cache-Control and Expires response headers it also sets Last-Modified (when the resource was last updated) and ETag (a digest or version of the resource), which carry the resource's latest modification time and its entity tag respectively. When the client misses the strong cache, it sends the request to the server again, carrying the If-Modified-Since and If-None-Match request headers, which echo those two values. The server compares them with the current Last-Modified and ETag. If they do not match, the cached copy is not usable and the server returns the full resource. Otherwise the cached copy is still valid, and the server returns a 304 (Not Modified) response without a body, telling the cache that it may keep using its copy and renewing the copy's lifetime.
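The pairing between response and request headers fits in a few lines of client-side code. This is a hypothetical sketch (conditionalHeaders is not a browser API), assuming the stored response headers are kept in an object with lower-case names, the way Node exposes incoming headers; the ETag value is made up.

```javascript
// Hypothetical helper: turn the validators stored with a cached response
// (lower-case header names) into conditional request headers.
function conditionalHeaders(stored) {
  const headers = {};
  if (stored.etag) {
    headers['If-None-Match'] = stored.etag; // ETag -> If-None-Match
  }
  if (stored['last-modified']) {
    headers['If-Modified-Since'] = stored['last-modified']; // Last-Modified -> If-Modified-Since
  }
  return headers;
}

// Validators saved from an earlier 200 response (made-up ETag)
const revalidation = conditionalHeaders({
  etag: '"1a-Xyz"',
  'last-modified': 'Tue, 23 Feb 2021 01:15:28 GMT',
});
console.log(revalidation);
```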

Now let’s take a look at the specific code implementation:

const http = require('http');
const path = require('path');
const fs = require('fs');
const crypto = require('crypto');

//Generate entity digest: "<byte length in hex>-<first 27 chars of base64 SHA-1>"
function generateDigest(requestUrl) {
    // Read synchronously so the digest is ready before the function returns
    const data = fs.readFileSync(requestUrl);
    const hash = crypto
        .createHash('sha1')
        .update(data)
        .digest('base64')
        .substring(0, 27);
    return '"' + data.length.toString(16) + '-' + hash + '"';
}

//Set the strong-cache headers and the two validators
function setCacheHeaders(requestUrl, stats, res) {
    const maxAge = 10;
    const expireDate = new Date(
        new Date().getTime() + maxAge * 1000
    ).toUTCString();
    res.setHeader('Cache-Control', `max-age=${maxAge}, public`);
    res.setHeader('Expires', expireDate);
    // Last-Modified must be an HTTP-date, hence toUTCString()
    res.setHeader('Last-Modified', stats.mtime.toUTCString());
    res.setHeader('ETag', generateDigest(requestUrl));
}

//Response file
function responseFile(requestUrl, stats, res) {
    setCacheHeaders(requestUrl, stats, res);
    fs.createReadStream(requestUrl).pipe(res);
}

//Judge freshness
function isFresh(requestUrl, stats, req) {
    const ifModifiedSince = req.headers['if-modified-since'];
    const ifNoneMatch = req.headers['if-none-match'];

    if (ifNoneMatch) {
        //ETag has priority: if it matches, the resource content has not changed
        return ifNoneMatch === generateDigest(requestUrl);
    }
    if (ifModifiedSince) {
        //Otherwise compare the update time (HTTP-dates are accurate to the second)
        return ifModifiedSince === stats.mtime.toUTCString();
    }
    //If there is no corresponding request header, a new resource should be returned
    return false;
}

http.createServer((req, res) => {
    const requestUrl = path.join(__dirname, '/assets', path.normalize(req.url));

    fs.stat(requestUrl, (err, stats) => {
        if (err || !stats.isFile()) {
            res.writeHead(404, 'Not Found');
            res.end();
        } else if (isFresh(requestUrl, stats, req)) {
            //The cached copy is still valid: renew its lifetime and return 304 without a body
            setCacheHeaders(requestUrl, stats, res);
            res.writeHead(304, 'Not Modified');
            res.end();
        } else {
            //The cache is not fresh, return the resource again
            responseFile(requestUrl, stats, res);
        }
    });
}).listen(8080);

As the code shows, ETag and Last-Modified are both used to validate the negotiation cache. ETag is an entity tag that can be derived from a version number or from a digest of the resource; Last-Modified is based on the resource's last modification time.
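The server above compares If-None-Match with the ETag by plain string equality, which is enough for this demo. The header may, however, carry a list of tags, the wildcard *, or weak tags prefixed with W/. A slightly more complete check, sketched after the comparison rules in RFC 7232 (ifNoneMatchMatches is a hypothetical helper and the ETag value is made up), could look like this:

```javascript
// Does an If-None-Match header value match the resource's current ETag?
// If-None-Match uses weak comparison: a W/ prefix is ignored on both sides.
function ifNoneMatchMatches(headerValue, currentEtag) {
  if (headerValue.trim() === '*') {
    return true; // "*" matches any current representation
  }
  const opaque = (tag) => tag.trim().replace(/^W\//, '');
  // Pick out each quoted entity tag, with or without the W/ prefix
  const tags = headerValue.match(/(?:W\/)?"[^"]*"/g) || [];
  return tags.some((tag) => opaque(tag) === opaque(currentEtag));
}

const etag = '"1a-Xyz"'; // made-up ETag value
console.log(ifNoneMatchMatches('"1a-Xyz"', etag));           // true
console.log(ifNoneMatchMatches('W/"1a-Xyz"', etag));         // true (weak comparison)
console.log(ifNoneMatchMatches('"old-tag", "other"', etag)); // false
console.log(ifNoneMatchMatches('*', etag));                  // true
```

Matching quoted tags with a regular expression, rather than splitting on commas, keeps the check correct even though an entity tag may itself contain a comma.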

Now visit localhost:8080/test.txt. After the strong cache has been hit, wait 10 seconds and visit it again: the server returns 304 instead of 200, which shows that the negotiation cache is in effect.

Now modify test.txt and visit it again: the server returns 200 along with the latest content of test.txt.

To sum up:

  1. ETag tells more accurately whether a resource has changed, and it takes priority over Last-Modified;
  2. An ETag based on a content digest is relatively slow to generate and uses more resources;
  3. Last-Modified is accurate only to the second, so it cannot detect updates made within the same second;
  4. ETag comes from HTTP/1.1, while Last-Modified is compatible with HTTP/1.0.
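Point 3 is easy to reproduce in isolation. In this small sketch (the timestamps and file contents are made up), two writes 400 ms apart produce the same HTTP-date string, so a Last-Modified comparison cannot tell them apart, whereas a SHA-1 digest of the content, like the one generateDigest builds, does change:

```javascript
const crypto = require('crypto');

// Two modification times 400 ms apart, within the same second (made-up values)
const firstWrite = new Date(Date.UTC(2021, 1, 23, 1, 15, 28, 100));
const secondWrite = new Date(Date.UTC(2021, 1, 23, 1, 15, 28, 500));

// As HTTP-dates (the Last-Modified format) they are indistinguishable
console.log(firstWrite.toUTCString());  // Tue, 23 Feb 2021 01:15:28 GMT
console.log(secondWrite.toUTCString()); // Tue, 23 Feb 2021 01:15:28 GMT

// A content digest (the ETag approach) changes whenever the content does
const digest = (text) => crypto.createHash('sha1').update(text).digest('base64');
console.log(digest('version 1') === digest('version 2')); // false
```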

Note: this article accesses test.txt through a hyperlink because, if you open the resource directly from the address bar, the browser sets cache-control: max-age=0 in the request headers, so the browser cache is never hit.

Test browser: Chrome version 88.0.4324.192

References:

  1. HTTP: The Definitive Guide
  2. HTTP cache
  3. etag
