Apache simple page caching

Time:2020-1-1

I maintain a website system that has been running for a long time. I no longer know how to enable and operate its static-page generation, and because the home page and some channel pages run too many queries, they are slow. A simple and effective way to improve access speed was urgently needed.

Since the main problem is the content queries run by each page, the natural optimization is to reduce or eliminate those queries. Without touching the website system's code, this can be achieved with Apache's URL rewriting (mod_rewrite).

Create a directory (such as cache) to store the static page files, and write the following rules in the .htaccess file in the website root directory (create the file if it does not exist):

RewriteCond %{DOCUMENT_ROOT}/cache/%{REQUEST_URI}.cache -f
RewriteRule ^(.*)$ cache/$1.cache [L]

This rule checks whether the cache directory contains a file with the .cache suffix corresponding to the current URI; if it does, the request is directed to that file. However, the website's own page URLs are already mapped to the program script through URL rewriting, and its path rules may not care about a trailing "/": for example, /abc/def and /abc/def/ can be the same page. Since RewriteCond does not support string truncation, regex replacement, or similar operations, the following rules are added to match a cache stored with a trailing "/":

RewriteCond %{DOCUMENT_ROOT}/cache/%{REQUEST_URI}/.cache -f
RewriteRule ^(.*)$ cache/$1/.cache [L]
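To make the lookup order easier to follow, here is a minimal bash sketch that mimics the two file checks above outside Apache. The function name cache_lookup and its arguments are hypothetical, introduced only for illustration; Apache itself performs these checks through the RewriteCond lines.

```shell
#!/bin/bash
# Print the cache file that the two rules would serve for a request URI,
# or return non-zero if neither rule matches.
# $1 = cache directory (hypothetical path), $2 = request URI such as /abc/def
cache_lookup () {
    local dir="$1" uri="${2#/}"            # drop the leading "/" of the URI
    if [ -f "$dir/$uri.cache" ]; then      # first rule:  <uri>.cache
        printf '%s\n' "$dir/$uri.cache"
    elif [ -f "$dir/$uri/.cache" ]; then   # second rule: <uri>/.cache
        printf '%s\n' "$dir/$uri/.cache"
    else
        return 1                           # no cache file: the site script handles the request
    fi
}
```

With a cache file stored as abc/def/.cache, both cache_lookup DIR /abc/def/ (first check) and cache_lookup DIR /abc/def (second check) resolve to the same file.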

Now, we also need a simple script to store the page output in the cache file. You can write a bash script to achieve this:

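#!/bin/bash

# Directory containing this script (the cache directory)
WD=$(cd "$(dirname "$0")" && pwd)

# Fetch one page and store its output as a .cache file
cache ()
{
    ln="$1"
    fn="$WD/$1.cache"
    tn="$WD/$1.cachx"
    dn=$(dirname "$fn")

    # Remove the old cache first so that Apache serves the live page
    rm    -f "$fn"
    rm    -f "$tn"
    mkdir -p "$dn"
    # Download to a temporary file; rename it only if wget succeeded
    wget  -O "$tn" "http://www.xxx.com/$ln" && mv "$tn" "$fn"
}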

if [ "@" != "$1" ]
then
    cache   "$1"
else
    cache ""
    cache xxx/
    cache xxx/xxx
fi

As you can see, wget saves the output to a temporary file, which is then renamed to the target file. This is necessary because wget creates the output file as soon as it starts; if it wrote directly to the target, Apache would see that the file exists and serve it for the request, so the fetched file would end up empty. To cut network request time, add the line 127.0.0.1 www.xxx.com to /etc/hosts.
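The write-then-rename step can be isolated in a small helper. This is only a sketch: the name write_via_temp is hypothetical, and it covers just the temporary-file part, so the old cache file still has to be deleted beforehand as the script does.

```shell
#!/bin/bash
# Write the output of a command to a target file without ever exposing a
# partial file: write to a temporary name first, then rename.
# $1 = target file, remaining arguments = command that prints the content
write_via_temp () {
    local target="$1" tmp="$1.cachx"   # same temporary suffix as in the script
    shift
    if "$@" > "$tmp"; then
        mv "$tmp" "$target"            # the target appears only once it is complete
    else
        rm -f "$tmp"                   # discard the partial file if the command failed
        return 1
    fi
}
```

For example, write_via_temp "$WD/xxx/.cache" wget -q -O - "http://www.xxx.com/xxx/" would fetch the channel page to standard output and expose the cache file only after the download has finished.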

This script can be used as follows:

./cache.sh          # cache the home page
./cache.sh xxx/     # cache the xxx channel page (its links in the website may end with /)
./cache.sh xxx/xxx  # cache the xxx article page (its links in the website do not end with /)
./cache.sh @        # cache all frequently visited pages (the else branch of the script)

You can add the frequently visited pages to the else branch of the script and run the script as a scheduled task, so that the cache is refreshed at regular intervals and the number of database queries drops considerably.
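One way to schedule the script is cron. The sketch below only builds the crontab line; the install path /var/www/html/cache/cache.sh and the 10-minute interval are assumptions chosen for illustration, not values from the website in question.

```shell
#!/bin/bash
# Hypothetical install path of the script; adjust to the real location.
CACHE_SCRIPT="/var/www/html/cache/cache.sh"

# Build a crontab line that refreshes all listed pages every N minutes.
# $1 = interval in minutes
cron_line () {
    printf '*/%s * * * * %s @ >/dev/null 2>&1\n' "$1" "$CACHE_SCRIPT"
}

cron_line 10
# To install it, append the line to the current user's crontab:
#   ( crontab -l 2>/dev/null; cron_line 10 ) | crontab -
```

A shorter interval keeps pages fresher at the cost of more queries; the right value depends on how often the site is updated.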

However, this is not enough. An editor may need to refresh the cache immediately after publishing an article, to check that the news, recommendations, and so on are correct on the home page and channel pages. The following PHP script can do this (if the website system is written in PHP):

<?php

$URIS = array(
    '',
    'xxx/',
    'xxx/xxx',
);

if (isset($_GET['n'])) {
    if (!in_array($_GET['n'], $URIS)) {
        exit('Wrong request');
    }
    $uri = $_GET['n'];
} else {
    $uri = '@';
}

echo  "<pre>\r\n";
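// Quote both the script path and the argument before handing them to the shell
system(escapeshellarg(__DIR__.'/cache.sh').' '.escapeshellarg($uri).' 2>&1');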
echo "</pre>\r\n";

Just request /cache/cache.php whenever the cache needs refreshing. Use the n parameter to refresh a single page. To prevent the script from being abused to run dangerous commands (such as abc'; rm -rf 'xxx), the check above allows only the listed pages to be refreshed. Renaming the file, protecting it with a password, or similar measures may of course be more reliable.
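As an additional safeguard, the bash script itself could refuse suspicious arguments before using them. The following is a sketch under assumptions: the function name is_safe_uri and the allowed character set are illustrative choices, not part of the original script.

```shell
#!/bin/bash
# Return 0 only for arguments made of letters, digits, "_", "-", "." and "/"
# that contain no ".." sequence; "@" and the empty string are also allowed.
is_safe_uri () {
    case "$1" in
        ''|'@')             return 0 ;;   # home page, or "refresh everything"
        *..*)               return 1 ;;   # refuse attempts to leave the cache directory
        *[!A-Za-z0-9_./-]*) return 1 ;;   # refuse quotes, spaces, ";" and other shell syntax
        *)                  return 0 ;;
    esac
}
```

Placed near the top of cache.sh, a line such as is_safe_uri "$1" || { echo "Wrong request" >&2; exit 1; } would stop the run before any file is touched.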