Out-of-the-box SEO Solutions for SPA Projects

Time: 2019-7-21

Preface

This article was written in 2019, so please keep its timeliness in mind.

There are few ways to make SPA projects SEO-friendly. The common ones are:

  1. Convert the SPA project to server-side rendering (SSR)
  2. Use pre-rendering

The former is very stable, but converting an existing SPA project raises many issues, and it takes about as much time as rewriting the project. The latter is only a passive fix: it works only for pages that return the same result no matter which user visits them.

Both approaches are tedious, but there are also solutions that avoid modifying the original code, such as the following:

https://www.cnblogs.com/lipte…

The basic principle of these schemes is to use a proxy server to distinguish search engine crawlers from ordinary users and send each a targeted response: ordinary users get the original SPA project, which is essentially a bare index.html page, while crawlers get the HTML rendered for the requested route.
Since other people have already written the prescriptions, it seems we would have to prepare the medicine ourselves. But let's first take a careful look at GitHub to see whether a ready-made remedy already exists.
Yes, GitHub already has ready-made solutions.

Introduction to Rendora

Rendora is a proxy server written in Go and designed specifically to handle SEO for SPA projects. It supports configuration files and also provides an external API.

Rendora has the following advantages over other schemes:

  1. No need to modify the original project
  2. No need to modify the build configuration
  3. Supports rendering pages under arbitrary routes
  4. Not tied to any particular front-end framework or technology
  5. Search engine crawlers and ordinary users get the same data

Its basic principle lies in how it handles requests. When a request arrives, Rendora inspects the user-agent request header to decide whether it comes from a crawler or an ordinary user. Requests from ordinary users are proxied directly to the original web server, while crawler requests are handled by a headless browser, which generates the page returned to the crawler. The content of this page can be understood as a snapshot of the DOM after the page has run.


Once you understand this principle, it is easy to see that it applies to any page that loads data asynchronously and then renders its content from that data. Moreover, the content that crawlers and ordinary users finally receive is highly consistent.

Install

Rendora's official documentation already explains how to install it, and I copy the steps here directly. Note, however, that Rendora is written in Go and depends on a headless browser, so there are still quite a few pitfalls to watch out for.

This article uses Ubuntu 18.04 Desktop. Rendora can also be installed and used on Windows and macOS; the installation steps differ slightly, but the basic concepts are the same.

Basic dependencies

  1. Go (Golang) version 1.11 or higher
  2. The chromium or google-chrome browser, reachable through the PATH environment variable

Install Rendora

Project address: https://github.com/rendora/rendora

Installation:

git clone https://github.com/rendora/rendora
cd rendora
make build
sudo make install

You can also use Docker:

docker run --net=host -v "$(pwd)/CONFIG_FILE.yaml:/etc/rendora/config.yaml" rendora/rendora

Note: make build accesses the network, and some of the addresses it needs cannot be reached from mainland China, which makes the build fail. Users in China who are not behind a proxy can run the following two commands before building to fetch Go modules through a proxy:

# Enable Go modules
export GO111MODULE=on
# Set GOPROXY as an environment variable
export GOPROXY=https://goproxy.io

Users on other platforms can refer to the official instructions on goproxy.io.

Write the configuration file

Rendora is driven by a configuration file, so we need to get familiar with its options before running it.

Configuration Manual

The configuration file supports several formats; here I use JSON, the format most familiar to web developers. Note that Rendora does not check for spelling mistakes, so please double-check your configuration.

In the simplest case, we only need to specify two parameters:

{
    "backend": {
        "url": "http://127.0.0.1:8000"
    },
    "target": {
        "url": "http://127.0.0.1"
    }
}
Parameter | Meaning
--------- | -------
backend   | The address of the server that originally serves users
target    | The address the headless browser requests

Note: Rendora is essentially a proxy server that listens on its own port (3001 by default), so the right values for these two parameters depend on how it is combined with the rest of your back-end stack. For example, a common arrangement looks like this:

nginx->Rendora->App Server

But it may also be the opposite:

Rendora->nginx->App Server

In my own setup, for example, a local server hosts the static files on port 80, so these two parameters actually take the same value: the address that originally serves users is also the address the headless browser requests.

In addition, there are two options you can change to avoid port conflicts on the local machine:

{
  "listen":{
    "port":3001
  },
  "headless":{
    "internal":{
      "url":"http://localhost:9222"
    }
  }
}
  • listen.port: the port Rendora listens on
  • headless.internal.url: the address at which Rendora connects to the headless browser (its remote debugging port)

Rendora can also be configured with two filters:

  1. Request filter (userAgent): decides which requests are rendered by the headless browser and which are simply forwarded to the backend
  2. Path filter (paths): only requests that pass the request filter are checked against the path filter, and only those that also satisfy the path filter rules are rendered

A common example of a request filter is as follows:

Note: do not copy the comments; JSON does not allow comments.

---
  "filters": {                        // request filter: only requests that pass are rendered by the headless browser
    "userAgent": {
      "defaultPolicy": "blacklist",   // blacklist mode: by default, no request passes the filter
      "exceptions": {                 // exceptions: requests matching these rules do pass
        // passes if the user-agent contains any of these keywords
        "keywords": ["bot", "bing", "yandex", "slurp", "duckduckgo", "baiduspider", "googlebot", "360spider", "Sosospider", "sogou spider"],
        // also passes if the user-agent is exactly one of these strings
        "exact": ["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"]
      }
    }
  }
---
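The blacklist-plus-exceptions logic can be sketched in Go as follows. The UAFilter type and its Pass method are hypothetical and reflect my reading of the configuration above, not Rendora's implementation; in particular, treating keywords as case-insensitive substrings and exact as a case-sensitive full match is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// UAFilter mirrors the shape of the "userAgent" filter above (hypothetical type).
type UAFilter struct {
	DefaultPolicy string   // "blacklist" or "whitelist"
	Keywords      []string // exceptions: substring match
	Exact         []string // exceptions: full-string match
}

// isException reports whether ua matches one of the exception rules.
func (f UAFilter) isException(ua string) bool {
	lower := strings.ToLower(ua) // assumption: keyword matching ignores case
	for _, kw := range f.Keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	for _, e := range f.Exact {
		if ua == e {
			return true
		}
	}
	return false
}

// Pass reports whether the request should go to the headless browser.
// Blacklist: nothing passes except the exceptions.
// Whitelist: everything passes except the exceptions.
func (f UAFilter) Pass(ua string) bool {
	if f.DefaultPolicy == "whitelist" {
		return !f.isException(ua)
	}
	return f.isException(ua)
}

func main() {
	f := UAFilter{
		DefaultPolicy: "blacklist",
		Keywords:      []string{"bot", "bing", "baiduspider", "sogou spider"},
		Exact:         []string{"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"},
	}
	fmt.Println(f.Pass("Mozilla/5.0 (compatible; Baiduspider/2.0)")) // true: keyword "baiduspider"
	fmt.Println(f.Pass("Mozilla/5.0 (Windows NT 10.0) Firefox/67.0")) // false: no rule matches
}
```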

Adding a path filter:

---
  "paths": {                        // path filter: only paths that pass are rendered by the headless browser
    "defaultPolicy": "whitelist",   // whitelist mode: by default, every path passes the filter
    "exceptions": {                 // exceptions: paths matching these rules do not pass and are not rendered
      "prefix": ["/home"],          // prefix match
      "exact": ["/hello/world"]     // exact match
    }
  }
---
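The path filter can be sketched the same way. PathFilter is a hypothetical type, not part of Rendora, and it encodes my reading of the rules above: in whitelist mode every path passes unless it matches a prefix or exact exception.

```go
package main

import (
	"fmt"
	"strings"
)

// PathFilter mirrors the "paths" filter above (hypothetical type).
type PathFilter struct {
	DefaultPolicy string   // "whitelist" or "blacklist"
	Prefix        []string // exceptions: prefix match
	Exact         []string // exceptions: exact match
}

// isException reports whether path matches a prefix or exact rule.
func (f PathFilter) isException(path string) bool {
	for _, p := range f.Prefix {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	for _, e := range f.Exact {
		if path == e {
			return true
		}
	}
	return false
}

// Pass reports whether a path that already passed the request filter
// should be rendered. Whitelist: everything except the exceptions passes.
func (f PathFilter) Pass(path string) bool {
	if f.DefaultPolicy == "blacklist" {
		return f.isException(path)
	}
	return !f.isException(path)
}

func main() {
	f := PathFilter{DefaultPolicy: "whitelist", Prefix: []string{"/home"}, Exact: []string{"/hello/world"}}
	for _, p := range []string{"/", "/home/news", "/hello/world", "/hello"} {
		fmt.Printf("%-13s rendered=%v\n", p, f.Pass(p))
	}
}
```

A request is rendered only when it passes both the request filter and this path filter. Note that a plain prefix such as /home also matches /homepage, so include a trailing slash in the prefix if that distinction matters.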

Key parameter: headless.waitAfterDOMLoad

In an SPA project, the firing of the DOM load event does not mean the page has finished rendering, because the data returned by network requests may not yet have been rendered into the actual DOM.

By default, Rendora outputs the DOM snapshot right after the DOM load event, so we need to specify a delay after that event manually; Rendora then waits that long before taking the snapshot.

Initial load times differ from project to project; you can measure how many milliseconds of delay your project needs with the Network panel in Chrome DevTools. Here I use a delay of 2 seconds, that is, 2000 milliseconds. This setting appears in the complete example below.

Rendora also skips loading almost all resource files (see the documentation for how to configure this), so in practice it can load a page faster than a user's browser.
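Skipping resources that do not change the DOM text is what makes this speed-up possible. Below is a minimal Go sketch of such a rule, assuming matching by file extension; the extension list and the shouldBlock helper are hypothetical, so check Rendora's documentation for what it actually blocks by default. Scripts must not be blocked, because the SPA needs them to render.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// blockedExt is a hypothetical list of resource types that do not affect
// the DOM text a crawler reads. JavaScript is deliberately absent.
var blockedExt = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true, ".gif": true, ".webp": true, ".svg": true,
	".css": true, ".woff": true, ".woff2": true, ".ttf": true, ".mp4": true,
}

// shouldBlock reports whether the resource at rawURL can be skipped.
// The query string is ignored because only the URL path is inspected.
func shouldBlock(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false // when in doubt, load the resource
	}
	return blockedExt[strings.ToLower(path.Ext(u.Path))]
}

func main() {
	for _, s := range []string{
		"http://localhost:8080/img/logo.png",
		"http://localhost:8080/js/app.js",
		"http://localhost:8080/fonts/a.woff2?v=3",
	} {
		fmt.Println(s, "blocked:", shouldBlock(s))
	}
}
```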

Complete example:

{
  "listen":{
    "port":3001
  },
  "target":{
    "url":"http://localhost:8080"
  },
  "backend":{
    "url":"http://localhost:8080"
  },
  "headless":{
    "internal":{
      "url":"http://localhost:9222"
    },
    "waitAfterDOMLoad":2000
  },
  "filters":{
    "userAgent":{
      "defaultPolicy":"blacklist",
      "exceptions":{
        "keywords":["bot", "bing", "yandex", "slurp", "duckduckgo","baiduspider","googlebot","360spider","Sosospider","sogou spider"],
        "exact":["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"]
      }
    },
    "paths":{
      "defaultPolicy": "whitelist",
      "exceptions":{ 
        "prefix":["/about"], 
        "exact":["/active/123"]
      }
    }
  }
}
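Because Rendora does not check spelling, a small Go program can catch typos and JSON syntax errors (such as trailing commas) before you start it. This is a sketch: the Config struct is a hypothetical subset that covers only the keys used in this article, laid out with exact inside exceptions as in Rendora's configuration documentation, so any valid Rendora key not listed here would also be reported as unknown. Note that encoding/json matches keys case-insensitively, so capitalization differences are not flagged.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// urlField is the {"url": ...} object used by target, backend and headless.internal.
type urlField struct {
	URL string `json:"url"`
}

// filter covers both filters; exceptions lists only the keys used in this article.
type filter struct {
	DefaultPolicy string `json:"defaultPolicy"`
	Exceptions    struct {
		Keywords []string `json:"keywords"`
		Prefix   []string `json:"prefix"`
		Exact    []string `json:"exact"`
	} `json:"exceptions"`
}

// Config is a hypothetical subset of Rendora's configuration.
type Config struct {
	Listen struct {
		Port int `json:"port"`
	} `json:"listen"`
	Target   urlField `json:"target"`
	Backend  urlField `json:"backend"`
	Headless struct {
		Internal         urlField `json:"internal"`
		WaitAfterDOMLoad int      `json:"waitAfterDOMLoad"`
	} `json:"headless"`
	Filters struct {
		UserAgent filter `json:"userAgent"`
		Paths     filter `json:"paths"`
	} `json:"filters"`
}

// check rejects invalid JSON, unknown (e.g. misspelled) keys and missing URLs.
func check(data string) error {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	var c Config
	if err := dec.Decode(&c); err != nil {
		return err
	}
	if c.Target.URL == "" || c.Backend.URL == "" {
		return fmt.Errorf("target.url and backend.url are required")
	}
	return nil
}

func main() {
	// The settings from this article with a deliberate typo in waitAfterDOMLoad.
	sample := `{
  "listen": {"port": 3001},
  "target": {"url": "http://localhost:8080"},
  "backend": {"url": "http://localhost:8080"},
  "headless": {"internal": {"url": "http://localhost:9222"}, "waitAfterDomLoadd": 2000}
}`
	if err := check(sample); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("config OK")
}
```

To check your own file, read it with os.ReadFile and pass its contents to check.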

Run

Start the headless browser

Now let's run everything with the configuration we just wrote. First, start the headless browser:

# chromium version
chromium-browser --headless --disable-gpu --remote-debugging-port=9222
# google-chrome version
google-chrome --headless --disable-gpu --remote-debugging-port=9222

Start the project

Then start our project. Here I created a default project with vue-cli 3 and started it in development mode, which listens on port 8080:

npm run serve

Picture: how the project looks in the browser

Start Rendora

rendora --config ./config.json

Test

First, use Postman to send a request to / without adding any header, and look at the response. Note that we are requesting port 3001, which Rendora listens on, not the project's port 8080.
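If you prefer a script to Postman, the same comparison can be made with a short Go program. It is a sketch: fetch and newRequestWithUA are hypothetical helpers, the Googlebot-style user-agent string is just an example containing the keyword bot, and it assumes Rendora is listening on port 3001 as configured above. It also prints the elapsed time, which makes the rendering delay visible.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// newRequestWithUA builds a GET request; an empty ua leaves the header unset,
// so Go's default User-Agent (which contains no crawler keyword) is sent.
func newRequestWithUA(target, ua string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	return req, nil
}

// fetch returns the response body and how long the request took.
func fetch(target, ua string) (string, time.Duration, error) {
	req, err := newRequestWithUA(target, ua)
	if err != nil {
		return "", 0, err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), time.Since(start), err
}

func main() {
	const target = "http://localhost:3001/" // Rendora, not the project's port 8080
	for _, ua := range []string{"", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"} {
		body, took, err := fetch(target, ua)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("ua=%q bytes=%d time=%v\n", ua, len(body), took)
	}
}
```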

Picture: the result of the request without the header
You can see from the picture that the page has not been rendered yet.

Next, add a user-agent header that matches the request filter to this request and view the output.

Picture: the output after adding the header
This time there is a noticeable delay before the page content is returned. The delay does not happen on every request, though: Rendora caches the rendered content for that address, so later requests for it are served without rendering again. You can specify the cache duration, and you can also move the cache to Redis.

We can also try requesting /about, which our configuration excludes from rendering:
As you can see, the excluded /about address returns the content of index.html rather than a rendered about page.

Relevant Contents & References

http://www.runtester.com/deta…
https://github.com/rendora/re…
https://github.com/rendora/re…
