Using headless Chrome for single page application SEO

Time:2019-12-3

With the popularity of React, Vue, Angular and other front-end frameworks, more and more web applications have become single page applications, which pull data asynchronously and render HTML in the browser. These frameworks greatly improve the user experience and development efficiency, but they also bring a new problem: such pages are hard for search engines to index, because a crawler that does not execute JavaScript sees only an empty shell. Although these frameworks support server-side rendering, adopting it may increase development costs.

Is there an SEO solution that works for any single page application and lets us keep the original development efficiency without changing the code? chrome-render can do this by controlling headless Chrome to render the final HTML and return it to the crawler.

Introduction to headless Chrome

Not long ago, the Chrome team announced that Chrome supports a headless mode. Headless Chrome supports all of Chrome's features, but because it does not display a user interface, it is faster and consumes fewer resources. Compared with the earlier PhantomJS (whose author announced that he would stop maintaining it after headless Chrome launched), Chrome has the advantage of being backed by Google, which will keep maintaining and optimizing it, and Chrome leads in user base, experience, speed and stability. So I think headless Chrome will gradually replace all previous headless browser solutions.

How to control headless Chrome

Since headless Chrome runs without a user interface, how do you control it and interact with it?
Chrome provides a remote control interface. At present, you can send commands to Chrome from JS code through chrome-remote-interface. You need to enable the remote control interface when starting Chrome, then connect to Chrome through chrome-remote-interface and control it through the protocol. For details, refer to the following documents:

  • Start chrome in headless mode and remote control mode

  • Connect to remote chrome to control it

  • What operations are supported when controlling chrome and how to use them
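As a rough illustration of the first step, the sketch below builds the command-line flags for starting Chrome in headless mode with the remote control (debugging) port open. `--headless`, `--disable-gpu` and `--remote-debugging-port` are Chrome flags; the helper names, the default port 9222 and the binary path are assumptions made for illustration, and chrome-runner may start Chrome differently.

```javascript
const { spawn } = require('child_process');

// Build the flags that start Chrome without a UI and open the remote
// debugging port. The default port 9222 is an assumption for this sketch.
function buildHeadlessArgs(port = 9222) {
  return [
    '--headless',                       // run without a visible window
    '--disable-gpu',                    // commonly recommended for headless mode
    `--remote-debugging-port=${port}`,  // open the remote control interface
  ];
}

// Hypothetical helper: start Chrome from a given binary path.
// chromePath depends on your system, e.g. 'google-chrome' on many Linux setups.
function launchHeadlessChrome(chromePath, port = 9222) {
  return spawn(chromePath, buildHeadlessArgs(port), { stdio: 'ignore' });
}
```

Calling `launchHeadlessChrome('google-chrome')` would return the child process, which the caller is responsible for killing when it is no longer needed.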

The principle and practice of chrome-render

Principle

chrome-render starts and supervises Chrome in headless mode through chrome-runner, then controls Chrome through chrome-remote-interface to visit the page that needs SEO and let Chrome run it. Once the HTML containing the data has been rendered, it reads the page's current DOM, converts it to a string and returns it.
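The render step might look roughly like the following with chrome-remote-interface. This is a minimal sketch, not chrome-render's actual source: `renderWithCdp` and its injectable `connect` parameter are hypothetical names, and it assumes Chrome is already listening on port 9222.

```javascript
// Expression evaluated inside the page to serialize the current DOM.
const DOM_TO_STRING = 'document.documentElement.outerHTML';

// connect defaults to the chrome-remote-interface module, loaded lazily so
// that a stub can be passed in when testing without Chrome installed.
async function renderWithCdp(url, connect, port = 9222) {
  const cdp = connect || require('chrome-remote-interface');
  const client = await cdp({ port });
  try {
    const { Page, Runtime } = client;
    await Page.enable();                          // turn on page events
    const domReady = Page.domContentEventFired(); // subscribe before navigating
    await Page.navigate({ url });
    await domReady;                               // wait for DOMContentLoaded
    const { result } = await Runtime.evaluate({ expression: DOM_TO_STRING });
    return result.value;                          // the rendered HTML as a string
  } finally {
    await client.close();                         // always release the connection
  }
}
```

Injecting `connect` keeps the function testable; in real use you would call `renderWithCdp('http://example.com')` and let it load chrome-remote-interface itself.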

How does it know when the page has rendered the HTML with data and is ready to be returned? For efficiency, chrome-render by default waits for domContentEventFired and then returns. For complex scenarios, you can also enable the useReady option, in which case it waits until the page calls window.chromeRenderReady() and then returns.
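For the useReady case, the page itself has to signal when its data has been rendered. A minimal sketch of that page-side call follows; `notifyRenderReady` is a hypothetical helper, and the guard assumes that `window.chromeRenderReady` does not exist in an ordinary browser, so the page keeps working for normal visitors.

```javascript
// Call this once the asynchronous data has been rendered into the DOM.
// Returns true if the render-ready hook was present and called.
function notifyRenderReady(win) {
  if (win && typeof win.chromeRenderReady === 'function') {
    win.chromeRenderReady(); // tells chrome-render the HTML can be captured
    return true;
  }
  return false; // ordinary browser: nothing to notify
}

// Typical use in the page, after data has loaded
// (fetchData and renderApp are hypothetical):
// fetchData().then(renderApp).then(() => notifyRenderReady(window));
```

On the server side this presumably pairs with enabling useReady when rendering; check the chrome-render documentation for exactly where that option is passed.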

Rendering HTML is not enough. We also need to detect requests from search engine crawlers. If a request comes from a crawler, return the HTML rendered by chrome-render; otherwise, return the normal HTML of the single page application.
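One common way to detect crawlers is to inspect the User-Agent request header. The sketch below shows the idea; the token list is illustrative and incomplete, `isCrawler` is a hypothetical name, and the detection used by koa-seo may differ.

```javascript
// Illustrative, incomplete list of crawler User-Agent tokens.
const CRAWLER_PATTERN = /googlebot|bingbot|baiduspider|yandexbot|duckduckbot|slurp/i;

// Returns true when the User-Agent header looks like a search engine crawler.
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}
```

User-Agent strings can be spoofed, so this is a heuristic for choosing which HTML to serve, not a security check.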

To sum up, the overall structure is as follows:
[Figure: overall structure of single page application SEO with headless Chrome]

Practice

Only a few lines of code are needed to have chrome-render render HTML:

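const ChromeRender = require('chrome-render');

// ChromeRender.new() resolves with a renderer backed by headless Chrome.
ChromeRender.new().then(async (chromeRender) => {
    // render() resolves with the final HTML of the page as a string.
    const htmlString = await chromeRender.render({
        url: 'http://qq.com',
    });
    console.log(htmlString);
}).catch(console.error);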

chrome-render only does the work of rendering HTML. To achieve SEO, it needs to be integrated with a web server. To make this convenient, I have made a Koa SEO middleware, koa-seo, which is very simple to integrate into an existing project:

const Koa = require('koa');
const seoMiddleware = require('koa-seo');

const app = new Koa();
// Register the SEO middleware so crawler requests get rendered HTML.
app.use(seoMiddleware());

Just plug in this one middleware and your single page application becomes SEO-friendly.
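To make the idea concrete, here is a minimal sketch of how a middleware of this kind could be wired together. It is not koa-seo's actual source: `createSeoMiddleware` and its two injected functions are hypothetical, and only the Koa context members `ctx.method`, `ctx.get()`, `ctx.href` and `ctx.body` are relied on.

```javascript
// Build a Koa-style middleware from two injected functions:
//   isCrawler(userAgent) -> boolean
//   render({ url })      -> Promise<string> (for example chromeRender.render)
function createSeoMiddleware({ isCrawler, render }) {
  return async function seo(ctx, next) {
    if (ctx.method === 'GET' && isCrawler(ctx.get('User-Agent'))) {
      // Crawler: answer with the HTML rendered by headless Chrome.
      ctx.body = await render({ url: ctx.href });
      return;
    }
    // Normal visitor: fall through to the regular single page application.
    await next();
  };
}
```

Passing the detection and rendering functions in as parameters keeps the middleware independent of any particular renderer and easy to test with stubs.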

Application scenario extension

In addition to serving as a general SEO solution, chrome-render can be used for general server-side rendering, because its purpose is simply to render the final HTML and return it. I also made a Koa middleware, koa-chrome-render, for general server-side rendering.

The advantages of using chrome-render for server-side rendering are:

  • Universal for all single page applications

  • There is almost no change to the original code; at most you add one call to window.chromeRenderReady() in the appropriate place, so the original development efficiency is maintained

The disadvantages are:

  • Compared with the server-side rendering built into React, Vue and similar frameworks, performance is lower (about 200 ms vs. 60 ms in my test)

  • chrome-render uses a lot of resources when rendering; one render takes about 25 MB of memory. Under heavy request volume the server may not be able to cope, although this can be mitigated by caching rendering results.
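Caching rendering results could be sketched as a small in-memory cache wrapped around any render function. The names `RenderCache` and `withCache`, the 60-second default lifetime and the choice of the URL as the cache key are all assumptions made for illustration; a production setup might prefer a shared store.

```javascript
// Minimal in-memory cache with a time-to-live, keyed by URL.
class RenderCache {
  constructor(ttlMs = 60000, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;            // injectable clock, handy for testing
    this.entries = new Map();
  }

  get(url) {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(url); // expired: drop it and report a miss
      return undefined;
    }
    return entry.html;
  }

  set(url, html) {
    this.entries.set(url, { html, storedAt: this.now() });
  }
}

// Wrap any render({ url }) function so repeated URLs skip Chrome entirely.
function withCache(render, cache = new RenderCache()) {
  return async function cachedRender({ url }) {
    const hit = cache.get(url);
    if (hit !== undefined) return hit;
    const html = await render({ url });
    cache.set(url, html);
    return html;
  };
}
```

Note that this sketch never evicts entries that are not requested again, so a long-running server would also want a size limit.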

Summary

You may say that this is very similar to prerender.io. Yes, the idea is the same. The advantages of chrome-render are as follows:

  • chrome-render is open source and can be self-hosted, while Prerender is a paid commercial product

  • Prerender is based on PhantomJS, which is no longer maintained

The related projects mentioned in this article are all open source and have detailed usage documentation. Links to their documentation are as follows:

  • chrome-render

  • chrome-runner

  • koa-seo

  • koa-chrome-render

If you like them, please give them a star. I hope we can improve them together and make them more powerful.
