Front-end rendering and SEO optimization

Time: 2020-12-18

Preface

In the era of back-end (server-side) rendering, developers only needed to build search-engine-friendly pages according to the specifications, and search engines would quickly index every page of their site.

As front-end and back-end technologies have evolved, more and more front-end frameworks have come into developers' view, and the front-end/back-end separation architecture has become widely loved and recognized. The back end provides only data interfaces, business logic, and persistence, while views, control, and rendering are left to the front end. As a result, more and more websites have switched from back-end rendering to front-end rendering. The direct consequence is that the major search engine crawlers cannot crawl front-end-rendered pages (dynamic content), so the site's content never gets indexed, which directly hurts traffic and exposure.

Since May of last year, my website has also used the front-end/back-end separation architecture: it was first built with the AngularJS framework as NewRaPo, and later refactored as a whole with Vue.js as RaPo3. Both are entirely front-end rendered, and in the year and more since then, the pages indexed by search engines have all looked like this:

[screenshot: the site's entry in Baidu's search results]
(It's the same with the other search engines; the earliest screenshots are lost, so this one will have to do for now.)

The snapshot looks like this:

[screenshot: the search engine's cached snapshot of the page]

The actual site looks like this:

[screenshot: the actual rendered site]

And this:

[screenshot: another page of the actual site]

It felt like being completely abandoned by the search engines, OK?! Who could ever find this thing? Who would ever click on it?!

To get search engines to index the site properly, I set off down the road of SEO optimization for dynamically rendered pages.

1. Fragment tag

First, following Google's specification, I added `<meta name="fragment" content="!">` to the dynamic pages. This tells the crawler that the page serves dynamic content; the crawler then appends `?_escaped_fragment_=` to the current link to fetch a static version of the corresponding page. So I decisively planned to branch in the routing layer and write a separate set of back-end-rendered pages to respond to all links carrying `?_escaped_fragment_=`.
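For illustration, a minimal sketch of what this looks like in the page head (URLs are placeholders, following the old Google AJAX crawling scheme):

<!-- head of the dynamically rendered page -->
<head>
    <!-- tells the crawler that this page serves dynamic content -->
    <meta name="fragment" content="!">
</head>

<!-- the crawler then re-requests the page with the extra parameter, e.g.: -->
<!-- http://www.rapospectre.com/archives/1?_escaped_fragment_= -->
<!-- and the server is expected to answer that URL with pre-rendered HTML -->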

Just as I was rejoicing that the problem was so easily solved, I discovered that everything written about it online indicated this method is only recognized by Google's crawler and useless for every other search engine! WTF, great, just great.

2. PhantomJS

PhantomJS is a headless, WebKit-based browser scriptable with a server-side JavaScript API. It supports the full web stack without needing a browser window, is fast, and natively supports various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. PhantomJS can be used for page automation, network monitoring, web page screenshots, and headless testing.

In short, PhantomJS can parse HTML and execute JavaScript on the server side.

How is it used? In short, when a request is identified as coming from a crawler, first have PhantomJS render the requested dynamic page, then return the static result to the crawler. For the detailed process, see: Using PhantomJS to do SEO optimization for AJAX sites.
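As a minimal sketch (the URL and the delay are placeholders), a PhantomJS script that loads a dynamic page and prints the rendered HTML could look like this:

var page = require('webpage').create();
var url = 'http://www.rapospectre.com/archives/1'; // page to pre-render

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('failed to load ' + url);
        phantom.exit(1);
    }
    // give the front-end framework a moment to finish rendering
    setTimeout(function () {
        console.log(page.content); // the fully rendered HTML
        phantom.exit();
    }, 1000);
});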

Of course, after reading up on it, I didn't end up running my own PhantomJS service for dynamic content optimization, mainly because:

  1. Every crawler visit triggers a PhantomJS render, so for a single crawler visit the server effectively has to respond twice: once to the crawler and once to PhantomJS itself. This wastes resources and is inelegant;

  2. PhantomJS can have compatibility problems with newer front-end technology, and rendering may fail;

  3. Rendered pages are not cached, so every visit triggers a re-render, which slows down the site's response time.

3. Prerender.io

Prerender.io is an online service built on PhantomJS that provides static page rendering for dynamic-page SEO. It basically solves the problems of running a PhantomJS service yourself. Once the website is configured to use Prerender.io, Prerender steps in for the site's back end when responding to crawler requests and returns the pre-rendered dynamic pages directly to the crawler.

Specific configuration:

  1. Register a Prerender.io account; free users can render 250 pages, which is plenty for a blog site;

  2. Install the middleware and set the token. I went straight for the nginx configuration scheme (Prerender.io offers other integrations as well: https://prerender.io/document…). My back-end server is uwsgi, so based on the nginx.conf provided by Prerender.io I made the following changes:

server {
    listen 80;
    server_name www.rapospectre.com;
 
    location @prerender {
        proxy_set_header X-Prerender-Token YOUR_TOKEN;
        include        uwsgi_params;
        
        set $prerender 0;
        if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }
        
        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;
 
        if ($prerender = 1) {
            
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
            set $prerender "service.prerender.io";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
        }
        if ($prerender = 0) {
            uwsgi_pass     127.0.0.1:xxxx;
        }
    }
}

Then restart the server and submit a page for crawl testing through Google Search Console or another webmaster tool. Prerender.io intercepted the crawler request and rendered it successfully:

[screenshot: Prerender.io dashboard showing the crawler request intercepted and rendered]

Well, finally solved. But just as I was sighing that it hadn't been easy, Google Search Console's fetch results showed that things were not that simple:

[screenshot: Google Search Console fetch result]

The content was still full of unrendered template tags like `{{ article.views }}`. At the time I assumed it was a caching problem and didn't think much of it, but after a week of testing the situation was unchanged. Looking back at the page rendered by Prerender:

[screenshot: the page as rendered by Prerender, template tags still unprocessed]

It hadn't rendered at all! I checked the configuration and documentation again and again, contacted Prerender.io's technical support, and even opened a related issue on Prerender's GitHub, but the problem was never solved. In the end I gave up on Prerender.

4. Building my own back-end rendering service

Prerender's scheme inspired me: why not have different back-end servers respond depending on the User-Agent of the request? Online discussions of SEO clearly state that returning different pages based on UA will be punished by search engines, but my guess was that the punishment only applies when different content is returned. If the content is the same, there should be no penalty: the only difference is that one page is rendered on the front end and the other on the back end, and since the rendered content of the two is essentially identical, the search engine should not notice.

First, change the front-end-rendered parts of the site's code to back-end rendering and push the result to a new branch. The changes to my site were very simple; only about 50 lines of code were modified to meet the requirement: RaPo3-Shadow

Then deploy the back-end rendering code to the server; suppose it runs under uwsgi on port 11011, while the front-end rendering code runs under uwsgi on port 11000.
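To make the difference concrete (a schematic sketch with invented markup and values, not the actual RaPo3-Shadow code), the same fragment of a page arrives in two forms depending on which port serves it:

<!-- front-end rendered (port 11000): the raw template reaches the client -->
<div class="views">{{ article.views }}</div>
<script src="/static/js/app.js"></script>

<!-- back-end rendered (port 11011): the same markup arrives pre-filled -->
<div class="views">1024</div>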

Finally, modify the nginx configuration file nginx.conf:

server {
    listen 80;
    server_name www.rapospectre.com;
 
    location @prerender {
        include        uwsgi_params;
        
        set $prerender 0;
        if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator") {
            set $prerender 1;
        }
        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;
 
        if ($prerender = 1) {
            uwsgi_pass     127.0.0.1:11011;
        }
        if ($prerender = 0) {
            uwsgi_pass     127.0.0.1:11000;
        }
    }
}

Requests are checked by UA: if the visitor is a crawler, it is forwarded to port 11011; otherwise, to port 11000. Since the pages returned by the two ports are essentially the same, there is no need to worry about search engine penalties.
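To sanity-check the routing, one option (a sketch assuming Node.js is at hand; the host, path, and UA string are placeholders mirroring the nginx regex) is to request the same page with and without a crawler UA and look for raw template tags:

var http = require('http');

// fetch a page while pretending to be the given User-Agent
function fetchAs(userAgent, callback) {
    http.get({
        host: 'www.rapospectre.com',
        path: '/',
        headers: { 'User-Agent': userAgent }
    }, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { callback(body); });
    });
}

// a crawler UA should hit port 11011 and get HTML without raw {{ ... }} tags
fetchAs('baiduspider', function (body) {
    console.log('template tags present:', /\{\{[^}]*\}\}/.test(body));
});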

With the configuration above, the SEO problem for dynamic pages was finally solved. Google responded fastest, crawling and updating its index the very next day:

[screenshot: updated Google search result]

Then 360 search:

[screenshot: 360 Search result]

Then other search engines, to which the site's address had never even been submitted, also indexed the site:

[screenshots: results from other search engines]

(Bing hasn't indexed the pages properly because the Bing crawler's UA was not included in nginx.conf.)

As for Baidu, I don't know why, but more than two months later it still hadn't indexed the new pages, despite submissions through its webmaster tools and even a complaint. Yes, the placeholder screenshot at the beginning is Baidu's result, freshly captured.

I should say it's lucky Baidu hasn't updated, otherwise I couldn't have found that earlier example, hahaha.

[screenshot: Baidu's stale result]

5. A final conjecture

The SEO problem for dynamic pages was solved by building my own back-end rendering service, but I still have one more idea that might work. I don't know whether it is feasible; if you've read through the methods above, you could give it a try.

The canonical tag was introduced by Google, Yahoo, Microsoft, and other search engines to solve the problem of duplicate content that arises when the same content is reachable through different URL forms. The tag has been around for a long time, so all mainstream search engines now support it. Its original purpose is to consolidate the weight of several URLs with identical content onto a single URL, preventing large numbers of duplicate pages from being indexed. For example:

http://www.rapospectre.com/archives/1
http://www.rapospectre.com/archives/1?comments=true
http://www.rapospectre.com/archives/1?postcomment=true

The content of these three URLs is exactly the same. To avoid duplication and diluted weight, add `<link rel='canonical' href='http://www.rapospectre.com/archives/1' />` to them; the duplicate pages will then all be treated as http://www.rapospectre.com/archives/1.

So suppose that for the dynamic page http://www.rapospectre.com/dynamic/1 we generate a static version, http://www.rapospectre.com/static/1, and then add `<link rel='canonical' href='http://www.rapospectre.com/static/1' />` to http://www.rapospectre.com/dynamic/1. Could that achieve the goal of dynamic-page SEO?
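A sketch of what the head of the dynamic page would contain under this conjecture (untested, URLs as above):

<!-- in the <head> of http://www.rapospectre.com/dynamic/1 -->
<head>
    <!-- point search engines at the pre-rendered static copy -->
    <link rel='canonical' href='http://www.rapospectre.com/static/1' />
</head>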

Summary

With more and more pages rendered on the front end, SEO optimization for dynamic pages has gradually come into people's view. I've written down my own experience optimizing SEO for dynamic pages in the hope of offering some ideas to others who are interested in this area or who run into the same problems. Thank you.

Original address

Author: rapospectre