Crawling the articles of a WeChat public account (Official Account) requires two subsystems working together.
I. Automatic browsing of public account articles
The first subsystem runs on the mobile device and browses the account's articles automatically, one after another. Each time an article is opened, the WeChat client requests that article's link. Route the phone's traffic through the AnyProxy proxy tool, and you can intercept these requests and resolve the permanent article link. Once you have the real article address, forward it to a server you have set up. The server saves the links one by one.
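As a rough illustration of the receiving end, here is a minimal Python sketch of a server that stores the forwarded links. It rests on several assumptions. The AnyProxy rule on the proxy side (not shown) is assumed to POST JSON such as {"url": "..."} to this server. Permanent links are assumed to carry the __biz, mid, idx and sn query parameters, which the sketch uses to deduplicate them. The function names, table name and port are hypothetical and are not taken from the author's repository.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: permanent article links look like
# https://mp.weixin.qq.com/s?__biz=...&mid=...&idx=...&sn=...
KEY_PARAMS = ("__biz", "mid", "idx", "sn")


def normalize_link(url):
    """Keep only the (assumed) identifying query parameters, in a fixed order."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    kept = [(k, query[k][0]) for k in KEY_PARAMS if k in query]
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}?{urlencode(kept)}"


def init_db(path=":memory:"):
    """Open the (hypothetical) link store; the handler runs on another thread."""
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("CREATE TABLE IF NOT EXISTS article_links (url TEXT PRIMARY KEY)")
    return conn


def save_link(conn, url):
    """Insert a normalized link; return True if it was not stored before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO article_links (url) VALUES (?)",
        (normalize_link(url),),
    )
    conn.commit()
    return cur.rowcount == 1


def make_handler(conn):
    """Build a request handler that saves each POSTed {"url": ...} payload."""

    class LinkHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            try:
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length))
                added = save_link(conn, payload["url"])
            except (ValueError, KeyError, TypeError):
                # Malformed body or missing "url" field.
                self.send_response(400)
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({"added": added}).encode())

    return LinkHandler
```

To run it, something like `HTTPServer(("0.0.0.0", 8000), make_handler(init_db("links.db"))).serve_forever()` starts the receiver on port 8000. The phone-side proxy rule would then POST each resolved link there. Links are normalized before insertion, so if the same article arrives with different extra parameters, it is stored only once.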
See my blog post "WeChat public account article collection: WeChat automation" for detailed implementation steps and the GitHub source code.
II. Server-side crawling of public account article content
Once the mobile side has collected the article links, the server can fetch the content behind each link with a simple crawler. It then parses each fetched page, extracts the fields you need, and saves them to the database.
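The fetch, parse and store steps can be sketched with the Python standard library alone. This is a minimal example, not the author's crawler. It assumes the article page marks the title, account name and body with the element ids activity-name, js_name and js_content. That markup is an assumption and may change. The table layout and function names are hypothetical.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

# Elements without end tags; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Assumption: element ids used on article pages -> our field names.
FIELD_IDS = {"activity-name": "title", "js_name": "account", "js_content": "content"}


class ArticleParser(HTMLParser):
    """Collect the text inside the elements listed in FIELD_IDS."""

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in FIELD_IDS.values()}
        self._current = None  # field currently being captured
        self._depth = 0       # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current:
            self._depth += 1
            return
        element_id = dict(attrs).get("id")
        if element_id in FIELD_IDS:
            self._current = FIELD_IDS[element_id]
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._current:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current].append(data)

    def result(self):
        # Join text fragments and collapse runs of whitespace.
        return {name: " ".join(" ".join(parts).split())
                for name, parts in self.fields.items()}


def fetch_html(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse_article(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.result()


def save_article(conn, url, article):
    """Store one parsed article, replacing any earlier copy of the same URL."""
    conn.execute("""CREATE TABLE IF NOT EXISTS articles (
        url TEXT PRIMARY KEY, title TEXT, account TEXT, content TEXT)""")
    conn.execute("INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?)",
                 (url, article["title"], article["account"], article["content"]))
    conn.commit()


def crawl(conn, url):
    """Fetch, parse and save a single article link."""
    article = parse_article(fetch_html(url))
    save_article(conn, url, article)
    return article
```

In practice, crawl would be called for each link saved by the mobile-side subsystem. The parser only uses html.parser, so it stays dependency-free. A library such as BeautifulSoup would make the field matching shorter if you are willing to add the dependency.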
See my blog post "WeChat public account article collection: server data collection" for detailed implementation steps and the GitHub source code.