The following article is from IT Sharing House, by IT sharers.
Consider a question: if we want to capture the comment data under one of a Weibo influencer's posts, how should we do it? The simplest way is to find the Weibo comment data interface, then change its parameters to fetch the latest data and save it. First, find the comment interface in the Weibo API, as shown in the figure below.
Unfortunately, that interface is rate-limited: after a few requests it gets blocked, so it fails almost as soon as it gets going.
Next, the author turns to the mobile version of the Weibo site: log in first, find the post whose comments we want to grab, open the browser's built-in network analysis tool, keep scrolling down through the comments, and locate the comment data interface, as shown in the figure below.
After that, click the "Params" tab to see the request parameters, as shown in the following figure:
You can see that there are 4 parameters in total. The first two are the post's ID, which serves as the post's "identity number". max_id is the parameter that controls paging: it changes with every request, and the max_id value for the next request is contained in the response data of the current one.
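The parameter set described above can be sketched as follows. The names `id`, `mid`, and `max_id` are taken from the description in the text; the exact names Weibo uses may differ, so treat this as an illustration only.

```python
# Hypothetical sketch of the comment-interface parameters described above.
# The first request omits max_id; later requests pass the max_id value
# returned by the previous response.

def build_params(post_id, max_id=None):
    """Build the query parameters for one comment-page request."""
    params = {
        "id": post_id,   # the post's ID (its "identity number")
        "mid": post_id,  # second parameter, also the post ID per the text
    }
    if max_id is not None:
        params["max_id"] = max_id  # paging cursor from the previous response
    return params

print(build_params("4567890123456789"))
print(build_params("4567890123456789", max_id=138274650))
```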
With the above foundation, let’s start with the code and implement it in Python.
1. First distinguish the URLs: the first request does not need max_id, while each subsequent request must use the max_id returned by the previous one.
2. The request must carry cookie data. The validity period of a Weibo cookie is long enough to grab the comment data of one post. The cookie value can be found in the browser's analysis tool.
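Steps 1 and 2 can be sketched together: build the URL (with or without max_id) and send the request with the cookie header attached. The endpoint path and parameter names here are assumptions for illustration; in practice they come from the browser's traffic analysis tool, and `urllib` stands in for whatever HTTP client the author used.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint path; the real one is found with the browser's
# traffic analysis tool as described in the text.
BASE = "https://m.weibo.cn/comments/hotflow"

def build_url(post_id, max_id=None):
    """First page: no max_id. Later pages: append the max_id returned
    by the previous response."""
    query = {"id": post_id, "mid": post_id}
    if max_id is not None:
        query["max_id"] = max_id
    return BASE + "?" + urllib.parse.urlencode(query)

def fetch_page(post_id, cookie, max_id=None):
    """Request one page of comments, carrying the cookie value copied
    from the browser (network access is not exercised here)."""
    req = urllib.request.Request(
        build_url(post_id, max_id),
        headers={"Cookie": cookie, "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```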
3. The returned data is then converted to JSON format, and the comment content, commenter nickname, comment time, and other fields are extracted. The output is shown in the figure below.
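The parsing step might look like the sketch below. The response layout (a `data` block holding a list of comments plus the next `max_id`) is an assumption inferred from the fields the text mentions, not Weibo's documented schema.

```python
import json

# Assumed response layout, inferred from the fields mentioned in the text
# (comment content, commenter nickname, comment time, next max_id).
sample = json.loads("""
{
  "data": {
    "max_id": 138274650,
    "data": [
      {"text": "Great post [doge]",
       "created_at": "Mon Aug 10 10:00:00 +0800 2020",
       "user": {"screen_name": "alice"}}
    ]
  }
}
""")

def parse_comments(payload):
    """Pull out (nickname, time, content) tuples plus the next max_id."""
    block = payload["data"]
    rows = [(c["user"]["screen_name"], c["created_at"], c["text"])
            for c in block["data"]]
    return rows, block["max_id"]

rows, next_max_id = parse_comments(sample)
print(rows)
print(next_max_id)  # 138274650
```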
4. To save the comment content cleanly, we need to strip the emoticon codes from the comments, which can be handled with regular expressions, as shown in the figure below.
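A possible cleaning regex is sketched below, assuming the emoticons appear as bracketed codes like `[doge]` and possibly as inline HTML image markup; the author's actual pattern is only shown in the figure.

```python
import re

def strip_expressions(text):
    """Remove bracketed emoticon codes such as [doge] and any inline
    HTML (e.g. the image markup used for expressions)."""
    text = re.sub(r"<[^>]+>", "", text)       # drop HTML tags
    text = re.sub(r"\[[^\[\]]+\]", "", text)  # drop [emoticon] codes
    return text.strip()

print(strip_expressions('Nice one <span class="url-icon"></span>[doge][good]'))  # Nice one
```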
5. After that, save the content to a txt file; the plain built-in open function is enough, as shown in the figure below.
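The save step, using plain `open()` as the text describes. The filename and the one-comment-per-line format are illustrative choices, not taken from the article.

```python
# Append each cleaned comment as one line of a txt file, using the plain
# built-in open() as the text describes. The filename is illustrative.

def save_comment(line, path="comments.txt"):
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")

save_comment("alice\t2020-08-10\tGreat post")
```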
6. The key point is that this interface only returns 16 pages of data (20 comments per page), while the web page shows 50 pages; different interfaces return different amounts of data. So I added a for loop that pages through the data one request after another, as shown in the following diagram.
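The paging loop can be sketched with the fetcher injected as a callable, so the max_id chaining is visible without any network access. The 16-page limit comes from the text; the assumption that Weibo signals the last page with `max_id == 0` is mine.

```python
def crawl_all(fetch, pages=16):
    """Walk the comment pages: the first call passes no max_id, and each
    later call passes the max_id returned by the previous page. Stops
    early if the interface signals the end (assumed here: max_id == 0)."""
    all_rows, max_id = [], None
    for _ in range(pages):
        rows, max_id = fetch(max_id)
        all_rows.extend(rows)
        if not max_id:
            break
    return all_rows

# Fake fetcher standing in for the real HTTP request: three pages of
# 20 comments each, then max_id 0 to end the crawl.
def fake_fetch(max_id, _state={"page": 0}):
    _state["page"] += 1
    nxt = 0 if _state["page"] == 3 else _state["page"] * 1000
    return [f"comment {_state['page']}-{i}" for i in range(20)], nxt

total = crawl_all(fake_fetch)
print(len(total))  # 60
```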
7. Here we name the function job. To keep getting the latest data, we can use the schedule library to add a timing function to the program and run the grab every 10 minutes or half an hour, as shown in the figure below.
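The article uses the third-party `schedule` library; the sketch below shows the same idea with only the standard library so it runs anywhere, with the `schedule` equivalent noted in a comment. The `max_runs` parameter exists only so the sketch can terminate; a real crawler would loop forever.

```python
import time

def job():
    """Placeholder for the grab-comments routine described above."""
    print("grabbing latest comments ...")

def run_every(task, seconds, max_runs=None):
    """Minimal stdlib stand-in for the `schedule` library: call `task`
    every `seconds` seconds, optionally stopping after `max_runs` calls."""
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        time.sleep(seconds)

# With the schedule library the same idea reads roughly:
#   schedule.every(10).minutes.do(job)
#   while True:
#       schedule.run_pending()
#       time.sleep(1)
```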
8. The fetched data is deduplicated, as shown in the figure below: if a comment is already stored, skip it; otherwise, add it.
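A minimal version of that check, keeping a set of comments already seen; the function and variable names are my own, since the article's code is only shown in the figure.

```python
def add_if_new(comment, seen, store):
    """Deduplicate: skip the comment if it was stored before,
    otherwise record it. `seen` is a set of already-saved comments."""
    if comment in seen:
        return False  # already stored, pass
    seen.add(comment)
    store.append(comment)
    return True

seen, store = set(), []
for c in ["nice", "great", "nice"]:
    add_if_new(c, seen, store)
print(store)  # ['nice', 'great']
```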
With that, the job is basically complete.