Use Python to crawl bra reviews on NetEase Yanxuan and explore what girls prefer

Time:2020-9-28

Crawling NetEase Yanxuan product reviews

Analyze web pages

Comment analysis

Open the NetEase Yanxuan website (you.163.com), search for “bra”, and click on any product.

On the product page, open Chrome DevTools and switch to the Network panel. Then switch the product page to the reviews tab, pick a piece of comment text, such as “thin, comfortable to wear, satisfied”, and search for it in the Network panel.

As you can see, the comment text is delivered by the listByItemByTag.json request. Click the request and copy its URL:

https://you.163.com/xhr/comment/listByItemByTag.json?csrf_token=060f4782bf9fda38128cfaeafb661f8c&__timestamp=1571106038283&itemId=1616018&tag=%E5%85%A8%E9%83%A8&size=20&page=1&orderBy=0&oldItemTag=%E5%85%A8%E9%83%A8&oldItemOrderBy=0&tagChanged=0

Paste the URL into Postman and try removing the query parameters one by one. It turns out that only two request parameters are needed: itemId and page.
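The trimming done by hand in Postman can also be sketched in code. This is a minimal illustration using only the standard library; the helper name keep_params and the constant CAPTURED_URL are made up for this example, and the sketch only rebuilds the URL without sending any request.

```python
from urllib.parse import urlsplit, parse_qs, urlencode, urlunsplit

# URL copied from the Network panel, as shown above.
CAPTURED_URL = (
    "https://you.163.com/xhr/comment/listByItemByTag.json"
    "?csrf_token=060f4782bf9fda38128cfaeafb661f8c&__timestamp=1571106038283"
    "&itemId=1616018&tag=%E5%85%A8%E9%83%A8&size=20&page=1&orderBy=0"
    "&oldItemTag=%E5%85%A8%E9%83%A8&oldItemOrderBy=0&tagChanged=0"
)


def keep_params(url, wanted):
    """Return `url` with only the query parameters named in `wanted`."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # maps each name to a list of values
    kept = {name: params[name][0] for name in wanted if name in params}
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


minimal_url = keep_params(CAPTURED_URL, ["itemId", "page"])
print(minimal_url)
```

Whether the server still accepts the trimmed request has to be checked by actually sending it, which is what the Postman step does.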

The request returns JSON, so the next step is to look through that data.

It is easy to see that all the comments are stored in commentList, so that is the only field we need to save.

Next we need to find out where itemId, the product ID, comes from. Let’s go back to the NetEase Yanxuan home page and continue the analysis.

Product ID acquisition

When we type a keyword into the search box and search, many requests show up in the Network panel. We can look through them one by one. Going by the request names (this takes some experience, but disciplined programmers rarely pick misleading names), we can locate the request that returns the search results.

Search endpoints are usually named search, so we settle on the search.json request. As before, copy the request URL into Postman, test the parameters one by one, and keep only the page and keyword parameters.

This request returns a lot more data, so it takes some patience to read through. The id value of each entry under data > directly > searcherResult > result is the product ID we want.
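Both responses are nested a few levels deep, so a small helper for walking a key path can make the extraction less fragile. The sketch below is only an illustration: dig is a hypothetical helper, and the two payloads are made-up, heavily trimmed stand-ins that mimic only the shape described above.

```python
def dig(payload, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` if a step is missing or null."""
    current = payload
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


# Made-up responses that only imitate the structure of the real ones.
search_response = {
    "data": {"directly": {"searcherResult": {"result": [{"id": 1616018}, {"id": 1680205}]}}}
}
comment_response = {"data": {"commentList": [{"star": 5, "itemId": 1680205}]}}

results = dig(search_response, "data", "directly", "searcherResult", "result", default=[])
product_ids = [item["id"] for item in results]
comments = dig(comment_response, "data", "commentList", default=[])
missing = dig({"data": None}, "data", "commentList", default=[])

print(product_ids, len(comments), missing)
```

Returning a default instead of raising means an unexpected response yields an empty list, which the crawl loop can treat as "nothing more to fetch".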

With the preliminary analysis done, we can start writing the code.

Write code

Get product ID

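import requests

def search_keyword(keyword):
    """Return the product IDs on the first page of search results for keyword."""
    uri = 'https://you.163.com/xhr/search/search.json'
    query = {
        "keyword": keyword,
        "page": 1
    }
    # A timeout keeps a stalled request from hanging the crawler.
    res = requests.get(uri, params=query, timeout=10).json()
    # Product entries live under data > directly > searcherResult > result.
    result = res['data']['directly']['searcherResult']['result']
    return [r['id'] for r in result]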

Here I only fetch the product IDs from page 1 of the search results. The next step is to use each product ID to fetch the comments for that product.
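If one page of results is not enough, the same idea can be extended to several pages. The following is a sketch under assumptions: collect_product_ids and fetch_page are hypothetical names, and fetch_page stands for any function that returns the list of product IDs for a given page (for example, a variant of the search request that takes the page number as an argument). A fake fetcher is used so the example runs offline.

```python
def collect_product_ids(fetch_page, max_pages=5):
    """Gather product IDs page by page until a page comes back empty."""
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        ids = fetch_page(page)
        if not ids:
            break  # an empty page is taken to mean there are no more results
        for product_id in ids:
            if product_id not in seen:  # skip IDs repeated across pages
                seen.add(product_id)
                ordered.append(product_id)
    return ordered


# Stand-in for the network call, with made-up IDs.
fake_pages = {1: [101, 102], 2: [102, 103], 3: []}
all_ids = collect_product_ids(lambda page: fake_pages.get(page, []))
print(all_ids)
```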

From the analysis above, we know that each comment has the following form. Data in this shape is easy to store in MongoDB, where we can take our time analyzing it.

{
    "skuInfo": [
        "Color: skin color",
        "Cup size: 75B"
    ],
    "frontUserName": "1****8",
    "frontUserAvatar": "https://yanxuan.nosdn.127.net/f8f20a77db47b8c66c531c14c8b38ee7.jpg",
    "content": "good quality, comfortable to wear",
    "createTime": 1555546727635,
    "picList": [
        "https://yanxuan.nosdn.127.net/742f28186d805571e4b3f28faa412941.jpg"
    ],
    "commentReplyVO": null,
    "memberLevel": 4,
    "appendCommentVO": null,
    "star": 5,
    "itemId": 1680205
}
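Two fields in this record need a little processing before analysis: skuInfo packs a name and a value into one string, and createTime looks like a Unix timestamp in milliseconds (an assumption based on its 13 digits). Here is a minimal sketch of both conversions; the function names are made up, and the full-width colon handling is a guess about how the untranslated data may be punctuated.

```python
from datetime import datetime, timezone


def parse_sku_info(sku_info):
    """Turn ["Color: skin color", "Cup size: 75B"] into a dict."""
    parsed = {}
    for entry in sku_info or []:
        # Accept ASCII and full-width colons (the latter is an assumption).
        name, sep, value = entry.replace("：", ":").partition(":")
        if sep:
            parsed[name.strip()] = value.strip()
    return parsed


def parse_create_time(millis):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


sample = {"skuInfo": ["Color: skin color", "Cup size: 75B"], "createTime": 1555546727635}
attributes = parse_sku_info(sample["skuInfo"])
created = parse_create_time(sample["createTime"])
print(attributes, created.date())
```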

For MongoDB, you can run your own instance or use a free hosted service. I used mLab, a free hosted MongoDB service. It is simple to use, so I won’t walk through the setup here.

With the database ready, the next step is to save the data.

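import time

import requests

def details(product_id):
    """Crawl every comment page for one product and save each page to MongoDB."""
    url = 'https://you.163.com/xhr/comment/listByItemByTag.json'
    C_list = []
    for i in range(1, 100):
        query = {
            "itemId": product_id,
            "page": i,
        }
        res = requests.get(url, params=query, timeout=10).json()
        commentList = res['data']['commentList']
        if not commentList:
            break  # an empty page means there are no more comments
        print("Crawling page %s of comments" % i)
        C_list.append(commentList)
        time.sleep(1)  # pause between requests to go easy on the server
        # save to MongoDB; mongo_collection comes from the connection snippet below
        try:
            mongo_collection.insert_many(commentList)
        except Exception:
            continue  # skip this page if the insert fails
    return C_list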

The finished crawl yields more than 7,000 comments, which can then be analyzed however you like.

MongoDB connection used for the crawled data

from pymongo import MongoClient

conn = MongoClient("mongodb://%s:%[email protected]:49974/you163" % ('you163', 'you163'))
db = conn.you163
mongo_collection = db.you163

Data analysis of product reviews

Now for the exciting part: exploring what the girls prefer!

Color preference

Let’s take a look at the colors that girls prefer.

As you can see, black is far ahead. Worth remembering!

Next, a pie chart shows the proportion of each color.

Is her favorite color among them?
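The counts behind charts like these can be computed from the skuInfo field. The sketch below is one possible way to do it, not necessarily how the original charts were produced; attribute_counts and proportions are made-up names, and the records are invented samples in the shape of the comment shown earlier.

```python
from collections import Counter


def attribute_counts(comments, attribute):
    """Count how often each value of one skuInfo attribute appears."""
    counts = Counter()
    for comment in comments:
        for entry in comment.get("skuInfo") or []:
            name, sep, value = entry.partition(":")
            if sep and name.strip() == attribute:
                counts[value.strip()] += 1
    return counts


def proportions(counts):
    """Convert raw counts to fractions of the total, as a pie chart would show."""
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()} if total else {}


# Invented records, not real crawl results.
sample_comments = [
    {"skuInfo": ["Color: black", "Cup size: 75B"]},
    {"skuInfo": ["Color: black", "Cup size: 80B"]},
    {"skuInfo": ["Color: skin color", "Cup size: 75B"]},
    {"skuInfo": ["Color: black", "Cup size: 75B"]},
]
color_counts = attribute_counts(sample_comments, "Color")
color_share = proportions(color_counts)
size_counts = attribute_counts(sample_comments, "Cup size")
print(color_counts.most_common(), color_share, size_counts.most_common(1))
```

The same function covers the size distribution by passing a different attribute name.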

Size distribution

No surprise here: 75B is the most common size.

If you are not familiar with cup sizes, don’t worry. I’ve prepared a comparison table for you. You’re welcome.

Product reviews

Finally, let’s take a look at how the products are rated.

Judging by the star ratings, the large majority are five-star reviews. After all, with a name like Yanxuan (“strict selection”), quality is expected.
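The share of each star rating can be computed from the star field of the stored comments. This is a minimal sketch with invented ratings; star_distribution is a made-up name, and the assumption that ratings run from 1 to 5 comes from the five-star wording above.

```python
from collections import Counter


def star_distribution(comments):
    """Return {star: share of comments} for ratings 1 through 5."""
    counts = Counter(c["star"] for c in comments if c.get("star") is not None)
    total = sum(counts.values())
    return {star: (counts[star] / total if total else 0.0) for star in range(1, 6)}


# Invented ratings, not real crawl results.
ratings = [{"star": 5}, {"star": 5}, {"star": 5}, {"star": 4}, {"star": 1}]
shares = star_distribution(ratings)
print(shares)
```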

Now let’s look at the comment text. Which words do reviewers use most?

Comfortable, very comfortable, extremely comfortable; satisfied, very satisfied, extremely satisfied.

It feels like walking into a “praise group”. Comfort seems to be the first thing girls care about. After all, this is worn next to the skin, so quality matters most!
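A word ranking like this boils down to counting tokens across all comment texts. The sketch below uses English sample text and a simple regular expression; the real comments are in Chinese, which would need a word segmentation step first (an assumption, since the original does not say which tool was used). The name top_words and the small stopword list are made up for the example.

```python
import re
from collections import Counter


def top_words(texts, n=3, stopwords=frozenset({"to", "the", "and", "a", "is"})):
    """Return the `n` most frequent lowercase words across `texts`."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts.most_common(n)


# The first two strings appear earlier in this article; the third is invented.
reviews = [
    "good quality, comfortable to wear",
    "thin, comfortable to wear, satisfied",
    "very comfortable and very satisfied",
]
ranking = top_words(reviews)
print(ranking)
```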

Well, after reading this analysis, do you feel a stronger urge to stop being single? And if you already have a girlfriend, isn’t it time to treat her to something nice?
