Use Python to pick out the “amazing” uploaders on Bilibili!

Time:2020-9-18

Preface

Recently, Bilibili’s New Year’s Eve gala swept the major video sites thanks to its distinctive creativity, which did a great deal of good for the company’s image. The share price soared as well. Presumably everyone now regrets not buying Bilibili stock earlier.

 

 

 

However, what we are going to discuss today is not the new year’s Eve Party of station B, but the core resource of station B: “amazing people”. The inspiration of this article comes from a question on the hot list of Zhihu

 

 

 

 

 

Data acquisition

 

 

 

A total of 859 answers to the question above were collected, and they are the source of the data in this article. Many of the answers include a link containing the uploader’s space ID, as shown in the following figure:

 

 

We can therefore extract the uploaders’ space IDs from the answers. However, since not every answer contains such an ID, we also extract the bold text in the answers, which yields some uploader names to supplement the data.
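The bold-text supplement is described but not shown in the article. A minimal sketch, assuming bold text is marked up with `<b>` or `<strong>` tags in the answer HTML (the function name `extract_bold_names` and the sample markup are hypothetical):

```python
import re

# Contents of <b>...</b> or <strong>...</strong> (assumed markup for bold text)
BOLD_PATTERN = re.compile(r'<(b|strong)\b[^>]*>(.*?)</\1>', re.IGNORECASE | re.DOTALL)
TAG_PATTERN = re.compile(r'<[^>]+>')


def extract_bold_names(answer_html):
    """Return the de-duplicated bold snippets of an answer, in order of appearance."""
    names = []
    for _, inner in BOLD_PATTERN.findall(answer_html):
        text = TAG_PATTERN.sub('', inner).strip()  # drop nested tags such as <a>
        if text and text not in names:
            names.append(text)
    return names


# Hypothetical answer fragment
sample = ('<p>I recommend <b>手工耿</b> and <strong><a href="#">李子柒</a></strong>.</p>'
          '<p><b>手工耿</b> again, <br>of course.</p>')
print(extract_bold_names(sample))  # ['手工耿', '李子柒']
```

Bold text is not always a name, so the result would still need a manual pass before being merged with the ID-based list.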

 

 

The answer above is a typical case: it refers to the very popular primary-school student who received birthday greetings from Cook. Part of the data-extraction code is as follows:

 

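#Start crawling data
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.zhihu.com/question/291506148'
js = 'window.open("' + url + '")'
driver.execute_script(js)
# window.open creates a new tab; switch to it so later commands act on the question page
driver.switch_to.window(driver.window_handles[-1])
# Scroll to the bottom repeatedly so that more answers are lazy-loaded
for i in range(1000):
    time.sleep(1)
    js = "document.documentElement.scrollTop=10000000"
    driver.execute_script(js)
    print(i)

#Organize data
all_html = [k.get_property('innerHTML') for k in driver.find_elements(By.CLASS_NAME, 'AnswerItem')]
all_text = ''.join(all_html)
# Raw string with escaped dots, so only literal space.bilibili.com links match
pat = r'/space\.bilibili\.com/\d+'
spaces = list(set(re.findall(pat, all_text)))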
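The loop above always scrolls 1000 times, even if every answer has loaded much earlier. One possible variation, sketched here under the assumption that the page height stops growing once nothing more is loaded, is to stop after a few rounds without change. The helper `scroll_until_stable` and its parameters are hypothetical; `driver` only needs Selenium’s `execute_script` method.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=1000, patience=3):
    """Scroll to the bottom until the page height stops growing.

    Stops after `patience` consecutive rounds with an unchanged height,
    or after `max_rounds` rounds. Returns the number of rounds performed.
    """
    last_height = 0
    unchanged = 0
    for rounds in range(1, max_rounds + 1):
        driver.execute_script('window.scrollTo(0, document.documentElement.scrollHeight)')
        time.sleep(pause)  # give the page time to load more answers
        height = driver.execute_script('return document.documentElement.scrollHeight')
        if height == last_height:
            unchanged += 1
            if unchanged >= patience:
                return rounds
        else:
            unchanged = 0
            last_height = height
    return max_rounds
```

With a real browser this would be called as `scroll_until_stable(driver)` in place of the fixed loop. A slow network can make the height look stable too early, so `patience` and `pause` may need tuning.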

 

Now that we have the IDs of these “amazing” uploaders, the next step is to crawl their personal spaces on Bilibili for more detailed information:

 

 

Above is the Bilibili personal space of the famous scientist Handmade Geng. From it we can get the follower count and the main video type (I always assumed it would be Technology, but it turns out to be Life; Bilibili’s categorization is something else), as well as the average numbers of plays, bullet comments and comments across all videos, which serve as the basis for the later ranking. Part of the code is as follows:

 

import json
import time

import pandas as pd
import requests

# `cookie` and `header` are not defined in this excerpt; they are assumed to be
# the cookies and request headers copied from a browser session
rows = []
for i in range(len(spaces)):
    try:
        time.sleep(1)
        space_id = spaces[i].replace('/space.bilibili.com/', '')
        # Profile card: name, follower count, avatar URL and number of videos
        url = 'https://api.bilibili.com/x/web-interface/card?mid={}&jsonp=jsonp&article=true'.format(space_id)
        html = requests.get(url=url, cookies=cookie, headers=header).content
        data = json.loads(html.decode('utf-8'))['data']
        this_name = data['card']['name']
        this_fans = data['card']['fans']
        this_face = data['card']['face']
        this_video = int(data['archive_count'])
        # The video list is paginated, 30 videos per page
        total_page = int((this_video - 1) / 30) + 1
        video_list = []
        for j in range(total_page):
            url = 'https://api.bilibili.com/x/space/arc/search?mid={}&ps=30&tid=0&pn={}&keyword=&order=click&jsonp=jsonp'.format(space_id, j + 1)
            html = requests.get(url=url, cookies=cookie, headers=header).content
            data = json.loads(html.decode('utf-8'))
            if j == 0:
                # Per-category video counts are read once, from the first page
                type_list = data['data']['list']['tlist']
            video_list = video_list + data['data']['list']['vlist']
        # The main type is the category with the most videos
        type_list = {t['name']: int(t['count']) for t in type_list.values()}
        this_type = max(type_list, key=type_list.get)
        # Entries whose count is the placeholder '--' are skipped
        this_play = sum(v['play'] for v in video_list if v['play'] != '--')
        this_comment = sum(v['comment'] for v in video_list if v['comment'] != '--')
        rows.append({'name': this_name,
                     'fans': this_fans,
                     'face': this_face,
                     'main_type': this_type,
                     'total_video': this_video,
                     'total_play': this_play,
                     'total_comment': this_comment})
        print('success:' + str(i))
    except Exception as e:
        # Report which uploader failed and why, then move on to the next one
        print('fail:' + str(i), e)
        continue

upstat = pd.DataFrame(rows, columns=['name', 'fans', 'face', 'main_type', 'total_video',
                                     'total_play', 'total_comment'])
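The per-uploader aggregation can be tried offline on a small sample. The sketch below assumes data shaped like the `tlist` and `vlist` fields used above; the helper names `page_count` and `summarize` and all sample values are hypothetical.

```python
def page_count(total_videos, page_size=30):
    """Number of list pages to request for `total_videos` videos (at least one)."""
    return max(1, -(-total_videos // page_size))  # ceiling division


def summarize(tlist, vlist):
    """Return (main type, total plays, total comments) for one uploader."""
    counts = {t['name']: int(t['count']) for t in tlist.values()}
    main_type = max(counts, key=counts.get)  # category with the most videos
    # Counts given as the placeholder '--' are skipped
    total_play = sum(v['play'] for v in vlist if v['play'] != '--')
    total_comment = sum(v['comment'] for v in vlist if v['comment'] != '--')
    return main_type, total_play, total_comment


# Hypothetical sample data
tlist = {'160': {'name': 'Life', 'count': 2}, '36': {'name': 'Technology', 'count': 1}}
vlist = [{'play': 1200, 'comment': 30},
         {'play': '--', 'comment': 5},
         {'play': 800, 'comment': 12}]
print(page_count(61))           # 3
print(summarize(tlist, vlist))  # ('Life', 2000, 47)
```

Keeping the aggregation in small functions like these makes it possible to check the logic without sending any requests.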

 

In the end, we obtained information on more than 200 “amazing” Bilibili uploaders. An overview of the data is as follows:

 

 

 

 

Overview

 

 

With the data in hand, let’s first look at the distribution of the main types of videos these “amazing” uploaders publish:

 

 

Because Bilibili’s Life category is a catch-all, both Handmade Geng and Li Ziqi fall into it, which is rather surreal when you think about it. As a result, this category contains the most uploaders. Technology and Digital also account for a large share, which confirms that Bilibili is an excellent learning site. If you are interested, see another article: “Do you believe that you can learn programming by browsing Bilibili?”

 

The remaining video types, including games, film and TV, can be grouped together as Entertainment. From here on, video types are grouped into Technology, Life and Entertainment, in order to find the most “amazing” uploaders in each category.
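The grouping into three categories can be sketched as a simple lookup. The mapping below is an assumption for illustration (in particular, placing Digital under Technology), and the category labels are English stand-ins for whatever names the API returns.

```python
from collections import Counter

# Hypothetical mapping from fine-grained video types to the three groups
GROUPS = {
    'Technology': 'Technology',
    'Digital': 'Technology',
    'Life': 'Life',
}


def to_group(main_type):
    """Map a main video type to Technology, Life or Entertainment."""
    # Anything not listed above (games, film and TV, ...) counts as Entertainment
    return GROUPS.get(main_type, 'Entertainment')


# Hypothetical main types of six uploaders
main_types = ['Life', 'Digital', 'Games', 'Life', 'Film and TV', 'Technology']
group_counts = Counter(to_group(t) for t in main_types)
print(dict(group_counts))
```

With the crawled table, the same function could be applied to the `main_type` column to add a category column.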

 

Before the official ranking begins, let’s first use Python to stitch these uploaders’ avatars together into the picture below. See how many of them look familiar to you:

 

 

The code is as follows:

 

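import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from urllib import request

for i in range(upstat.shape[0]):
    # Local file for this uploader's avatar
    loc = 'd:/Crawler/amazing/' + upstat['name'][i] + '.jpg'
    # Uncomment on the first run to download the avatar
    # request.urlretrieve(upstat['face'][i], loc)
    img = mpimg.imread(loc)[:, :, 0:3]  # keep the RGB channels only
    img = cv2.resize(img, (500, 500), interpolation=cv2.INTER_CUBIC)
    if i % 20 == 0:
        row_img = img  # start a new row of 20 avatars
    elif i == 19:
        row_img = np.hstack((row_img, img))
        all_img = row_img  # the first completed row
    elif i % 20 == 19:
        row_img = np.hstack((row_img, img))
        all_img = np.vstack((all_img, row_img))  # stack each completed row
    else:
        row_img = np.hstack((row_img, img))
# Note: a final row with fewer than 20 avatars is not added to all_img
plt.axis('off')
plt.margins(0, 0)
plt.imshow(all_img)
plt.savefig('head.png', dpi=1000)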

 

 

 

 

Comprehensive ranking

 

 

 

The next step is bolder: we summon the courage to rank these uploaders. Combining the follower count, the average number of bullet comments, the number of plays and the number of comments gives a composite index. Let us state up front that this ranking is for entertainment only; if you insist on scrutinizing it, awsl.
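The article does not say how the metrics are combined. One simple possibility, shown purely as an illustration and not as the method actually used, is to min-max scale each metric to [0, 1] and average them with equal weights. The function names, metric names and figures below are all hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(rows, metrics):
    """Average the min-max-scaled metrics with equal weights (an assumption)."""
    scaled = {m: min_max([row[m] for row in rows]) for m in metrics}
    return [sum(scaled[m][i] for m in metrics) / len(metrics) for i in range(len(rows))]


# Hypothetical figures, for illustration only
rows = [
    {'name': 'A', 'fans': 100, 'avg_play': 50, 'avg_comment': 5},
    {'name': 'B', 'fans': 300, 'avg_play': 150, 'avg_comment': 10},
    {'name': 'C', 'fans': 200, 'avg_play': 250, 'avg_comment': 15},
]
scores = composite_index(rows, ['fans', 'avg_play', 'avg_comment'])
ranking = sorted(zip([r['name'] for r in rows], scores), key=lambda pair: pair[1], reverse=True)
print([name for name, _ in ranking])  # ['C', 'B', 'A']
```

Equal weights are the simplest choice; weighting followers more heavily, or taking logarithms to dampen very large channels, would give a different order.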

 

First, let’s take a look at the top 10 uploaders:

 

 

I was recently recommended Wizard Finance, who is on the list, and I suggest you take a look: the channel really brings complicated financial knowledge down to earth. Huanong Brothers and Jing Hanqing are also on the list. Now let’s look at ranks 11 to 20:

 

 

Xu Da Sao, Li Ziqi and Handmade Geng all appear in this list. If the chance ever comes up, I hope someone will plan a collaboration between them. I have already worked out the plot: Handmade Geng supplies Li Ziqi with post-modern tools, Li Ziqi uses Geng’s contraptions to cook the hottest pepper in the world, Xu Da Sao eats it in one bite, and finally Handmade Geng uses his homemade “brain flick” device to relieve the discomfort the pepper causes Xu Da Sao.

 

 

 

Ranking by category

 

 

 

After the overall ranking, the uploaders are ranked separately within Technology, Life and Entertainment. The top 10 in each category are as follows:
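A per-category top list can be sketched as a group-then-sort step. The row fields `category` and `score` and the sample values are hypothetical; the score stands for whatever composite index is used.

```python
from collections import defaultdict


def top_n_by_category(rows, n=10):
    """Group rows by 'category' and keep the n highest-scoring rows in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row['category']].append(row)
    return {
        category: sorted(members, key=lambda r: r['score'], reverse=True)[:n]
        for category, members in groups.items()
    }


# Hypothetical names and scores
rows = [
    {'name': 'A', 'category': 'Life', 'score': 0.9},
    {'name': 'B', 'category': 'Technology', 'score': 0.7},
    {'name': 'C', 'category': 'Life', 'score': 0.4},
    {'name': 'D', 'category': 'Life', 'score': 0.6},
]
top = top_n_by_category(rows, n=2)
print([r['name'] for r in top['Life']])  # ['A', 'D']
```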

 

 

 

 

With the per-category rankings, you can pick whatever suits your taste. I believe your imagination will grow after watching. After a while, you can try publishing your own videos on Bilibili and become a well-known (amazing) uploader with a double-digit follower count.

 

Finally, this article ends with Handmade Geng’s most-played video on Bilibili, which fits the article’s theme of “amazing people”. I hope you will try it in person. If you can still use your limbs to write down how it felt afterwards, you are welcome to share it with us.
