Python crawler: BiliBili

Time: 2021-7-25

Statement: the following is my personal understanding. If you find any errors or have questions, feel free to contact me to discuss.

Introduction to the crawler

Website introduction

The website crawled this time is bilibili, a well-known bullet-comment (danmaku) video website in China. It offers up-to-date anime, a lively ACG atmosphere, and creative uploaders (UP owners). You can find a lot of joy here.

Reasons and uses for writing the crawler

BiliBili has grown from a "small broken site" into a phenomenon-level, diversified community website. The purpose of crawling it this time is to use it as a typical example and show one way of thinking when you encounter various types of verification codes (CAPTCHAs).

In fact, the simplest approach for such websites is to log in beforehand, obtain the cookies manually, and then send those cookies with the requests to the pages we need to crawl. Crawlers written for personal use can take this route to save coding time. A company, however, may need to crawl with many accounts, and logging in manually one by one to obtain cookies is tedious. In that case it is much more efficient to obtain the cookies automatically with Selenium.
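As a minimal sketch of the cookie approach: Selenium's driver.get_cookies() returns a list of dictionaries with "name", "value" and "domain" keys, and these can be folded into a Cookie request header. The helper name cookies_to_header and the sample cookie names and values below are made up for illustration; they are not BiliBili's real cookies.

```python
def cookies_to_header(selenium_cookies, domain_suffix=None):
    """Build a Cookie header value from the list returned by driver.get_cookies().

    If domain_suffix is given, cookies from other domains are skipped.
    """
    pairs = []
    for cookie in selenium_cookies:
        domain = cookie.get("domain", "").lstrip(".")
        if domain_suffix and not domain.endswith(domain_suffix):
            continue
        pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Sample data in the shape Selenium returns (names and values are placeholders).
sample_cookies = [
    {"name": "session_id", "value": "abc123", "domain": ".bilibili.com"},
    {"name": "csrf_token", "value": "xyz789", "domain": ".bilibili.com"},
    {"name": "tracker", "value": "1", "domain": ".example.org"},
]

headers = {"Cookie": cookies_to_header(sample_cookies, domain_suffix="bilibili.com")}
print(headers)
```

The resulting headers dictionary can then be passed to an HTTP client, for example requests.get(url, headers=headers), so that later requests reuse the logged-in session.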

Selenium

Brief introduction

As the official introduction puts it, "Selenium automates browsers. That's it!" It is a browser automation tool that can simulate human operation.

Tutorials

I recommend learning from the Selenium Chinese website; it is very comprehensive!

Verification code analysis

Sliding verification code

BiliBili's previous verification code was a sliding verification code. The main idea is to find the gap, determine its coordinates, and then slide to that position with Selenium. Most of Alibaba's web pages, such as Fliggy (Flying Pig), Taobao, and Tmall, use something similar. However, Alibaba's pages do not require verification every time, so handle them according to the actual situation.

In this variant, find the rightmost position of the gap and then slide to it.


This variant requires first finding the position of the whole picture and the outline of the piece, and then sliding. All of these variants evolve from the same idea.
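The "find the gap, then slide" idea can be sketched in plain Python. This sketch assumes that both the complete background and the version with the gap are available as equal-sized 2D lists of (r, g, b) pixels, which is not true of every slider. The function names find_gap_left_edge and build_track, the threshold, and the acceleration ratio are illustrative choices, not values taken from any real site.

```python
def find_gap_left_edge(full, gapped, threshold=60):
    """Return the x of the first column where the two images clearly differ.

    Both images are 2D lists indexed [y][x] holding (r, g, b) tuples.
    Returns None if no pixel differs by more than the threshold.
    """
    height, width = len(full), len(full[0])
    for x in range(width):
        for y in range(height):
            diff = sum(abs(a - b) for a, b in zip(full[y][x], gapped[y][x]))
            if diff > threshold:
                return x
    return None


def build_track(distance, accelerate_ratio=0.7):
    """Split an integer distance into positive steps: speed up first, then slow down."""
    track, moved, step = [], 0, 1
    turning_point = distance * accelerate_ratio
    while moved < distance:
        if moved < turning_point:
            step += 2                 # accelerate
        else:
            step = max(1, step - 3)   # decelerate, but keep moving
        move = min(step, distance - moved)  # never overshoot the target
        track.append(move)
        moved += move
    return track


# Tiny synthetic example: a 10x5 grey image with a dark gap in columns 6 to 8.
full_image = [[(100, 100, 100)] * 10 for _ in range(5)]
gapped_image = [row[:] for row in full_image]
for y in range(1, 4):
    for x in range(6, 9):
        gapped_image[y][x] = (0, 0, 0)

gap_x = find_gap_left_edge(full_image, gapped_image)
print(gap_x, build_track(120))
```

With Selenium, each step of the track would typically be replayed through ActionChains: click_and_hold on the slider element, one move_by_offset(step, 0) per step, then release and perform. Splitting the drag into uneven steps is only a heuristic for looking less mechanical; it does not guarantee that the verification passes.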


Look-at-the-picture-and-fill-in-the-answer type

Examples include the online trading site of East Money (Oriental Wealth), BigQuant, and others, which are relatively simple. Download the verification code image, preprocess it as that particular code requires, and then hand it to the OCR service of one of the major cloud providers. They offer a free trial quota, and you can try several providers and compare them according to your own needs and preferences.

Baidu, Tencent, Alibaba, Youdao Zhiyun
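The preprocessing step before OCR is often just grayscale conversion followed by thresholding. The sketch below works on a plain 2D list of (r, g, b) tuples so that it needs no third-party library; the function names and the threshold of 128 are assumptions, and a real verification code may need a different threshold or additional noise removal.

```python
def to_grayscale(pixels):
    """Convert a 2D list of (r, g, b) tuples to 0-255 luminance values."""
    return [
        [(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in row]
        for row in pixels
    ]


def binarize(gray, threshold=128):
    """Map each luminance value to pure white (255) or pure black (0)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


# A 2x2 sample: light background pixels and darker character pixels.
sample_pixels = [
    [(255, 255, 255), (30, 30, 30)],
    [(200, 200, 200), (0, 0, 0)],
]

clean = binarize(to_grayscale(sample_pixels))
print(clean)
```

The cleaned image is what would be uploaded to the chosen OCR service; how to call that service depends on the provider and is not shown here.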

Operate-according-to-the-picture-and-click type

Verification codes of this kind are now very common. Their difficulty lies in how much they vary: they are not limited to Chinese characters and numbers but also include pictures. You can work out a solution yourself, but it becomes troublesome as soon as the site changes its strategy. Instead, you can have one of the coding (verification code recognition) platforms identify the content and then operate according to the result.

Easy cloud coding, Quick identification website, Feifei code, etc.

BiliBili login analysis

The latest verification code of BiliBili belongs to the third type. After clicking the login button, a verification code box appears. We need to download this picture, send it to a coding platform for identification, obtain the coordinate information, and then click those coordinates with Selenium.


BiliBili verification code
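Turning the platform's answer into clicks mostly comes down to coordinate conversion. The sketch below assumes the platform returns a string such as "120,80|240,160" in pixels of the downloaded image; that format is hypothetical and differs between platforms. It also assumes the picture may be displayed on the page at a different size than the downloaded file. The names parse_points and to_center_offsets are made up for illustration.

```python
def parse_points(result):
    """Parse a string like '120,80|240,160' into [(120, 80), (240, 160)]."""
    points = []
    for chunk in result.split("|"):
        x, y = chunk.split(",")
        points.append((int(x), int(y)))
    return points


def to_center_offsets(points, image_size, element_size):
    """Scale image-pixel points to the on-page element and return offsets
    measured from the centre of that element."""
    img_w, img_h = image_size
    el_w, el_h = element_size
    offsets = []
    for x, y in points:
        page_x = x * el_w / img_w   # scale to the displayed width
        page_y = y * el_h / img_h   # scale to the displayed height
        offsets.append((round(page_x - el_w / 2), round(page_y - el_h / 2)))
    return offsets


# Example: a 480x320 download shown on the page at 240x160.
clicks = to_center_offsets(parse_points("120,80|240,160"), (480, 320), (240, 160))
print(clicks)
```

Each offset can then be fed to ActionChains.move_to_element_with_offset(element, x, y) followed by click() and perform(). Recent Selenium 4 releases measure that offset from the centre of the element, while older releases measured from the top-left corner, so it is worth checking the installed version before relying on either convention.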

Writing the code

Simulated login with Selenium

import re

This work is licensed under a CC license; reprints must credit the author and link to this article.