Python crawler selenium library detailed tutorial

Time: 2021-12-1

The text and images in this article come from the Internet and are for learning and exchange only, with no commercial purpose. If you have any questions, please contact us promptly so we can handle them.

The following article comes from a Python programmer.


When crawling web pages, we often find that the data we want cannot be obtained simply by parsing the HTML source, because it is loaded asynchronously through Ajax or rendered by JavaScript.

Selenium is an automated testing tool that supports multiple browsers. In a crawler, we can use it to drive a real browser that loads and renders the page, which solves the problem of JavaScript rendering.

1. Usage example


2. Detailed introduction

2.1 declaring browser objects

That is, tell the program which browser to drive.


2.2 accessing a page

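Accessing a page comes down to `driver.get(url)`; afterwards the driver exposes the page title, the current URL, and the page source. A minimal sketch, where the helper name `open_page` is hypothetical:

```python
def open_page(driver, url: str) -> dict:
    """Load `url` and return basic information about the rendered page."""
    driver.get(url)  # returns once the page has loaded (default load strategy)
    return {
        "title": driver.title,       # text of the <title> element
        "url": driver.current_url,   # final URL, after any redirects
        "html": driver.page_source,  # serialized current DOM, after JS has run
    }
```

For example, `open_page(webdriver.Chrome(), "https://example.com")` would return the rendered HTML of that page; remember to call `quit()` on the driver afterwards.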

2.3 finding elements

After the page has loaded, we may need to operate on it, for example by finding the search box, typing a keyword, and then pressing Enter.

To do that, we first need to locate the element with Selenium.

2.3.1 single element

Selenium offers two ways to find a single element.

The first is to call a dedicated method for each lookup strategy, such as a CSS selector or XPath.


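The dedicated lookup methods are as follows. Note that they were deprecated in Selenium 4 and removed in Selenium 4.3; on newer versions, use the second way described below.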

find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

 

The second is to call find_element() directly, passing the lookup strategy as the first argument (a By constant such as By.ID or By.CSS_SELECTOR) and the value to look for as the second.


2.3.2 multiple elements

Finding multiple elements works the same way as finding a single element: add an s to the method name, for example find_elements_by_css_selector() instead of find_element_by_css_selector(), or find_elements() instead of find_element().

These methods return a list of elements. If nothing matches, the list is empty rather than an exception being raised.


2.4 element interaction

Element interaction means first obtaining an element and then calling an interaction method on it, such as send_keys() to type text, clear() to empty an input, or click() to click it.

For example, enter text in the search box:


2.5 interactive actions

Interactive actions queue a series of operations on an action chain and then execute them in order. This requires the ActionChains class; a typical example is drag and drop.

2.6 executing JavaScript

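Some operations, such as scrolling to the bottom of the page, have no dedicated Selenium API. In that case, JavaScript can be run directly in the page with execute_script().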

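A sketch of running JavaScript through `execute_script()`. A `return` statement in the script hands a value back to Python, and extra arguments become `arguments[0]`, `arguments[1]`, and so on inside the script. The helper names are hypothetical.

```python
def scroll_to_bottom(driver) -> int:
    """Scroll to the bottom of the page and return the page height."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return driver.execute_script("return document.body.scrollHeight;")


def js_click(driver, element) -> None:
    """Click an element through JavaScript; `element` is passed as arguments[0]."""
    driver.execute_script("arguments[0].click();", element)
```

Scrolling like this is a common way to trigger infinite-scroll pages to load more content, usually combined with an explicit wait for the new items.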

2.7 obtaining element information

Once an element has been found, we often also need its attributes and its text.

2.7.1 get attributes


2.8 frames

While the driver is in the parent frame, it cannot find elements inside a child frame (iframe); you must first switch into the child frame and search again. Likewise, elements of the parent frame cannot be found from inside the child frame until you switch back.


2.9 waiting

When a page is requested, some of its content may still be loading asynchronously through Ajax. Selenium's get() only waits for the main document to load and does not wait for those Ajax requests, so we need to wait until the content we need is present before operating on it.

2.9.1 implicit waiting

With an implicit wait, if WebDriver does not find the specified element immediately, it keeps trying. If the element still has not appeared when the specified time is exceeded, a NoSuchElementException is thrown. The default implicit wait time is 0.

An implicit wait is not tied to a particular element: it applies to every element lookup on the page.

Note that the implicit wait stays in effect for the lifetime of the driver session, so it only needs to be set once.


2.9.2 explicit wait

An explicit wait consists of a wait condition and a maximum wait time.

The condition is checked first; if it already holds, the wait returns immediately. Otherwise, the condition is re-checked periodically for up to the maximum wait time, and if it still does not hold after that, a TimeoutException is thrown.

An explicit wait therefore targets a specific condition, usually on a specific element.


2.10 browser back and forward

back() returns to the previous page in the browser history, and forward() goes to the next page.

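A sketch of moving through the browser history with `back()` and `forward()`. The helper name `visit_and_navigate` is hypothetical; it records the URL after each step.

```python
def visit_and_navigate(driver, first_url: str, second_url: str) -> list:
    """Visit two pages, then go back and forward, recording the URL each time."""
    visited = []
    driver.get(first_url)
    driver.get(second_url)
    driver.back()     # like the browser's Back button: returns to first_url
    visited.append(driver.current_url)
    driver.forward()  # like the Forward button: returns to second_url
    visited.append(driver.current_url)
    return visited
```

With a real browser the recorded URLs may be normalized (for example, a trailing slash added), so compare them loosely.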

2.11 working with cookies

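Selenium exposes cookies through driver methods such as `get_cookies()`, `add_cookie()`, and `delete_all_cookies()`. A common crawler pattern is to save the cookies after logging in and load them in a later session; the helpers `save_cookies` and `load_cookies` below are a hypothetical sketch of that pattern using a JSON file whose path you choose.

```python
import json


def save_cookies(driver, path: str) -> int:
    """Write all cookies of the current page's domain to a JSON file."""
    cookies = driver.get_cookies()  # list of dicts: name, value, domain, path, ...
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookies, f)
    return len(cookies)


def load_cookies(driver, path: str) -> int:
    """Replace the current cookies with saved ones. The browser must already be
    on a page of the cookies' domain, or add_cookie() will be rejected."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    driver.delete_all_cookies()  # start from a clean state
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()  # reload so the page sees the restored cookies
    return len(cookies)
```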

2.12 tab management

Tabs are the browser's tabbed pages. Sometimes we need to open a new tab or close an existing one, and Selenium can do both.

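A sketch of tab management, assuming Selenium 4, where `switch_to.new_window("tab")` opens a new tab and switches to it. Closing a tab does not move the driver anywhere, so you must switch back to a remaining handle explicitly. The helper name `open_in_new_tab` is hypothetical.

```python
def open_in_new_tab(driver, url: str) -> str:
    """Open `url` in a new tab, read its title, close the tab,
    and return to the original tab."""
    original = driver.current_window_handle
    driver.switch_to.new_window("tab")  # Selenium 4+: open a tab and switch to it
    driver.get(url)
    title = driver.title
    driver.close()                      # closes only the current tab
    driver.switch_to.window(original)   # switch back explicitly
    return title
```

`driver.window_handles` lists the handles of all open tabs and windows, which is useful when a click on the page itself opens a new tab.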