Powerful Python package (10): Selenium (browser robot)

Time:2022-1-5

1. Introduction to selenium

Selenium is a tool for controlling a browser from code. It can be used for browser automation, automated testing, and as an aid to web crawlers.

Normally we operate a browser through the mouse and keyboard. Selenium replaces those mouse and keyboard actions with program instructions, so the same operations can run automatically.

When writing crawlers with Scrapy, we can use Selenium to drive a browser that loads the page and returns the HTML after JavaScript has rendered it. We then do not need to work out how the page loads its data or whether its interfaces are encrypted.

2. Selenium overview

Browser drivers

By specifying the driver for the browser we want to control, we can operate that browser from code through Selenium.
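As a rough illustration of choosing a driver at run time, the sketch below maps a browser name to the matching class on selenium.webdriver. The names DRIVER_CLASSES and make_driver are hypothetical, the "firefox" entry is an addition that this article does not cover, and the code assumes the matching driver executable can be found on the machine.

```python
# Hypothetical lookup table: lower-case browser name -> class name on selenium.webdriver.
# "firefox" is an addition that this article does not list.
DRIVER_CLASSES = {
    "chrome": "Chrome",
    "edge": "Edge",
    "ie": "Ie",
    "firefox": "Firefox",
}


def make_driver(browser_name):
    """Return a new WebDriver for the named browser (hypothetical helper)."""
    try:
        class_name = DRIVER_CLASSES[browser_name.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser_name!r}") from None
    # Imported here so that an unsupported name fails before Selenium is loaded.
    from selenium import webdriver
    return getattr(webdriver, class_name)()
```

Calling make_driver("edge") would then be equivalent to webdriver.Edge(), while an unknown name raises ValueError without opening any browser.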

Browser              Code
Chrome               driver = webdriver.Chrome()
Internet Explorer    driver = webdriver.Ie()
Edge                 driver = webdriver.Edge()
Opera                driver = webdriver.Opera()
PhantomJS            driver = webdriver.PhantomJS()

Element positioning

With element positioning we can find any object in the loaded page, much as we would scan the page by eye for the target information, and then perform the next operation on it.
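A page does not always expose the same id or class, so a crawler may want to try several locators in turn. The following is one possible sketch of that idea. first_match is a hypothetical helper, and it relies only on find_elements(), which returns an empty list when nothing matches.

```python
def first_match(driver, locators):
    """Return the first element matched by any (by, value) pair, or None.

    `driver` is any Selenium WebDriver and `locators` is an ordered list of
    (strategy, value) pairs.
    """
    for by, value in locators:
        # find_elements returns an empty list instead of raising when nothing matches
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    return None
```

With a real driver the locators would normally be written with the By constants, for example first_match(driver, [(By.ID, 'searchKeywords'), (By.CSS_SELECTOR, '#searchKeywords')]).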

Strategy     Legacy helper (removed in Selenium 4.3)    find_element form
ID           find_element_by_id()                       find_element(By.ID, 'id')
Name         find_element_by_name()                     find_element(By.NAME, 'name')
Class        find_element_by_class_name()               find_element(By.CLASS_NAME, 'class_name')
Link text    find_element_by_link_text()                find_element(By.LINK_TEXT, 'link_text')
Tag          find_element_by_tag_name()                 find_element(By.TAG_NAME, 'tag_name')
XPath        find_element_by_xpath()                    find_element(By.XPATH, 'xpath')
CSS          find_element_by_css_selector()             find_element(By.CSS_SELECTOR, 'css')

Browser operation

Browser operations act on the browser client itself, such as maximizing or minimizing the window.
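To show how these calls combine, here is a small sketch that resizes the window, visits two pages, and then steps through the history. The function name history_round_trip and the 1280 x 800 window size are illustrative assumptions; the driver methods themselves are standard WebDriver calls.

```python
def history_round_trip(driver, first_url, second_url):
    """Visit two pages, then go back and forward through the history.

    Returns the URLs the browser reported after going back and after going forward.
    """
    driver.set_window_size(1280, 800)   # width and height in pixels
    driver.get(first_url)
    driver.get(second_url)
    driver.back()                       # history: back to the first page
    url_after_back = driver.current_url
    driver.forward()                    # history: forward to the second page
    url_after_forward = driver.current_url
    driver.refresh()                    # reload the current page
    return url_after_back, url_after_forward
```

On a real site the returned URLs may differ from the ones passed in if the server redirects.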

Operation          Code
Maximize           browser.maximize_window()
Minimize           browser.minimize_window()
Set window size    browser.set_window_size(width, height)
Forward            browser.forward()
Back               browser.back()
Refresh            browser.refresh()

Operating on test objects

Operating on test objects covers the methods most often used in automated testing, mainly to act on elements that have already been located.
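The sketch below strings these element methods together for a typical search box. submit_query and visible_text are hypothetical helper names; the element is assumed to be a WebElement returned by an earlier find_element call.

```python
def submit_query(box, query):
    """Replace the content of a located input with `query` and submit its form."""
    box.clear()            # empty the field first
    box.send_keys(query)   # type the text as simulated keystrokes
    box.submit()           # submit the form that contains the element


def visible_text(element):
    """Return the element's rendered text; `text` is a property, so no parentheses."""
    return element.text.strip()
```

Note that clear and submit are methods and need parentheses, whereas text is read as an attribute.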

Operation             Code
Click                 click()
Simulate key input    send_keys()
Clear content         clear()
Submit content        submit()
Get element text      text (an attribute, not a method)

Keyboard events

When operating on a test object, special keys can be passed to send_keys(), which is equivalent to pressing those keys.

Key           Code
Tab           send_keys(Keys.TAB)
Enter         send_keys(Keys.ENTER)
Backspace     send_keys(Keys.BACK_SPACE)
Space         send_keys(Keys.SPACE)
Esc           send_keys(Keys.ESCAPE)
F1            send_keys(Keys.F1)
F12           send_keys(Keys.F12)
Select all    send_keys(Keys.CONTROL, 'a')
Copy          send_keys(Keys.CONTROL, 'c')
Cut           send_keys(Keys.CONTROL, 'x')
Paste         send_keys(Keys.CONTROL, 'v')

Mouse events

Mouse events cover the actions that a mouse can perform, such as right-click, double-click, drag, and hover.
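Mouse actions are queued on an ActionChains object and only run when perform() is called. The sketch below illustrates that pattern for a hover menu. open_submenu is a hypothetical helper, and it assumes the caller passes in an ActionChains instance, for example ActionChains(driver), together with two located elements.

```python
def open_submenu(actions, menu, item):
    """Hover over `menu`, then click `item` once it is revealed.

    `actions` is an ActionChains instance; `menu` and `item` are WebElements.
    """
    # Each call only queues an action and returns the chain itself;
    # nothing happens in the browser until perform() is called.
    actions.move_to_element(menu).click(item).perform()
```

The same chaining style applies to context_click(), double_click(), and drag_and_drop().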

Action                        Code
Execute the queued actions    perform()
Right-click                   context_click()
Double-click                  double_click()
Drag and drop                 drag_and_drop()
Hover                         move_to_element()

Window and frame switching

When several pages or frames are open, the window and frame switching methods select which one the following commands act on.
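As a minimal sketch of both kinds of switching, the helpers below use driver.window_handles and driver.switch_to. The names switch_to_new_window and inside_frame are hypothetical; the second one is written as a context manager so that the driver always returns to the top-level document.

```python
from contextlib import contextmanager


def switch_to_new_window(driver, known_handles):
    """Switch to the first window handle not in `known_handles`; return it, or None."""
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None


@contextmanager
def inside_frame(driver, frame_reference):
    """Work inside a frame, then return to the top-level document.

    `frame_reference` may be a frame name or id, an index, or a WebElement.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()
```

A typical use is to record driver.window_handles before clicking a link that opens a new tab, and to pass that list as known_handles afterwards.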

Window switching
Get assertion information
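Assuming this heading refers to the values a test usually compares against its expectations, namely the page title and the current URL, a check could be sketched as follows. check_page and its parameters are hypothetical names.

```python
def check_page(driver, expected_title_part, expected_url_prefix):
    """Assert that the browser ended up on the expected page; return (title, url)."""
    title = driver.title          # text of the page's <title>
    url = driver.current_url      # URL of the page currently loaded
    assert expected_title_part in title, f"unexpected title: {title!r}"
    assert url.startswith(expected_url_prefix), f"unexpected URL: {url!r}"
    return title, url
```

An element's text attribute can be checked in the same way once the element has been located.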
Cookie operation
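WebDriver exposes cookies through get_cookies() and add_cookie(). One common use in crawlers is to keep a login between runs, which could be sketched as below. save_cookies, load_cookies, and the JSON file are assumptions made for illustration.

```python
import json


def save_cookies(driver, path):
    """Write the cookies of the current session to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the current session; return how many were added.

    The browser must already be on a page of the domain the cookies belong to.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

After load_cookies, a driver.refresh() is usually needed before the page reflects the restored session.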

3. Using Selenium in crawlers

Selenium is used in crawlers mainly to solve a problem that Scrapy alone cannot: obtaining the HTML of a page rendered by JavaScript.

In the previous article on the Scrapy library, we learned that downloader middleware sits between the engine and the downloader, and that Scrapy fetches page source along this path. For pages rendered by JavaScript, however, the default download path is powerless, because it returns the HTML before any script has run. This is where Selenium comes in: plugged in as a downloader middleware, it takes over the download step.
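A minimal sketch of such a middleware is shown below. It is not the author's implementation: render_page and SeleniumMiddleware are hypothetical names, Chrome is an arbitrary choice of browser, and closing the browser when the spider finishes is left out.

```python
def render_page(driver, url):
    """Load `url` in the browser and return the HTML after JavaScript has run."""
    driver.get(url)
    return driver.page_source


class SeleniumMiddleware:
    """Hypothetical Scrapy downloader middleware that renders pages in a browser."""

    def __init__(self, driver):
        self.driver = driver

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware; Chrome is an assumption.
        from selenium import webdriver
        return cls(webdriver.Chrome())

    def process_request(self, request, spider):
        # Imported here so the class can be defined where Scrapy is not installed.
        from scrapy.http import HtmlResponse

        html = render_page(self.driver, request.url)
        # Returning a response here makes Scrapy skip its own downloader.
        return HtmlResponse(url=request.url, body=html,
                            encoding="utf-8", request=request)
```

To take effect, the class would have to be listed in the project's DOWNLOADER_MIDDLEWARES setting.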

The typical steps when using Selenium in a crawler are shown in the examples below:

"""Search Suning for iPhone"""

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.edge.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# Selenium 4 passes the driver path through a Service object
driver = webdriver.Edge(service=Service('msedgedriver.exe'))
driver.get('https://www.suning.com')

# Type the keyword into the search box and press Enter
search_box = driver.find_element(By.ID, 'searchKeywords')
search_box.clear()
search_box.send_keys('iphone')
search_box.send_keys(Keys.RETURN)

# Wait up to 10 seconds for the results container before reading the HTML
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'root990')))
print(driver.page_source)

"""Scroll to the bottom of the page automatically"""

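import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.edge.service import Service

driver = webdriver.Edge(service=Service('msedgedriver.exe'))
driver.get('https://www.suning.com/')
time.sleep(4)  # crude pause to let the home page finish loading

search_box = driver.find_element(By.ID, 'searchKeywords')
search_box.clear()
search_box.send_keys('iphone')
search_box.send_keys(Keys.RETURN)
# Jump to the bottom of the results page
driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')

"""Locate an element in several ways"""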

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service

driver = webdriver.Edge(service=Service('msedgedriver.exe'))
driver.get('https://www.suning.com/')

# The same search box located by ID, name, XPath, and CSS selector
input_id = driver.find_element(By.ID, 'searchKeywords')
input_name = driver.find_element(By.NAME, 'index1_none_search_ss2')
input_xpath = driver.find_element(By.XPATH, "//input[@id='searchKeywords']")
input_css = driver.find_element(By.CSS_SELECTOR, '#searchKeywords')
print(input_id, input_name, input_xpath, input_css)

"""Wait for the page load to complete"""

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.edge.service import Service

driver = webdriver.Edge(service=Service('msedgedriver.exe'))

# Give up if the page takes longer than 5 seconds to load
driver.set_page_load_timeout(5)
try:
    driver.get('https://www.suning.com/')
    driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    print(driver.page_source)
except TimeoutException:
    print('timeout')
finally:
    driver.quit()  # always close the browser

"""Implicit wait"""

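from selenium import webdriver
from selenium.webdriver.edge.service import Service

driver = webdriver.Edge(service=Service('msedgedriver.exe'))
# Element lookups keep retrying for up to 5 seconds before failing
driver.implicitly_wait(5)
driver.get("https://www.suning.com/")
print(driver.page_source)

"""Explicit wait"""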

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Edge(service=Service('msedgedriver.exe'))
driver.get('https://www.suning.com/')

try:
    # Poll for up to 10 seconds until the search box is present in the DOM
    search_box = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "searchKeywords")))
    print(search_box)
except TimeoutException:
    print('time out!')
finally:
    driver.quit()  # always close the browser
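The scrolling examples above jump to the bottom of the page once. Pages that load more items as you scroll usually need this repeated until nothing new arrives. The sketch below is one way to do that; scroll_to_end, the one-second pause, and the limit of 20 rounds are assumptions chosen for illustration.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds
```

Calling scroll_to_end(driver) before reading driver.page_source makes it more likely that lazily loaded items are included in the HTML.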

Scrapy on its own can only crawl static pages. To crawl dynamic pages, combine it with the Selenium library so that the JavaScript is rendered before the page is parsed.

Closing words

You are welcome to follow the official account: human slaves!
Let's study and make progress together!
