1. Introduction to Selenium
Selenium is a tool for controlling a browser programmatically. It enables browser automation, automated testing, crawler assistance, and more.
When we use a browser ourselves, every operation happens through the mouse and keyboard. Selenium replaces those keyboard and mouse actions with code, so the whole interaction can be automated.
When writing crawlers with Scrapy, we can use Selenium to drive the browser to load pages and obtain the HTML of the JavaScript-rendered page, without having to worry about how the page is loaded or whether its interface is encrypted.
2. Selenium overview

Browser drivers
By specifying the driver for the browser we want to operate, we can control that browser through Selenium code.

Browser | Code implementation |
---|---|
Chrome | driver = webdriver.Chrome() |
Internet Explorer | driver = webdriver.Ie() |
Edge | driver = webdriver.Edge() |
Opera | driver = webdriver.Opera() |
PhantomJS | driver = webdriver.PhantomJS() |
Element positioning
Element locators let us find any element in the loaded page, much like scanning a page ourselves for the information we want, so that we can act on that element next.

Locator | Code implementation |
---|---|
ID | find_element_by_id(), find_element(By.ID, 'id') |
Name | find_element_by_name(), find_element(By.NAME, 'name') |
Class | find_element_by_class_name(), find_element(By.CLASS_NAME, 'class_name') |
Link text | find_element_by_link_text(), find_element(By.LINK_TEXT, 'link_text') |
Tag | find_element_by_tag_name(), find_element(By.TAG_NAME, 'tag_name') |
XPath | find_element_by_xpath(), find_element(By.XPATH, 'xpath') |
CSS selector | find_element_by_css_selector(), find_element(By.CSS_SELECTOR, 'css_selector') |
Browser operation
Browser operations act on the browser window itself, such as maximizing or minimizing it.

Browser operation | Code implementation |
---|---|
Maximize | browser.maximize_window() |
Minimize | browser.minimize_window() |
Set window size | browser.set_window_size() |
Forward | browser.forward() |
Back | browser.back() |
Refresh | browser.refresh() |
Operation test object
Element operations are the methods we use most often in automated testing; they act on elements we have already located.

Element operation | Code implementation |
---|---|
Click the element | click() |
Simulate keyboard input | send_keys() |
Clear the element's content | clear() |
Submit the form | submit() |
Get the element's text | text (a property, not a method) |
Keyboard events
Among the element operations, send_keys() can also be passed keyboard events, which is equivalent to pressing a special key.

Keyboard event | Code implementation |
---|---|
Tab | send_keys(Keys.TAB) |
Enter | send_keys(Keys.ENTER) |
Backspace | send_keys(Keys.BACK_SPACE) |
Space | send_keys(Keys.SPACE) |
Esc | send_keys(Keys.ESCAPE) |
F1 | send_keys(Keys.F1) |
F12 | send_keys(Keys.F12) |
Select all | send_keys(Keys.CONTROL, 'a') |
Copy | send_keys(Keys.CONTROL, 'c') |
Cut | send_keys(Keys.CONTROL, 'x') |
Paste | send_keys(Keys.CONTROL, 'v') |
Mouse events
Mouse events reproduce every action a physical mouse can perform.

Mouse event | Code implementation |
---|---|
Execute the queued ActionChains actions | perform() |
Right-click | context_click() |
Double-click | double_click() |
Drag and drop | drag_and_drop() |
Hover over an element | move_to_element() |
Window and frame switching
When multiple windows or frames are open, the switching methods change which window or frame the driver currently operates on.

Get assertion information

Cookie operations

3. Applying Selenium to crawlers
Selenium's main role in crawling is to solve a problem Scrapy cannot: obtaining the HTML of a page rendered by JavaScript.
In the earlier article on Scrapy, we saw that downloader middleware sits between the engine and the downloader, and Scrapy fetches page source through it. But for JavaScript-rendered pages the downloader middleware is powerless; this is where Selenium steps in to take its place.
Selenium's main usage patterns in a crawler are shown in the examples below:

"" "Suning Tesco find iPhone" ""
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium import webdriver
driver = webdriver.Edge(executable_path='msedgedriver.exe')
driver.get('https://www.suning.com')
input = driver.find_element_by_id('searchKeywords')
input.clear
input.send_keys('iphone')
input.send_keys(Keys.RETURN)
wait = WebDriverWait(driver,10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME,'root990')))
print(driver.page_source)
"" "auto drop down page" ""
from selenium import webdriver
import time
driver = webdriver.Edge(executable_path='msedgedriver.exe')
driver.get('https://www.suning.com/')
time.sleep(4)
input = driver.find_element_by_id('searchKeywords')
input.clear
input.send_keys('iphone')
input.send_keys(Keys.RETURN)
driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
"" "locate element" ""
from selenium import webdriver
driver = webdriver.Edge(executable_path='msedgedriver.exe')
driver.get('https://www.suning.com/')
input_id = driver.find_element_by_id('searchKeywords')
input_name = driver.find_element_by_name('index1_none_search_ss2')
input_xpath = driver.find_element_by_xpath("//input[@id='searchKeywords']")
input_css = driver.find_element_by_css_selector('#searchKeywords')
print(input_id,input_name,input_xpath,input_css)
"" "wait for page load to complete" ""
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
driver = webdriver.Edge(executable_path='msedgedriver.exe')
#Set the timeout for page loading
driver.set_page_load_timeout(5)
try:
driver.get('https://www.suning.com/')
driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
print(driver.page_source)
except TimeoutException:
print('timeout')
driver.quit()
"" "implicit wait" ""
from selenium import webdriver
driver = webdriver.Edge(executable_path='msedgedriver.exe')
driver.implicitly_wait(5)
driver.get("https://www.suning.com/")
print(driver.page_source)
"" "show waiting" ""
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Edge(executable_path='msedgedriver.exe')
driver.get('https://www.suning.com/')
try:
input = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.ID,"searchKeywords")))
print(input)
except TimeoutException:
print('time out!')
driver.quit()
The Scrapy framework on its own can only crawl static pages. To crawl dynamic sites, you need to combine it with the Selenium library so the JavaScript is rendered before the page is scraped.
Final words
Welcome to follow the official account: human slaves!
Let's learn and improve together!