A Small Python Taobao Crawler Example

Time:2022-5-4

Folks, did you go on a spending spree for Double Eleven? Shopping is fun, but don't overdo it. After all, many of the price cuts aren't real price cuts, so keep your eyes open~

Today, let's try crawling Taobao. Once you've learned this, it's a skill that can still earn you a little money on outsourced jobs.

Environment and modules

Python 3.8: the runtime environment
PyCharm: the editor
selenium: third-party browser-automation module
csv: module for saving data
time: time module, used here to add delays to the program
random: random-number module
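
As a rough sketch of how the three standard-library modules in this list fit together: `random` picks a delay, `time` waits it out, and `csv` appends a row to a file. The helper names `polite_pause` and `save_row` are made up for illustration and do not appear in the script itself.

```python
import csv
import random
import time


def polite_pause(low=1, high=3, sleep=time.sleep):
    """Sleep for a random whole number of seconds in [low, high] and return it."""
    seconds = random.randint(low, high)  # randint includes both endpoints
    sleep(seconds)
    return seconds


def save_row(path, row):
    """Append one row to a CSV file, creating the file if it does not exist."""
    # newline='' lets the csv module control line endings itself
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        csv.writer(f).writerow(row)
```

For example, calling `save_row('demo.csv', ['title', '99Yuan'])` followed by `polite_pause()` writes one row and then waits between 1 and 3 seconds. The `sleep` parameter exists only so the pause can be swapped out when trying the helper without actually waiting.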

Installing the third-party browser-automation module
Corresponding video tutorial:

Python: Double Ten is under way, and I'll teach you to keep up the shopping spree with Python

selenium: pip install selenium

I've put basically all of the explanations in the code comments, so I'll be lazy and not write more here.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time  # Time module, used to add delays to the program
import random  # Random-number module
from constants import TAO_USERNAME1, TAO_PASSWORD1  # Account credentials, kept in a separate module
import csv  # Module for saving data


def search_product(keyword):
    """Search for product data and log the user in."""
    driver.find_element(By.XPATH, '//*[@id="q"]').send_keys(keyword)
    time.sleep(random.randint(1, 3))  # Random delay to reduce the chance of bot detection

    # The rest of this function is cut off in the source after "driver.f".
    # Going by the docstring and the imported credentials, it presumably
    # went on to submit the search and log in.


def parse_data():
    """Parse product data."""
    # All the div tags, one per product
    divs = driver.find_elements(By.XPATH, '//div[@class="grid g-clearfx"]/div/div')

    for div in divs:
        try:
            info = div.find_element(By.XPATH, './/div[@class="row row-2 title"]/a').text
            price = div.find_element(By.XPATH, './/strong').text + 'Yuan'
            deal = div.find_element(By.XPATH, './/div[@class="deal-cnt"]').text
            name = div.find_element(By.XPATH, './/div[@class="shop"]/a/span[2]').text
            location = div.find_element(By.XPATH, './/div[@class="location"]').text
            detail_url = div.find_element(By.XPATH, './/div[@class="pic"]/a').get_attribute('href')

            print(info, price, deal, name, location, detail_url)

            # Save the row
            with open('taobao.csv', mode='a', encoding='utf-8', newline='') as f:
                csv_write = csv.writer(f)
                csv_write.writerow([info, price, deal, name, location, detail_url])
        except Exception:
            # Skip any entry that is missing one of the fields
            continue
 
 
word = input('Please enter the keyword you want to search for: ')
# Create a browser
driver = webdriver.Chrome()

# A browser driven by selenium gets recognized and cannot log in,
# so modify some of the browser's properties to get around the detection
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
            {"source": """Object.defineProperty(navigator, 'webdriver', {get: () => false})"""})


# Perform the automated browser operations
driver.get('https://www.taobao.com/')
driver.implicitly_wait(10)  # Wait up to 10 seconds for elements to load
driver.maximize_window()  # Maximize the browser window


# Call the product-search function
search_product(word)

for page in range(100):  # 0, 1, 2, ...
    print(f'\n================== Scraping page {page + 1} ==================')
    # The keyword in this URL is hard-coded (URL-encoded); s is the result offset, stepping by 44 per page
    url = f'https://s.taobao.com/search?q=%E5%B7%B4%E9%BB%8E%E4%B8%96%E5%AE%B6&s={page * 44}'
    driver.get(url)  # Load this page of results
    # Parse the product data
    parse_data()
    time.sleep(random.randint(1, 3))  # Random delay to reduce the chance of bot detection
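
The loop above hard-codes one URL-encoded keyword in the search URL, so whatever you type at the prompt does not affect the pages that get requested. One possible way around that, as a sketch only, is to build the URL from the keyword with the standard library. The function name `build_search_url` and the constant `ITEMS_PER_PAGE` are my own naming, and the sketch assumes the `q` and `s` parameters behave exactly as they do in the URL above.

```python
from urllib.parse import urlencode

ITEMS_PER_PAGE = 44  # offset step taken from the loop above (s = page * 44)


def build_search_url(keyword, page):
    """Return the search-results URL for a zero-based page number."""
    # urlencode percent-encodes non-ASCII keywords as UTF-8
    query = urlencode({'q': keyword, 's': page * ITEMS_PER_PAGE})
    return f'https://s.taobao.com/search?{query}'
```

Inside the loop, `url = build_search_url(word, page)` would then take the place of the hard-coded f-string.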

Folks, go and give it a try! Python learning videos, Q&A help, and e-books are available by private message.
