Python: Double Eleven is in full swing, so let's scrape "a certain treasure" (Taobao) and join the shopping spree~

Date: 2022-09-21

Brothers, have you splurged on Double Eleven yet? Shopping is fun, but don't overdo it. After all, many "price cuts" aren't real discounts, so keep your eyes open~
Today let's try scraping "a certain treasure". Once you've learned this, it's still worth a few bucks when taking on freelance work.
Environment/Module Introduction

Environment: Python 3.8
Editor: PyCharm
selenium: third-party module that drives the browser
csv: module for saving the data
time: time module, used for program delays
random: random number module
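
A quick sanity check that the environment matches the list above (just a sketch; it assumes the modules are already installed):

import sys
import selenium

print(sys.version)           # should report 3.8.x
print(selenium.__version__)  # the installed selenium version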

Install the third-party module that drives the browser:

pip install selenium
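
selenium also needs a ChromeDriver that matches your local Chrome version. If the driver isn't on your PATH, you can point selenium at it directly; here is a minimal sketch using the selenium 3-style API from this post (the path below is a placeholder, swap in wherever your chromedriver actually lives):

from selenium import webdriver

# hypothetical path, replace with your own chromedriver location
driver = webdriver.Chrome(executable_path=r'C:\tools\chromedriver.exe')
driver.get('https://www.taobao.com/')
print(driver.title)
driver.quit()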

Most of the explanation is already in the code comments, so I'll keep the prose brief.

from selenium import webdriver
import time # Time module, can be used for program delay
import random # random number module
from constants import TAO_USERNAME1, TAO_PASSWORD1  # Taobao login credentials kept in your own constants.py
import csv # module for data saving
def search_product(keyword):
    """Search product data, login user"""
    driver.find_element_by_xpath('//*[@id="q"]').send_keys(keyword)
    time.sleep(random.randint(1, 3))  # random delay to try to dodge bot detection

    # click the search button (this xpath is a guess at Taobao's layout and may need adjusting)
    driver.find_element_by_xpath('//*[@id="J_TSearchForm"]/div[1]/button').click()
    # logging in with TAO_USERNAME1 / TAO_PASSWORD1 would go here if Taobao redirects to the sign-in page
def parse_data():
    """Analyze product data"""
    divs = driver.find_elements_by_xpath('//div[@class="grid g-clearfx"]/div/div') # all div tags

    for div in divs:
        try:
            info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text
            price = div.find_element_by_xpath('.//strong').text + '元'
            deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text
            name = div.find_element_by_xpath('.//div[@class="shop"]/a/span[2]').text
            location = div.find_element_by_xpath('.//div[@class="location"]').text
            detail_url = div.find_element_by_xpath('.//div[@class="pic"]/a').get_attribute('href')

            print(info, price, deal, name, location, detail_url)

            # save each record to a csv file
            with open('某宝.csv', mode='a', encoding='utf-8', newline='') as f:
                csv_write = csv.writer(f)
                csv_write.writerow([info, price, deal, name, location, detail_url])
        except Exception:  # skip entries whose layout doesn't match
            continue
word = input('Please enter the keyword you want to search for:')
# create a browser
driver = webdriver.Chrome()

# A browser driven by selenium gets detected, which blocks the login,
# so patch a few browser properties to bypass the detection
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
            {"source": """Object.defineProperty(navigator, 'webdriver', {get: () => false})"""})


# Perform automated browser actions
driver.get('https://www.taobao.com/')
driver.implicitly_wait(10) # implicit wait: give elements up to 10 seconds to load
driver.maximize_window() # maximize the browser


# Call the function for product search
search_product(word)

for page in range(100):  # page = 0, 1, 2, ...
    print(f'\n================== Fetching page {page + 1} data ==================')
    # Taobao offsets search results by 44 items per page, hence s = page * 44
    url = f'https://s.taobao.com/search?q={word}&s={page * 44}'
    driver.get(url)  # open the search result page before parsing it
    # Parse product data
    parse_data()
    time.sleep(random.randint(1, 3))  # random delay to try to dodge bot detection
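
Once the loop finishes, the browser can be closed with driver.quit(), and you can check what was collected. A small sketch (assuming the script above ran and wrote 某宝.csv to the working directory) that reads the file back with the same csv module:

import csv

with open('某宝.csv', mode='r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

print(f'collected {len(rows)} records')
for row in rows[:5]:  # peek at the first few records
    print(row)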