[Python crawler] Newly found a high-quality dancing video website, try it out, boys like it

Time:2022-8-5

It's so uncomfortable. The last time I posted a game, no one saw it. Every day I write for you, my heart is broken~
在这里插入图片描述
Really, let's crawl a wave of short video websites for everyone today, they are all very eye-catching~
在这里插入图片描述
The website address is in the code, and you can see it carefully.

software used

  • python 3.8
  • pycharm 2021.2

module

  • requests
  • parsel
  • re
  • concurrent.futures
  • time
  • warnings

Code display

I know you don't want to see those steps, I'll just go to the code

import requests
import parsel
import re
import concurrent.futures
import time
import warnings

# cancel warning
warnings.filterwarnings("ignore")


def get_html(url):
    """Send a request to get the source code of the web page"""
    html_data = requests.get(url=url, verify=False).text
    return html_data


def parse_data_1(html_data):
    """Parsing for the first time, get all the details page links"""
    selector = parsel.Selector(html_data)
    url_list = selector.xpath('//a[@class="meta-title"]/@href').getall()
    return url_list


def parse_data_2(html_data):
    """Second analysis, get video link"""
    video_url = re.findall('url: "(.*?)",', html_data)[0]
    return video_url


def save(video_url):
    """Save video"""
    title = video_url.split('/')[-1] # Take the field in the link as the title
    video_data = requests.get(video_url, verify=False).content # Send network request
    with open(f'video/{title}', mode='wb') as f:
        f.write(video_data)
    print(title, "Crawling successful!!!")

start_time = time.time()
url = 'https://www.520mmtv.com/hd/rewu.html'
# 1. Send a request to the target website
html_data = get_html(url=url)
# 2. Parse the data extraction details page link for the first time
url_list = parse_data_1(html_data=html_data)
for info_url in url_list[:10]:
    # 3. Send a request to the details page
    html_data_2 = get_html(url=info_url)
    # 4. The second parsing data to extract the video playback address
    video_url = parse_data_2(html_data=html_data_2)
    # 5. Save the video
    save(video_url=video_url)
print('Time spent:', time.time() - start_time)
#Brothers learn python, sometimes I don't know how to learn and where to start.
#After mastering some basic grammar or doing two cases, I don't know what to do next, and I don't know how to learn more advanced knowledge.
#Then for these big brothers, I have prepared a lot of free video tutorials, PDF e-books, and the source code of the video source!
#There will be a big guy to answer!
#All in this group 872937351
#Welcome to join, discuss and learn together!

 

Crawl results

在这里插入图片描述Video tutorial:

video tutorial
Python crawls high-quality girls dancing videos

Brothers, please give me a thumbs up if I've lost my studies~

Recommended Today

Source Code – Axios

Axios content main idea interceptor Cancel Promise: Axios' use of Promises is quite extreme Use the then chain to concatenate your ownrequest interceptor => ask => response interceptorchain, and then execute in an orderly manner Use the asynchronous idea of ​​Promise, separate the resolve method, and provide the ability to actively cancel the request to […]