Who still runs VIP movies in this era? I teach you to use Python to crawl your favorite movies and download them to the local

Time:2021-1-12

The text and pictures of this article are from the Internet, only for learning and communication, and do not have any commercial use. The copyright belongs to the original author. If you have any questions, please contact us in time
在这里插入图片描述
Now that you’ve got novels, pictures and videos, download movies or TV dramas https://www.okzyw.com/

First enter the search page: https://www.okzyw.com/index.php?m=vod-search
Search your favorite dramas (for example, I like western world)_ ^)

在这里插入图片描述
Enter the network to view the post request:

在这里插入图片描述
I also cut off the data, and just add the code

import requests
import parsel,os
from ffmpy3 import FFmpeg
from concurrent.futures import ThreadPoolExecutor as pool


headers ={
    'User-Agent': 'Mozilla/5.0 (Linux; U; Android 2.2.1; zh-cn; HTC_Wildfire_A3333 Build/FRG83D) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'
}

WD = "western world Season 1"
params = {
    'wd': wd,
    'submit': 'search',
}
target = 'http://www.okzyw.com'
url = 'https://www.okzyw.com/index.php?m=vod-search'

html = requests.get(url,params=params,headers=headers).text

res = parsel.Selector(html)
for each in res.xpath('//div[@class="xing_vb"]/ul[2]/li'):
    link = target + each.xpath('./span[2]/a/@href').get()
    print(link)

 

Content page:

output:
http://www.okzyw.com/?m=vod-detail-id-12790.html

 

After entering the content page, we need to use the movie title as a folder to avoid confusion

html = requests.get(link).text
res = parsel.Selector(html)
#Get movie title as folder
title = res.xpath('//div[@class="vodh"]/h2/text()').get()

#Create folder
if title not in os.listdir("./"):
     os.mkdir(title)

 

There are two formats. We choose m3u8 format
在这里插入图片描述

Because it’s a static web page, I’ll go directly to the code:

for each in res.xpath('//div[@id="2"]/ul/li'):
    #Get the number of sets
    num = each.xpath('./text()').get().split("$")[0]

    #Get the corresponding links for each episode
    m3u8 = each.xpath('./input/@value').get()
    dic_url[m3u8] = num
    print(m3u8)

 

Successfully obtained:
在这里插入图片描述
The next step is to download the m3u8 format. The m3u8 format is composed of multiple TS formats, which is the way most websites will choose now. That is to say, if you know this, it will also be useful to climb others
在这里插入图片描述
Since it is composed of multiple ts, how to merge? Here we need to use a library: ffmpy3
Just PIP it directly

pip install ffmpy3

 

After that, two lines of code can merge ts

from ffmpy3 import FFmpeg
FFmpeg(inputs={URL:None}, outputs={name:None).run()

 

However, before that, you need to download a file, which will be downloaded after decompression FFmpeg.exe Put in the PY file directory, I have put in the network disk, need friends can download, this is 64 bit, download link: at the bottom

Because each episode has several hundred meters, there are eight threads:

import requests
import parsel,os
from ffmpy3 import FFmpeg
from concurrent.futures import ThreadPoolExecutor as pool


headers ={
    'User-Agent': 'Mozilla/5.0 (Linux; U; Android 2.2.1; zh-cn; HTC_Wildfire_A3333 Build/FRG83D) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'
}
WD = input ("please enter the name of the movie you want to download (for example: Western world Season 1):")
#WD = "western world Season 1"
params = {
    'wd': wd,
    'submit': 'search',
}
target = 'http://www.okzyw.com'
url = 'https://www.okzyw.com/index.php?m=vod-search'

html = requests.get(url,params=params,headers=headers).text

res = parsel.Selector(html)
for each in res.xpath('//div[@class="xing_vb"]/ul[2]/li'):
    link = target + each.xpath('./span[2]/a/@href').get()


    dic_url = {}
    html = requests.get(link).text
    res = parsel.Selector(html)
    #Get movie title as folder
    title = res.xpath('//div[@class="vodh"]/h2/text()').get()

    #Create folder
    if title not in os.listdir("./"):
        os.mkdir(title)

    for each in res.xpath('//div[@id="2"]/ul/li'):
        #Get the number of sets
        num = each.xpath('./text()').get().split("$")[0]

        #Get the corresponding links for each episode
        m3u8 = each.xpath('./input/@value').get()
        dic_url[m3u8] = num

def down_movie(k,v):

    Print ("downloading:, end =)
    print(k,v)
    name = os.path.join(title, v +".mp4")
    FFmpeg(inputs={k:None}, outputs={name:'-loglevel quiet'}).run()       
if __name__ == "__main__":
    #Turn on thread pool
    pl = pool(max_workers=8)
    pl.map(down_movie,dic_url.keys(),dic_url.values())
    pl.shutdown()

 

Because there is no optimization, you should be as detailed as possible when entering the name:

在这里插入图片描述

Done:

在这里插入图片描述
As you can see, it’s really downloading at the same time. You can go out to have an activity at this time and download it when you come back.