The following article comes from IT Sharing Home, by IT Sharer.

[I. Project background]
Anyone who downloads movies regularly knows the headache: you have to open and download them one at a time, and there is no quick way to see what has been added recently.
In this article we take Movie Paradise as an example and write a small crawler that lists movies together with their download links, so you can see at a glance what is available and download what you like.

[II. Project preparation]
First, we need to install PyCharm. For installation steps, see this tutorial: "Python environment construction: a detailed beginner's tutorial on installing Python and PyCharm".
Movie Paradise website:
https://www.ygdy8.net/html/gndy/dyzz/list_23_1.html
How do we install the libraries we need? First open PyCharm, click File, and then click Settings.

The Settings window will appear. Open Project: (your project name), select Project Interpreter, and click the plus sign to install the libraries we need. This project uses the requests, time, and re modules; only requests has to be installed, because time and re are part of the Python standard library. See the following figure.

If the interpreter does not load, you can refer to this tutorial: "A simple tutorial on how to configure the Python interpreter after installing PyCharm".
If a library is still missing, you can download and install it as follows.
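
Before installing anything, it can help to check which modules Python can already find. The snippet below is only a sketch: the helper name missing_modules is made up for this example, and it relies on the standard-library function importlib.util.find_spec.

```python
import importlib.util

# Modules this project relies on. Only requests is third-party;
# re and time ship with Python.
REQUIRED = ["requests", "re", "time"]


def missing_modules(names):
    """Return the names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    # A missing third-party module can be installed from a terminal,
    # for example: pip install requests
    print("Missing:", ", ".join(missing))
else:
    print("All required modules are available.")
```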

[III. Project implementation]
We need the requests, time, and re modules, as shown in the figure below.

We wrap each part of the functionality in a method of a class. First, write a skeleton: define a class filmsky, give it an __init__ method that takes self, and then define a main method (main). Finally, call the main method. The code is as follows:
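
A minimal sketch of such a skeleton might look like this. The class is spelled FilmSky here to follow Python's CapWords naming convention, and the method bodies are left empty on purpose; the original figure may differ in detail.

```python
import re    # regular expressions, used later for parsing
import time  # used later to pause between requests

# requests (third-party) is imported in the same place once the
# fetching step is written:  import requests


class FilmSky:
    def __init__(self):
        # The URL template and request headers will be set up here.
        pass

    def main(self):
        # The crawl loop will go here.
        pass


if __name__ == "__main__":
    spider = FilmSky()
    spider.main()
```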

The time module is used to add a delay between requests, so that the crawler is less likely to trigger the site's anti-crawling measures.
Next, let's analyze how the address changes from one page of this website to the next.

Clicking through three pages shows that only the number after "list_23_" changes (3, 4, 5, and so on); the rest of the address stays the same.
We can put {} in place of the changing value, like this:
https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html
In this way, we initialize the URL template and construct the request headers in the __init__ method.
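
As a rough sketch, the __init__ method could store both values as attributes. The attribute names url and headers and the User-Agent string are assumptions made for this example; any browser-like User-Agent would serve the same purpose.

```python
class FilmSky:
    def __init__(self):
        # {} marks the page number, which is the only part that changes.
        self.url = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"
        # Example browser-like User-Agent (an assumed value, not from the article).
        self.headers = {
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0 Safari/537.36"
            )
        }


spider = FilmSky()
# str.format fills the {} placeholder with the page number.
print(spider.url.format(3))
# -> https://www.ygdy8.net/html/gndy/dyzz/list_23_3.html
```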

In the main method, use a for loop to generate the address of each page.
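
One possible shape for this loop is sketched below. The helper page_urls and the parameters first_page, last_page, and delay are illustrative names, not taken from the original code.

```python
import time


class FilmSky:
    def __init__(self):
        self.url = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"

    def page_urls(self, first_page, last_page):
        """Build the list-page addresses for first_page..last_page inclusive."""
        return [self.url.format(page) for page in range(first_page, last_page + 1)]

    def main(self, first_page=1, last_page=3, delay=1.0):
        for url in self.page_urls(first_page, last_page):
            print(url)
            time.sleep(delay)  # pause so we do not hit the server too quickly
```

Calling FilmSky().main() prints the addresses of the first three pages, pausing one second after each.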

The following results are obtained:

That means you're half done. Keep going!
Now we need to send requests to these URLs. To keep things clear, we write this step as its own method of the class.
We use requests to send the request. This website is encoded in GBK. How can you tell which encoding a website uses?
Open the page, right-click, choose Inspect, and look at the meta tag in the head. Taking this website as an example, you can see charset="gb2312".
GB2312 is the declared encoding; GBK is a superset of GB2312, so decoding the page as GBK is safe. The two encodings you will meet most often are UTF-8 and GBK.
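
The same check can be done in code. The following sketch reads the charset declaration out of raw page bytes with a regular expression; the function name declared_charset and the sample bytes are made up for illustration.

```python
import re

# Matches charset=gb2312, charset="utf-8", charset='gbk', and similar.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(raw_html, default="utf-8"):
    """Return the charset named in the page's meta tag, or default if none is found."""
    match = CHARSET_RE.search(raw_html[:2048])  # the declaration sits near the top
    return match.group(1).decode("ascii").lower() if match else default


# Hypothetical page head, standing in for the real response body.
sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=gb2312"></head></html>'
)
print(declared_charset(sample))  # -> gb2312

# GBK is a superset of GB2312, so GB2312 bytes decode correctly as GBK.
print("电影天堂".encode("gb2312").decode("gbk"))  # -> 电影天堂
```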


We can verify that the request really succeeded: if print(html) outputs a complete HTML page, the request worked.
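
A hedged sketch of the request step follows. The function names get_html and decode_page are assumptions; requests.get and response.content are standard parts of the requests library. Ignoring undecodable bytes is a choice made here, since a few stray bytes would otherwise abort the whole page.

```python
def decode_page(raw, encoding="gbk"):
    """Decode the raw response bytes, skipping any bytes that are not valid GBK."""
    return raw.decode(encoding, errors="ignore")


def get_html(url, headers):
    """Fetch url and return its HTML as text."""
    # Imported here so decode_page can be tried without requests installed.
    import requests  # third-party: pip install requests

    response = requests.get(url, headers=headers, timeout=10)
    return decode_page(response.content)


# Usage (needs network access):
# html = get_html("https://www.ygdy8.net/html/gndy/dyzz/list_23_1.html",
#                 {"User-Agent": "Mozilla/5.0"})
# print(html)  # a complete HTML page means the request succeeded
```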

Next we define another method, which parses the page source.
We use regular expressions to parse the data. Right-click and inspect the page, and you will see that the links we want are inside a table.

So we locate the table first and then work inward, layer by layer. You can refer to the following figure.

In a regular expression, (.*?) captures the part you want, while a bare .*? skips over the tags in between until you reach the layer you need. A for loop then goes through the matches to get each URL. These URLs point to secondary (detail) pages, so we need to request each one and parse it as well.
Some entries on the website have no download link, which would put movie names and download links out of step. Therefore we add a check: if the list of matched download links has a length greater than 0, the link is used as usual; otherwise an empty value is assigned, so that nothing goes wrong. Finally, the result is returned, as shown in the figure below.
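
To make the list-page step concrete, here is a small sketch that runs on a made-up HTML fragment. The markup, the pattern, and the names LIST_RE and parse_list_page are assumptions for illustration; the real page structure should be checked in the browser before relying on any pattern.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.ygdy8.net"

# Hypothetical fragment standing in for one list page.
SAMPLE_LIST_HTML = """
<table class="tbspan">
  <tr><td><b><a href="/html/gndy/dyzz/0001.html" class="ulink">Example Movie A</a></b></td></tr>
</table>
<table class="tbspan">
  <tr><td><b><a href="/html/gndy/dyzz/0002.html" class="ulink">Example Movie B</a></b></td></tr>
</table>
"""

# .*? skips ahead lazily; (.*?) captures the href and the link text.
# re.S lets "." match newlines, so the pattern can span several lines.
LIST_RE = re.compile(r'<table.*?<a href="(.*?)".*?>(.*?)</a>', re.S)


def parse_list_page(html):
    """Return (movie name, absolute detail-page URL) pairs found in html."""
    films = []
    for href, name in LIST_RE.findall(html):
        films.append((name.strip(), urljoin(BASE_URL, href)))
    return films


for name, url in parse_list_page(SAMPLE_LIST_HTML):
    print(name, url)
```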

Open a secondary page and right-click the download link to inspect it, as shown in the figure below:


We use a regular expression to extract the download link address, as shown in the following figure:
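
The extraction, together with the empty-link check described earlier, could be sketched as follows. The sample markup, the class name "download", and the example link are invented for this illustration and do not come from the real site.

```python
import re

# Hypothetical fragment standing in for a detail page.
SAMPLE_DETAIL_HTML = (
    '<td class="download">'
    '<a href="ftp://example.invalid/ExampleMovieA.mkv">Download</a>'
    "</td>"
)

DOWNLOAD_RE = re.compile(r'<td class="download">.*?<a href="(.*?)"', re.S)


def parse_download_link(html):
    """Return the first download link in html, or "" if the page has none."""
    links = DOWNLOAD_RE.findall(html)
    # Some detail pages have no link; returning "" keeps names and links aligned.
    return links[0] if len(links) > 0 else ""


print(parse_download_link(SAMPLE_DETAIL_HTML))
```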

The raw result is not very tidy, so we clean up the link, as shown in the figure below:

The results are as follows:

Finally, we save the data in a dictionary with download links and movie names:
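
One plausible layout, shown only as a sketch, is one dictionary per movie. The key names "name" and "download" and the sample values are assumptions; the original code may organize the dictionary differently.

```python
def build_record(name, link):
    """Bundle one movie's name and download link into a dictionary."""
    return {"name": name, "download": link}


# Made-up values standing in for the parsed results.
films = [
    ("Example Movie A", "ftp://example.invalid/ExampleMovieA.mkv"),
    ("Example Movie B", ""),  # no download link was found for this one
]

records = [build_record(name, link) for name, link in films]
for record in records:
    print(record)
```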

Finally, we tidy up the request code, which is a little repetitive.
We store the request headers in one variable and put the request itself in one method. After that, every request only needs to call this method, as shown in the following figure:
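
The idea can be sketched like this: both the list pages and the detail pages go through a single get_html method. The method names list_page_html and detail_page_html and the short User-Agent value are assumptions made for this example.

```python
class FilmSky:
    def __init__(self):
        self.url = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"
        # Headers are stored once and reused by every request.
        self.headers = {"User-Agent": "Mozilla/5.0"}

    def get_html(self, url):
        """The only place where a request is actually sent."""
        import requests  # third-party: pip install requests

        response = requests.get(url, headers=self.headers, timeout=10)
        return response.content.decode("gbk", errors="ignore")

    def list_page_html(self, page):
        # The list page reuses get_html instead of repeating the request code.
        return self.get_html(self.url.format(page))

    def detail_page_html(self, detail_url):
        # So does the secondary (detail) page.
        return self.get_html(detail_url)
```

Because every request passes through get_html, a later change such as a different timeout or an extra header only has to be made in one place.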

After the program runs, you will see output like that in the following figure:

Click a blue link to start the download (installing Xunlei first is recommended, since downloads through Xunlei are faster).
Now you can see at a glance which movie you want. Click to download!
[V. Summary]
1. Using Python web crawler techniques, this article shows a more direct way to browse your favorite movies and download them conveniently.
2. Do not crawl too much or too fast, because doing so puts unnecessary load on the server.