Python crawler tutorial: downloading 4K ultra-HD wallpapers, step by step

Time:2021-5-11

The text and pictures in this article come from the Internet and are for learning and communication only, not for any commercial use. The copyright belongs to the original author. If you have any questions, please contact us promptly.

The following article comes from Tencent Cloud. Author: Python advanced


/1 Preface/

Changing a desktop wallpaper can be surprisingly frustrating. The wallpapers that turn up on Baidu nominally reach wallpaper resolution, but their actual image quality is hard to look at. Genuine 4K high-definition wallpapers are mostly copyrighted, which makes high-quality pictures difficult to obtain.

Wallhaven is a wallpaper website offering copyright-free 4K wallpapers across a rich set of themes (creative, photography, people, anime, painting, and visual art). Today we will show how to download the original 4K pictures from wallhaven in batches.

/2 Project objectives/

Fetch the 4K wallpapers, download them in batches, and save them to a folder.

/3 Libraries and websites involved/

Software: PyCharm

Required libraries: requests, lxml, fake_useragent, time

First, here is the URL we will crawl:

https://wallhaven.cc/search?q=id%3A65348&page={}

 

In this URL, q=id%3A65348 is the URL-encoded search query id:65348 (%3A is the encoding of the colon), and page is the page number.
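
To see what the encoded query string stands for, the standard library can decode it. This is a minimal sketch that only inspects the URL string and sends no request.

```python
from urllib.parse import urlsplit, parse_qs

# URL from the article, with the page placeholder filled in
url = "https://wallhaven.cc/search?q=id%3A65348&page=1"

# parse_qs decodes percent-escapes and returns each value as a list
params = parse_qs(urlsplit(url).query)

print(params["q"][0])     # id:65348
print(params["page"][0])  # 1
```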

/4 Project analysis/

Scroll to the next page and observe how the URL changes:

https://wallhaven.cc/search?q=id%3A65348&page=1
https://wallhaven.cc/search?q=id%3A65348&page=2
https://wallhaven.cc/search?q=id%3A65348&page=3

 

Each time you move to the next page, the page value increases by 1. Use {} as a placeholder for this changing value, then iterate over the page numbers with a for loop to request multiple URLs.
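
The placeholder idea can be sketched as follows. The helper name build_page_urls is an assumption made for illustration; the article itself formats the URL inside its own loop.

```python
# URL template from the article; {} is filled with the page number
URL_TEMPLATE = "https://wallhaven.cc/search?q=id%3A65348&page={}"


def build_page_urls(start_page, end_page):
    """Return the search URLs for start_page..end_page, inclusive."""
    return [URL_TEMPLATE.format(page) for page in range(start_page, end_page + 1)]


for url in build_page_urls(1, 3):
    print(url)
```

Note that range() excludes its upper bound, which is why the end page is passed as end_page + 1.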

/5 Concrete implementation/

1. Define a class that inherits from object, with an __init__ method and a main method that both take self. Import the required libraries and set the URL template.

import requests
from lxml import etree
from fake_useragent import UserAgent
import time

class wallhaven(object):
    def __init__(self):
        # {} is the placeholder for the page number
        self.url = "https://wallhaven.cc/search?q=id%3A65348&page={}"

    def main(self):
        pass

if __name__ == '__main__':
    imageSpider = wallhaven()
    imageSpider.main()

 

2. The fake_useragent module generates a random User-Agent.

        # Inside __init__: pick a random User-Agent for the request headers
        ua = UserAgent(verify_ssl=False)
        for i in range(1, 50):
            self.headers = {
                'User-Agent': ua.random,
            }
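
The idea behind a random User-Agent can be sketched without the third-party module. The list below is a hypothetical hand-written sample, and random_headers is an illustrative helper rather than part of fake_useragent, which draws from a much larger pool.

```python
import random

# Hypothetical sample strings, used only for illustration
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
]


def random_headers():
    """Build a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

Calling random_headers() once per request, instead of once at start-up, would give each request a freshly chosen value.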

 

3. Use a for loop to request multiple URLs.

startPage = int(input("start page:"))
endPage = int(input("end page:"))
for page in range(startPage, endPage + 1):
    url = self.url.format(page)

 

4. Send a request to get a response.

    def get_page(self, url):
        '''Send a request and return the response body as text.'''
        res = requests.get(url=url, headers=self.headers)
        html = res.content.decode("utf-8")
        return html

 

5. Parse the first-level page to get the addresses of the second-level pages.

def parse_page(self, html):
    parse_html = etree.HTML(html)
    image_src_list = parse_html.xpath('//figure//a/@href')
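
What the XPath //figure//a/@href selects can be shown on a tiny sample. This sketch swaps in the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra packages; the markup and the helper name extract_detail_links are made up for illustration. Real pages are rarely well-formed XML, which is where lxml's etree.HTML, as used in the article, is the better fit.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet shaped like a thumbnail listing
SAMPLE = """
<html><body>
  <figure><a class="preview" href="https://example.com/w/abc123">x</a></figure>
  <figure><a class="preview" href="https://example.com/w/def456">y</a></figure>
  <div><a href="https://example.com/about">about</a></div>
</body></html>
"""


def extract_detail_links(markup):
    """Collect href values of <a> elements nested anywhere inside <figure>."""
    root = ET.fromstring(markup.strip())
    links = []
    for figure in root.iter("figure"):
        for anchor in figure.iter("a"):
            href = anchor.get("href")
            if href is not None:
                links.append(href)
    return links


print(extract_detail_links(SAMPLE))
```

The link inside the div is skipped because it has no figure ancestor, which mirrors the effect of the //figure//a part of the XPath.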

 

6. Loop over the second-level page URLs, request each one, and parse it to find the corresponding picture address.

html1 = self.get_page(i)  # request the second-level page; i is one URL from image_src_list
parse_html1 = etree.HTML(html1)
# print(parse_html1)
filename = parse_html1.xpath('//div[@class="scrollbox"]//img/@src')
# print(filename)

 

7. Get the picture address, request it, and save the picture.

# Request each picture address
for img in filename:
    dirname = "./graph/" + img[32:]  # change this to save the pictures elsewhere
    print(dirname)
    html2 = requests.get(url=img, headers=self.headers).content

    with open(dirname, 'wb') as f:
        f.write(html2)
        print("%s download succeeded" % dirname)
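
The slice img[32:] only works while the picture URL prefix keeps a fixed length. A possible alternative, sketched here under that assumption, derives the file name from the URL path and creates the folder if it is missing. The helper names local_path_for and save_image and the example URL are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_path_for(img_url, folder="./graph/"):
    """Map a picture URL to folder/<file name taken from the URL path>."""
    name = posixpath.basename(urlsplit(img_url).path)
    return os.path.join(folder, name)


def save_image(data, path):
    """Write raw picture bytes to path, creating the folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


# Hypothetical URL, used only to show the mapping
print(local_path_for("https://example.com/full/ab/wallpaper-abc123.jpg"))
```

In the article's loop, the bytes passed to save_image would be the .content of the picture response.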

 

8. Call the methods to run the crawler.

html = self.get_page(url)
self.parse_page(html)

Optimization: add a delay between requests (to prevent the IP from being blocked).

time.sleep(1.4)  # delay between requests
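
How the calls and the delay fit together can be sketched as a small loop. The function crawl and its parameters are assumptions made for illustration; the fetch and handle callables are stand-ins so the sketch runs offline without sending any request.

```python
import time


def crawl(urls, fetch, handle, delay=1.4, sleep=time.sleep):
    """Fetch each URL in turn, pass the result to handle, and pause between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        handle(fetch(url))


# Offline demonstration with stand-in callables instead of real HTTP requests
pages = []
crawl(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: "<html>%s</html>" % url,
    handle=pages.append,
    delay=0,
)
print(pages)
```

In the article's class, fetch would correspond to self.get_page and handle to self.parse_page, with the default delay of 1.4 seconds taken from the article.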

 

/6 Results/

1. Click the green Run button; the output appears in the console. Enter the start page and the end page, then press Enter.


2. Each successful download is reported in the console.


3. The pictures are saved in batches.


4. Verify that the pictures are 4K (open a picture's properties to check).


/7 Summary/

1. Do not crawl too much data, since that can put excessive load on the server. A small trial run is enough.
2. This article uses Python crawler libraries to fetch 4K wallpapers from wallhaven.
3. Downloading 4K wallpapers may be a little slow, so please be patient. If the picture addresses differ, you need to modify the path where the pictures are saved.
4. You can also find your favorite pictures on the wallhaven website and try the steps yourself. When you implement it on your own, all kinds of problems will come up. Do not assume that reading is the same as doing: only hands-on practice brings a deeper understanding.
5. If you would like the source code, reply "4K wallpaper" in the background to get it. If you find it useful, remember to give it a star.