Crawling League of Legends skins with Python and multithreading (principle analysis)

Time:2021-11-29

1. What is multithreading?

Multithreading means carrying out multiple tasks concurrently. Its purpose is not to make any single operation faster but to use resources more fully, which raises the overall efficiency of the system. Threads are used when several tasks need to make progress at the same time.

Why use multithreading
Threads are independent, concurrent streams of execution within a program. Compared with separate processes, the threads in a process are less isolated from one another: they share memory, file handles, and other per-process state.

Because threads are a finer-grained unit than processes, multithreaded programs can reach a high degree of concurrency. Each process has its own independent memory during execution, whereas the threads of one process share memory, which makes exchanging data between them cheap.

Threads are cheaper than processes because the threads of one process have much in common. They share the process's virtual address space, including its code segment and its public data, and with this shared data it is easy to implement communication between threads.

When the operating system creates a process, it must allocate independent memory space and many related resources for it; creating a thread is much simpler. Using multiple threads to achieve concurrency is therefore cheaper than using multiple processes.

To sum up, using multithreaded programming has the following advantages:

Processes cannot share memory directly, but sharing memory between threads is very easy.

When the operating system creates a process, it has to allocate system resources for it, while creating a thread costs much less. Running multiple tasks concurrently with threads is therefore cheaper than doing so with multiple processes.

Python has built-in multithreading support (the threading module) instead of leaving everything to the scheduler of the underlying operating system, which simplifies multithreaded programming in Python.
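As a small illustration of how easily threads share memory, the sketch below starts a few threads with the built-in threading module and lets them all write into one dictionary. The function names and the squaring task are made up for the example.

```python
import threading


def run_workers(numbers):
    # One dictionary, visible to every thread started below.
    results = {}

    def square(n):
        # Each thread writes to its own key, so no lock is needed here.
        results[n] = n * n

    threads = [threading.Thread(target=square, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return results


if __name__ == "__main__":
    print(run_workers([1, 2, 3, 4]))
```

No copying or message passing is involved: every thread sees the same results object because all of them live in the same process.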

2. Principle

Multithreading is a concurrent execution mechanism.
How concurrent execution works: simply put, processor time is divided into many short time slices, and the applications take turns running, one per slice. Because each slice is very short, every application appears to have the processor to itself, which gives the effect of multiple applications running at the same time.
Multithreading applies this operating-system mechanism inside a single program: the program is divided into several subtasks that execute concurrently, and each subtask is a thread. Such a program is a multithreaded program.
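To make the idea of concurrent subtasks concrete, here is a rough sketch that runs the same waiting tasks first one after another and then as threads. The task names are invented, and time.sleep stands in for waiting on the network; exact timings will vary from machine to machine.

```python
import threading
import time


def fake_download(name, seconds, log):
    time.sleep(seconds)  # stands in for waiting on a network response
    log.append(name)


def run_serial(tasks):
    log = []
    start = time.perf_counter()
    for name, seconds in tasks:
        fake_download(name, seconds, log)
    return time.perf_counter() - start, log


def run_threaded(tasks):
    log = []
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_download, args=(name, seconds, log))
        for name, seconds in tasks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, log


if __name__ == "__main__":
    demo_tasks = [("a", 0.1), ("b", 0.1), ("c", 0.1)]
    print("serial:   %.2fs" % run_serial(demo_tasks)[0])
    print("threaded: %.2fs" % run_threaded(demo_tasks)[0])
```

The threaded run should finish in roughly the time of the longest single task, because the waits overlap instead of adding up.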

3. Advantages

1. Threads let you move long-running tasks into the background.
2. The user interface can be more appealing. For example, when the user clicks a button that triggers some processing, a progress bar can pop up to show how far it has got.
3. The program may run faster.
4. Threads are especially useful for tasks that involve waiting, such as user input, file reading and writing, and sending and receiving data over the network. In these cases some scarce resources, such as memory, can be released.
5. Multithreading also plays an important role in iOS software development.

4. Disadvantages

1. A large number of threads hurts performance, because the operating system has to switch between them.
2. More threads require more memory.
3. Threads can introduce more bugs into a program, so use them with care.
4. When terminating a thread, you need to consider the effect on the rest of the program.
5. Data is usually shared among multiple threads, so access must be coordinated and deadlocks prevented.
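A minimal sketch of guarding shared data, assuming a simple counter that several threads update. The Counter class and the function name are illustrative only. Taking the lock through a with statement means it is always released, even if the body raises, which removes one common cause of deadlock.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run this read-modify-write step.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads, n_increments):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads(4, 1000))
```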

OK, enough theory. Let's get hands-on.

1. Go to the official League of Legends website and click the game data section to reach the hero list page

2. Determine whether the crawled web page is loaded synchronously or asynchronously

1. Right-click the page and open the page source
2. Press Ctrl + F to open the search box
3. Type a hero's name into the search box
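The same check can be expressed in code once the raw page source is available as a string. The function name and both sample pages are invented for illustration; they are not taken from the real site.

```python
def is_loaded_synchronously(page_source, hero_name):
    """Return True if the hero name already appears in the raw HTML source."""
    return hero_name in page_source


# Hypothetical page sources, for illustration only.
static_page = "<ul><li>Annie</li></ul>"
dynamic_page = "<ul id='hero-list'></ul><script src='app.js'></script>"

if __name__ == "__main__":
    print(is_loaded_synchronously(static_page, "Annie"))
    print(is_loaded_synchronously(dynamic_page, "Annie"))
```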

If the search returns no results, the page is loaded asynchronously.

3. Find the hero URL address

Back on the hero page, right-click and choose Inspect to open the developer tools.

Among the captured requests, find hero_list.js. As the name suggests, this file contains the hero list.

Click the hero Annie on the page and note the address of the request it triggers, then click other heroes and compare their addresses.

In the Headers tab, look up the Request URL for each hero:
Annie: https://game.gtimg.cn/images/lol/act/img/js/hero/1.js
Olaf (the Berserker): https://game.gtimg.cn/images/lol/act/img/js/hero/2.js

Only the hero ID at the end of the URL changes, which suggests the following plan:

1. Send the first request to get the IDs and names of all heroes
2. Send the second request to get each skin's name, its mobile image URL, and its desktop image URL
3. Request the binary data of the mobile image and of the desktop image
4. Save the desktop and mobile versions of each League of Legends image
5. Use multiple threads to save the data
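The URL pattern observed above, where only the trailing hero ID differs, can be captured in a small helper. The constant and function names are my own; the template itself comes from the two request URLs listed earlier.

```python
# Template taken from the two request URLs observed in the browser.
HERO_URL_TEMPLATE = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"


def hero_detail_url(hero_id):
    """Build the detail URL for one hero from its ID."""
    return HERO_URL_TEMPLATE.format(hero_id)


if __name__ == "__main__":
    print(hero_detail_url(1))
    print(hero_detail_url(2))
```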

It’s time to write some code… hey

Store the starting URL (the address of the hero list) in a global variable, then inspect its response in the browser.

1. For the first request, we need two pieces of data back: the hero ID and the hero name. The Preview tab shows that the response is JSON, so import the jsonpath library to extract them.
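A rough sketch of this extraction step follows. It uses the standard-library json module instead of jsonpath so that it runs without extra packages. The sample text and the key names "hero", "heroId", and "name" are assumptions made for illustration; check them against the real response in the Preview tab.

```python
import json

# Hypothetical sample of the hero list response; the key names are assumptions.
SAMPLE_HERO_LIST = (
    '{"hero": [{"heroId": "1", "name": "Annie"},'
    ' {"heroId": "2", "name": "Olaf"}]}'
)


def parse_hero_list(text):
    """Return a list of (hero_id, hero_name) pairs from the JSON text."""
    data = json.loads(text)
    return [(hero["heroId"], hero["name"]) for hero in data.get("hero", [])]


if __name__ == "__main__":
    for hero_id, hero_name in parse_hero_list(SAMPLE_HERO_LIST):
        print(hero_id, hero_name)
```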

2. For the second request, get each skin's name, its mobile image URL, and its desktop image URL
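One possible shape for this step is sketched below. The sample dictionary, the example.com URLs, and the field names "skins", "name", "mainImg" (taken here as the desktop image), and "loadingImg" (taken here as the mobile image) are all assumptions for illustration; the real field names should be read from the hero detail response.

```python
# Hypothetical hero detail data; every key and URL here is an assumption.
SAMPLE_HERO_DETAIL = {
    "skins": [
        {
            "name": "Annie",
            "mainImg": "https://example.com/pc/1000.jpg",
            "loadingImg": "https://example.com/mobile/1000.jpg",
        },
        {
            "name": "Goth Annie",
            "mainImg": "https://example.com/pc/1001.jpg",
            "loadingImg": "https://example.com/mobile/1001.jpg",
        },
    ]
}


def parse_skins(detail):
    """Return one dict per skin with its name and both image URLs."""
    skins = []
    for skin in detail.get("skins", []):
        skins.append(
            {
                "name": skin.get("name", ""),
                "pc_url": skin.get("mainImg", ""),
                "mobile_url": skin.get("loadingImg", ""),
            }
        )
    return skins


if __name__ == "__main__":
    for item in parse_skins(SAMPLE_HERO_DETAIL):
        print(item["name"], item["pc_url"], item["mobile_url"])
```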

3. Request the binary data of the mobile image and of the desktop image. Wrap the requests in a try/except statement so that an error does not stop the program.
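A minimal sketch of a guarded download, assuming the standard-library urllib in place of whatever HTTP client the original code used. The function name and the choice to return None on failure are mine.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, ValueError, OSError) as exc:
        # ValueError covers empty or malformed URLs; the others cover
        # network problems. Report the failure and let the crawl continue.
        print(f"skipping {url!r}: {exc}")
        return None


if __name__ == "__main__":
    print(fetch_bytes(""))  # an empty URL is skipped instead of crashing
```

Returning None lets the caller decide to skip a skin whose image could not be fetched, so one bad URL does not end the whole run.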

4. Save the desktop and mobile versions of each League of Legends image, again using a try/except statement so that an error does not stop the program.
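The saving step might look like the sketch below. The function name, the folder layout, and the replacement of "/" in file names are assumptions; the replacement is there because a skin name containing a slash would otherwise be read as a directory separator.

```python
import os
import tempfile


def save_image(folder, file_name, data):
    """Write image bytes to folder/file_name; return the path, or None on failure."""
    if not data:
        return None  # nothing was downloaded, so there is nothing to save
    try:
        os.makedirs(folder, exist_ok=True)
        safe_name = file_name.replace("/", "_")
        path = os.path.join(folder, safe_name)
        with open(path, "wb") as f:
            f.write(data)
        return path
    except OSError as exc:
        print(f"could not save {file_name!r}: {exc}")
        return None


if __name__ == "__main__":
    demo_folder = tempfile.mkdtemp()
    print(save_image(demo_folder, "demo.jpg", b"\xff\xd8\xff"))
```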

5. Use multiple threads to save the data. Import the package: import threading
Usage: threading.Thread(target=self.function_name, args=(arguments,))
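Here is a small sketch of that calling pattern with a bound method as the target. The SkinSaver class and its method names are hypothetical, and the save method only records a label where the real code would download and write a file.

```python
import threading


class SkinSaver:
    def __init__(self):
        self.saved = []
        self._lock = threading.Lock()

    def save(self, hero_name, skin_name):
        # Placeholder for the real download-and-write work.
        with self._lock:
            self.saved.append(f"{hero_name}-{skin_name}")

    def run(self, skins):
        threads = []
        for hero_name, skin_name in skins:
            # args must be a tuple; a single argument needs a trailing
            # comma, as in args=(hero_name,).
            t = threading.Thread(target=self.save, args=(hero_name, skin_name))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
        return sorted(self.saved)


if __name__ == "__main__":
    print(SkinSaver().run([("Annie", "Goth Annie"), ("Annie", "Classic")]))
```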

Finally, here is a daemon-thread template that helps avoid errors, for your reference.

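from threading import Thread
from queue import Queue

class Love(object):
    def __init__(self):
        # Create the task queue; Queue() with no maxsize is unbounded
        self.q = Queue()

    def parse_data(self):
        # Worker loop (assumed; the original left this body empty)
        while True:
            item = self.q.get()
            # ... process item here ...
            self.q.task_done()

    def run(self):
        # Assumed: run() is called below but was never defined
        Thread(target=self.parse_data, daemon=True).start()
        self.q.join()  # returns once every queued item is marked done

if __name__ == '__main__':
    love = Love()
    love.run()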
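To show the template doing some work, the sketch below feeds items through a queue to a few daemon worker threads. The function names and the upper-casing task are stand-ins for real download jobs.

```python
import threading
from queue import Queue


def worker(q, results):
    while True:
        item = q.get()
        try:
            results.append(item.upper())  # stand-in for the real work
        finally:
            q.task_done()  # always mark the item done so q.join() can return


def process_all(items, n_workers=3):
    q = Queue()
    results = []
    for _ in range(n_workers):
        # Daemon threads do not keep the program alive once the main thread ends.
        threading.Thread(target=worker, args=(q, results), daemon=True).start()
    for item in items:
        q.put(item)
    q.join()  # block until every queued item has been processed
    return sorted(results)


if __name__ == "__main__":
    print(process_all(["annie", "olaf", "ashe"]))
```

Because the workers loop forever, marking them as daemon threads is what allows the program to exit after q.join() returns.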

I wish you all success in learning Python
