A few simple examples to better understand how Scrapy works

Time:2021-1-22

Description: Anyone familiar with crawlers knows that requests is easy to pick up: even a complete beginner can fetch a website after a few days of study. Scrapy, by comparison, is harder to learn. This article uses a few simple examples to explain how Scrapy works. Once you understand its working principle, learning the framework becomes much easier.

Applicable: This article is for readers who have some basic crawler knowledge and are new to Scrapy or want to learn it.

Introduction to the Scrapy framework

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It is used to crawl websites and extract structured data from their pages, and it is widely used in data mining, monitoring, and automated testing.

Structure of the framework

Scrapy has a 5 + 2 structure, as shown in the figure below

5 core components:
		1. Spiders
		2. Engine
		3. Downloader
		4. Scheduler
		5. Item Pipeline

2 middleware layers:
		1. Downloader Middleware
		2. Spider Middleware

[Figure: Scrapy architecture diagram; image unavailable]
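To make the "+ 2" part a little more concrete, here is a rough toy sketch of where the two middleware layers sit. It is not Scrapy's real API: the classes `ToyUserAgentMiddleware` and `DropEmptyMiddleware`, the dict-based request and response, and the functions `downloader` and `spider_parse` are all hypothetical stand-ins. The only idea borrowed from Scrapy is that a downloader middleware can touch a request on its way to the downloader, and a spider middleware can touch what the spider returns.

```python
class ToyUserAgentMiddleware:
    """Toy downloader middleware: edits a request before it reaches the downloader."""

    def process_request(self, request):
        request["headers"]["User-Agent"] = "toy-crawler/0.1"
        return request


class DropEmptyMiddleware:
    """Toy spider middleware: filters what the spider hands back to the engine."""

    def process_spider_output(self, results):
        return [item for item in results if item.get("title")]


def downloader(request):
    """Toy downloader: returns a canned response, no network access."""
    user_agent = request["headers"]["User-Agent"]
    return {"url": request["url"], "body": f"fetched with {user_agent}"}


def spider_parse(response):
    """Toy spider: returns one useful item and one empty item."""
    return [{"title": response["body"]}, {"title": ""}]


request = {"url": "https://example.com", "headers": {}}
request = ToyUserAgentMiddleware().process_request(request)  # engine -> downloader
response = downloader(request)
items = DropEmptyMiddleware().process_spider_output(spider_parse(response))  # spider -> engine
print(items)
```

In this toy run, the empty item is dropped by the spider middleware, and the remaining item shows the header that the downloader middleware added.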


Next, let's go through a few examples to make the working principle of Scrapy easier to understand

Let's talk about crawlers in general first. Taken as a whole, a crawler has three parts:

  • Request

    Sending a request to the website, using either GET or POST

  • Parse

    Parsing the response returned by the website, that is, processing the response further

  • Store

    Saving the processed information to a file or database
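The three parts above can be sketched with the Python standard library alone. This is only an illustration under assumptions: the function names `fetch`, `parse`, and `store`, the `<h2>` extraction rule, and the output file name `titles.json` are made up for this example. A canned HTML string stands in for a real page so the sketch runs offline; `fetch` is defined but not called.

```python
import json
import os
import tempfile
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Request: send a GET request and return the page body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class TitleParser(HTMLParser):
    """Parse: collect the text inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())


def parse(html):
    """Parse: run the parser over the response body and return the titles."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def store(titles, path):
    """Store: write the parsed data to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(titles, f)


# A canned page stands in for fetch(...) so the sketch needs no network.
sample_html = "<html><body><h2>First</h2><p>text</p><h2>Second</h2></body></html>"
titles = parse(sample_html)
output_path = os.path.join(tempfile.gettempdir(), "titles.json")
store(titles, output_path)
print(titles)  # ['First', 'Second']
```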

Scrapy has the same three parts. Below are four scenarios you are likely to meet when you start learning Scrapy, each followed by a brief explanation of how it works. The explanations are deliberately light on detail so that they are easy to follow and leave you with a rough outline of the framework in your head

Setting 1:
		Initial URL: 1
		Parse: No
		Store data: No

	(1) The spider passes the initial URL through the engine to the scheduler, which forms a scheduling queue (1 request)
	(2) The scheduler sends the request through the engine to the downloader, which downloads the page to produce the raw data
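The two steps of this setting can be imitated with a toy model. This is not Scrapy code: `scheduler_queue`, `fake_download`, and `engine_run` are hypothetical names, and the downloader returns canned text instead of touching the network.

```python
from collections import deque

scheduler_queue = deque()  # the scheduler's request queue


def fake_download(url):
    """Toy downloader: returns canned raw data, no network access."""
    return f"<html><title>{url}</title></html>"


def engine_run(start_urls):
    """Toy engine: relays requests between spider, scheduler, and downloader."""
    # (1) spider -> engine -> scheduler: queue the initial URL
    for url in start_urls:
        scheduler_queue.append(url)
    # (2) scheduler -> engine -> downloader: download the queued request
    request = scheduler_queue.popleft()
    return fake_download(request)


raw = engine_run(["https://example.com/page1"])
print(raw)
```

Nothing is parsed or stored here; the run ends as soon as the raw data exists.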

Setting 2:
		Initial URL: 1
		Parse: Yes
		Store data: No

	(1) The spider passes the initial URL through the engine to the scheduler, which forms a scheduling queue (1 request)
	(2) The scheduler sends the request through the engine to the downloader, which downloads the page to produce the raw data
	(3) The raw data is passed through the engine back to the spider for parsing
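Here is a toy sketch of this setting, in which the request carries the spider's parse function as a callback. The `Request` dataclass, `parse`, `fake_download`, and `engine_run` are hypothetical names for illustration only. The callback idea loosely mirrors how a Scrapy request is tied to a spider method, but none of this is Scrapy's real API.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    url: str
    callback: Callable[[str], dict]  # spider function that parses the raw data


def parse(raw_html):
    """Toy spider callback: pull the <title> text out of the raw page."""
    match = re.search(r"<title>(.*?)</title>", raw_html)
    return {"title": match.group(1) if match else None}


def fake_download(request):
    """Toy downloader: returns canned raw data, no network access."""
    return f"<html><title>Page at {request.url}</title></html>"


def engine_run(start_requests):
    scheduler = deque(start_requests)  # (1) spider -> engine -> scheduler
    request = scheduler.popleft()
    raw = fake_download(request)       # (2) scheduler -> engine -> downloader
    return request.callback(raw)       # (3) downloader -> engine -> spider


parsed = engine_run([Request("https://example.com/page1", parse)])
print(parsed)  # {'title': 'Page at https://example.com/page1'}
```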

Setting 3:
		Initial URL: 1
		Parse: Yes
		Store data: Yes

	(1) The spider passes the initial URL through the engine to the scheduler, which forms a scheduling queue (1 request)
	(2) The scheduler sends the request through the engine to the downloader, which downloads the page to produce the raw data
	(3) The raw data is passed through the engine back to the spider for parsing
	(4) The parsed data is passed through the engine to the item pipeline for storage
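The extra storage step can be sketched with a toy item pipeline. Everything here is a hypothetical stand-in: `JsonLinesPipeline`, its simplified `process_item` method, and the in-memory `storage` object are assumptions for this example, not Scrapy's own classes or signatures.

```python
import io
import json
from collections import deque


def fake_download(url):
    """Toy downloader: returns canned raw data, no network access."""
    return f"<html><title>Page at {url}</title></html>"


def parse(raw_html):
    """Toy spider callback: turn the raw data into an item (a plain dict)."""
    title = raw_html.split("<title>")[1].split("</title>")[0]
    return {"title": title}


class JsonLinesPipeline:
    """Toy item pipeline: stores each item as one line of JSON."""

    def __init__(self, stream):
        self.stream = stream

    def process_item(self, item):
        self.stream.write(json.dumps(item) + "\n")
        return item


def engine_run(start_urls, pipeline):
    scheduler = deque(start_urls)             # (1) spider -> engine -> scheduler
    raw = fake_download(scheduler.popleft())  # (2) scheduler -> engine -> downloader
    item = parse(raw)                         # (3) downloader -> engine -> spider
    return pipeline.process_item(item)        # (4) spider -> engine -> item pipeline


storage = io.StringIO()  # in-memory stand-in for a file or database
engine_run(["https://example.com/page1"], JsonLinesPipeline(storage))
print(storage.getvalue())
```

Swapping `io.StringIO()` for an open file would write the same lines to disk.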

Setting 4:
		Initial URL: multiple
		Parse: Yes
		Store data: Yes

	(1) The spider passes the initial URLs through the engine to the scheduler, which forms a scheduling queue (multiple requests)
	(2) The scheduler sends the first request through the engine to the downloader, which downloads the page to produce the raw data
	(3) The raw data is passed through the engine back to the spider for parsing
	(4) The parsed data is passed through the engine to the item pipeline for storage
	(5) The scheduler sends the next request through the engine to the downloader, and steps (2) to (4) repeat until no requests remain in the scheduler
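The repeat-until-empty loop in step (5) is the main new idea in this setting. A toy sketch follows, with hypothetical names (`fake_download`, `parse`, `engine_run`) and a plain list standing in for the item pipeline's storage. The real engine is asynchronous, which this sequential loop does not try to model.

```python
from collections import deque


def fake_download(url):
    """Toy downloader: returns canned raw data, no network access."""
    return f"<title>{url}</title>"


def parse(raw):
    """Toy spider callback: extract the text between the title tags."""
    return {"title": raw.split("<title>")[1].split("</title>")[0]}


def engine_run(start_urls):
    scheduler = deque(start_urls)  # (1) queue every initial URL
    stored = []                    # stand-in for the item pipeline's storage
    while scheduler:               # (5) keep going until the queue is empty
        url = scheduler.popleft()
        raw = fake_download(url)   # (2) download
        item = parse(raw)          # (3) parse
        stored.append(item)        # (4) store
    return stored


start_urls = [f"https://example.com/page{n}" for n in (1, 2, 3)]
stored_items = engine_run(start_urls)
print(len(stored_items))  # 3
```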

This article is only a brief explanation of how the Scrapy framework works. To master Scrapy, you will need to study it further

Note: please credit the source when reprinting. Thank you.
