Scrapy's Downloader Middleware

Time:2019-12-4

Downloader Middleware

Brief introduction

On its own, the Scrapy downloader cannot execute JavaScript and does not support proxies.

Downloader middleware is a framework of hooks into Scrapy's request/response processing. It is a lightweight, low-level system for globally modifying Scrapy's requests and responses.

A downloader middleware in Scrapy is a class that implements certain special methods. The middlewares that ship with Scrapy are listed in the DOWNLOADER_MIDDLEWARES_BASE setting.

User-defined middlewares are enabled in the DOWNLOADER_MIDDLEWARES setting. The setting is a dict: each key is the import path of a middleware class, and each value is the middleware's order, an integer in the range 0-1000. The smaller the value, the closer the middleware is to the engine.
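As a rough illustration of the setting described above, a project's settings.py might contain something like the following. The module path myproject.middlewares and the class name CustomProxyMiddleware are hypothetical, and the order value 543 is arbitrary. Assigning None to the path of a built-in middleware disables it.

```python
# settings.py (sketch; "myproject" and CustomProxyMiddleware are hypothetical names)
DOWNLOADER_MIDDLEWARES = {
    # key: import path of the middleware class, value: its order
    "myproject.middlewares.CustomProxyMiddleware": 543,
    # None disables a middleware that is enabled by default
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```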

 

API

Each middleware is a Python class that defines one or more of the following methods:

process_request(request, spider): called for each request that passes through the middleware

process_response(request, response, spider): called for each response that passes through the middleware

process_exception(request, exception, spider): called when an exception is raised while a request is being processed

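from_crawler(cls, crawler): class method that Scrapy uses, when it is defined, to create the middleware instance from a crawler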
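The four methods listed above can be sketched in one small class. This is a minimal, hypothetical middleware (the class name and the seen attribute are made up for illustration) that only records what passes through it. It is a plain Python class and needs no Scrapy import.

```python
class LoggingDownloaderMiddleware:
    """Hypothetical middleware that only records what passes through it."""

    def __init__(self):
        self.seen = []  # (event, url) tuples, kept for illustration only

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the middleware; crawler.settings
        # would be available here if configuration were needed.
        return cls()

    def process_request(self, request, spider):
        self.seen.append(("request", request.url))
        return None  # None: continue processing this request normally

    def process_response(self, request, response, spider):
        self.seen.append(("response", request.url))
        return response  # pass the response on unchanged

    def process_exception(self, request, exception, spider):
        self.seen.append(("exception", request.url))
        return None  # None: let other middlewares handle the exception
```

Returning None from process_request and process_exception, and the response itself from process_response, leaves Scrapy's normal flow untouched, which makes this a reasonable starting template.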

 

Common built-in middlewares

CookiesMiddleware: cookie support, turned on or off with the COOKIES_ENABLED setting

HttpProxyMiddleware: HTTP proxy support, configured by setting the value of request.meta['proxy']

UserAgentMiddleware: user agent middleware

For other middlewares, see the official documentation: https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
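To show how request.meta['proxy'] and the user agent can be set from custom code, here is a minimal sketch of a middleware that does both in process_request. The class name, the proxy address, and the user agent string are placeholders, not values from any real project.

```python
class ProxyAndUserAgentMiddleware:
    """Hypothetical middleware: sets a proxy and a User-Agent on each request."""

    PROXY_URL = "http://127.0.0.1:8080"  # placeholder proxy address
    USER_AGENT = "example-bot/0.1"       # placeholder user agent string

    def process_request(self, request, spider):
        # The proxy for this request is taken from the "proxy" meta key.
        request.meta["proxy"] = self.PROXY_URL
        # Set the User-Agent header explicitly for this request.
        request.headers["User-Agent"] = self.USER_AGENT
        return None  # continue with normal processing
```

Such a class would still have to be registered in DOWNLOADER_MIDDLEWARES before Scrapy calls it.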

 

 

 

Common settings

Settings priority

Command line options (highest priority)

Settings per spider

Project settings module

Default settings per command

Default global settings (lowest priority)
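The precedence order above can be modelled with a few lines of plain Python. This is only a simplified sketch of the idea, not Scrapy's actual implementation: the source names and the numeric ranks in PRIORITY are illustrative assumptions, and resolve is a hypothetical helper.

```python
# Illustrative ranks only: a higher number overrides a lower one.
PRIORITY = {
    "default": 0,   # default global settings
    "command": 10,  # default settings per command
    "project": 20,  # project settings module
    "spider": 30,   # settings per spider
    "cmdline": 40,  # command line options
}


def resolve(candidates):
    """Return the value supplied by the highest-priority source.

    candidates maps a source name from PRIORITY to the value it sets.
    """
    winner = max(candidates, key=lambda source: PRIORITY[source])
    return candidates[winner]


# The same setting defined in three places; the command line wins.
value = resolve({"default": 0, "project": 2, "cmdline": 0.5})
```

In a real project, per-spider values usually come from the spider's custom_settings attribute, and command line values from the -s NAME=VALUE option.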

 

Common project settings

BOT_NAME: the project name

CONCURRENT_ITEMS: maximum number of items processed concurrently. The default is 100

CONCURRENT_REQUESTS: maximum number of concurrent download requests

CONCURRENT_REQUESTS_PER_DOMAIN: maximum number of concurrent requests to a single domain

CONCURRENT_REQUESTS_PER_IP: maximum number of concurrent requests to a single IP
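Put together, these settings might appear in settings.py roughly as follows. The project name is hypothetical and the numbers are only illustrative values, to be tuned per project.

```python
# settings.py (sketch; "myproject" is a hypothetical project name)
BOT_NAME = "myproject"

CONCURRENT_ITEMS = 100              # items processed concurrently
CONCURRENT_REQUESTS = 16            # total concurrent download requests
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # cap for a single domain
CONCURRENT_REQUESTS_PER_IP = 0      # cap for a single IP; 0 means no per-IP cap
```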