Build forum from scratch (2): Web Server Gateway Interface

Time:2020-3-25

In Build forum from scratch (1): Web server and web framework, we clarified the concepts of web server, web application, and web framework. For Python, more and more web frameworks are available, which gives us more choices but can also limit our choice of web server, since a framework may only work with certain servers. Java also has many web frameworks, but thanks to the servlet API, an application written with any Java web framework can run on any web server that supports the servlet API.

The Python community needs a similar API to connect web servers and applications. That API is WSGI (Python Web Server Gateway Interface), specified in detail in PEP 3333. In short, WSGI is the bridge between the web server and the web application. On one side, it takes the raw HTTP data from the web server, processes it into a unified format, and hands it to the web application; on the other side, the application / framework runs the business logic, generates the response content, and hands it back to the server.

The detailed process of web server and framework coupling through WSGI is shown in the following figure:

[Figure: interaction between the web server and the framework through WSGI]

The specific explanation is as follows:

  • The application (web framework) provides a callable object named application (the WSGI protocol does not specify how this object is implemented).

  • Every time the server receives a request from an HTTP client, it calls the callable object application, passing a dictionary named environ and a callable object named start_response as arguments.

  • The framework / application generates the HTTP status code and HTTP response headers and passes them to start_response, where the server saves them. In addition, the framework / application returns the body of the response.

  • The server combines the status code, response headers, and response body into an HTTP response and returns it to the client (this step is not part of the WSGI protocol).
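The four steps above can be sketched in a few lines. This is only a minimal illustration, not how a real server is written: run_once is a hypothetical helper that plays the server role for a single request, and the environ passed in is a hand-made dictionary with just two keys.

```python
def run_once(application, environ):
    """Hypothetical helper: play the server role for one request."""
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # Step 3: the server only saves the status and headers here.
        captured["status"] = status
        captured["headers"] = response_headers

    # Step 2: call the application with environ and start_response.
    body_iterable = application(environ, start_response)
    try:
        body = b"".join(body_iterable)
    finally:
        # PEP 3333 asks the server to call close() if the iterable has one.
        close = getattr(body_iterable, "close", None)
        if close is not None:
            close()

    # Step 4 (outside WSGI): assemble the raw HTTP response.
    head = "HTTP/1.1 {}\r\n".format(captured["status"])
    head += "".join("{}: {}\r\n".format(k, v) for k, v in captured["headers"])
    return head.encode("latin-1") + b"\r\n" + body


def application(environ, start_response):
    # Step 1: the callable provided by the application side.
    body = "You requested {}\n".format(environ["PATH_INFO"]).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


raw = run_once(application, {"REQUEST_METHOD": "GET", "PATH_INFO": "/hello"})
```

A real server would build environ from the incoming request and write the bytes to a socket instead of returning them.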

Let’s see how WSGI adapts from the server side and the application side.

Server side

We know that each HTTP request sent by the client (usually a browser) consists of a request line, message headers, and a request body, which together carry the details of the request. For example:

  • Method: the method to be performed on the resource identified by the request URI, such as GET or POST;

  • User-Agent: lets the client tell the server about its operating system, browser, and other properties.

After the server receives an HTTP request from the client, the WSGI interface must normalize these request fields so they can be passed conveniently to the application side (in practice, to the framework). As early as CGI (Common Gateway Interface), the data that the web server passes to the application was specified in detail; these data are called CGI environment variables. WSGI adopts the CGI environment variables and requires the web server to create a dictionary to store them (usually named environ). In addition to the variables defined by CGI, environ must also contain some variables defined by WSGI, and it may also contain operating system environment variables. See the "environ Variables" section of PEP 3333 for the full list.
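To get a feel for what environ contains, the standard library offers wsgiref.util.setup_testing_defaults, which fills a dictionary with plausible values for testing. The sketch below simply splits the resulting keys into CGI-style variables and WSGI-defined ones; the variable names cgi_keys and wsgi_keys are only for illustration.

```python
from wsgiref.util import setup_testing_defaults

# Start from an empty dict and let wsgiref fill in test defaults.
environ = {}
setup_testing_defaults(environ)

# CGI-style variables such as REQUEST_METHOD or SERVER_NAME.
cgi_keys = sorted(k for k in environ if not k.startswith("wsgi."))
# Variables defined by WSGI itself, such as wsgi.input or wsgi.url_scheme.
wsgi_keys = sorted(k for k in environ if k.startswith("wsgi."))

print("CGI variables: ", cgi_keys)
print("WSGI variables:", wsgi_keys)
```

A real server fills these keys from the actual request rather than from defaults.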

Next, the WSGI interface must hand environ over to the application for processing. WSGI specifies that the application provides a callable object, application; the server calls it and uses the return value as the HTTP response body. When calling application, the server passes two arguments: the environ dictionary described above, and the callable object start_response, through which the application sets the status code and response headers. With the status code, headers, and body in hand, we have a complete HTTP response. The web server returns the response to the client, and one complete HTTP request-response cycle is finished.

Analysis of wsgiref

Python’s standard library includes a web server that implements the WSGI interface, in the wsgiref module. It is a reference implementation of a WSGI server written in pure Python. Let’s briefly analyze its implementation. First, we start a web server with the following code:

from wsgiref.simple_server import make_server

# Instantiate the server
httpd = make_server(
    'localhost',    # The host name
    8051,           # A port number where to wait for the request
    application     # The application object name, in this case a function
)

# Wait for a single request, serve it and quit
httpd.handle_request()
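For a version that can be run end to end, here is a minimal sketch under a few assumptions: the application function, the QuietHandler subclass (used only to silence request logging), and the fetch_once helper are illustrative names, and port 0 asks the operating system for any free port. The server handles one request in a background thread while the main thread acts as the client.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def application(environ, start_response):
    body = b"Hello from wsgiref\n"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        # Suppress the per-request log line on stderr.
        pass


def fetch_once():
    # The socket is bound and listening as soon as make_server returns.
    httpd = make_server("127.0.0.1", 0, application, handler_class=QuietHandler)
    port = httpd.server_address[1]

    # Serve exactly one request in the background.
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    response = conn.getresponse()
    status, body = response.status, response.read()
    conn.close()

    worker.join()
    httpd.server_close()
    return status, body


status, body = fetch_once()
```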

Next, let’s trace through the source code how the web server receives a request, generates environ, and then calls application to process it. The simplified call flow is shown in the following figure:

[Figure: simplified call flow of a request in wsgiref]

There are three main classes: WSGIServer, WSGIRequestHandler, and ServerHandler. WSGIServer is the web server class; it is initialized with a server_address (IP: port) and the WSGIRequestHandler class to obtain a server object. The object listens on the given port. After receiving an HTTP request, it creates an instance of the request handler class through finish_request. During the initialization of that instance, a ServerHandler instance is created and its run(application) method is invoked; inside that method, the application object provided by the application side is called to generate the response.

The inheritance relationship of these three classes is shown in the following figure:

[Figure: inheritance relationship of WSGIServer, WSGIRequestHandler, and ServerHandler]

TCPServer uses sockets to handle TCP communication, and HTTPServer handles the HTTP level. Similarly, StreamRequestHandler handles stream sockets, and BaseHTTPRequestHandler handles content at the HTTP level. This part has little to do with the WSGI interface and more to do with the concrete implementation of the web server, so it can be ignored.
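The inheritance chains can also be inspected directly in an interpreter. The small sketch below uses a hypothetical helper, ancestry, to list the class names along each method resolution order.

```python
from wsgiref.simple_server import ServerHandler, WSGIRequestHandler, WSGIServer


def ancestry(cls):
    """Return the class names along the method resolution order, minus object."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


server_chain = ancestry(WSGIServer)
handler_chain = ancestry(WSGIRequestHandler)
response_chain = ancestry(ServerHandler)

for chain in (server_chain, handler_chain, response_chain):
    print(" -> ".join(chain))
```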

Microserver instance

If wsgiref above seems too complex, let’s implement a tiny web server so that we can understand how the WSGI interface is implemented on the server side. The code is taken from Develop network server by yourself (2) and is available as a gist. The main structure is as follows:

import socket


class WSGIServer(object):
    # Socket parameters
    address_family, socket_type = socket.AF_INET, socket.SOCK_STREAM
    request_queue_size = 1

    def __init__(self, server_address):
        # Initialize the TCP server: create the socket, bind the address, and listen on the port
        # Get the server address and port
        ...

    def set_app(self, application):
        # Store the application provided by the framework
        self.application = application

    def serve_forever(self):
        # Handle TCP connections: read the request content and call the handler
        ...

    def handle_request(self):
        # Parse the HTTP request, build environ, call the application, and return the HTTP response
        env = self.get_environ()
        result = self.application(env, self.start_response)
        self.finish_response(result)

    def parse_request(self, text):
        # Parse the HTTP request
        ...

    def get_environ(self):
        # Build the environ dictionary. Only a few keys are shown; a real server sets many more.
        env = {}
        env['wsgi.url_scheme']   = 'http'
        ...
        env['REQUEST_METHOD']    = self.request_method    # GET
        ...
        return env

    def start_response(self, status, response_headers, exc_info=None):
        # Save the status code and response headers, adding the server's own headers.
        # server_headers is not defined in this excerpt; an illustrative value is used here.
        server_headers = [('Server', 'WSGIServer 0.2')]
        self.headers_set = [status, response_headers + server_headers]

    def finish_response(self, result):
        # Send the HTTP response back to the client
        ...

SERVER_ADDRESS = (HOST, PORT) = '', 8888

# Create a server instance
def make_server(server_address, application):
    server = WSGIServer(server_address)
    server.set_app(application)
    return server
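The skeleton above leaves parse_request and get_environ mostly empty. As a rough sketch of what those two steps might look like, the functions below (parse_request_line and build_environ, both hypothetical names, not taken from the gist) parse the request line and build a small environ. The default server name and port are assumptions for illustration, and request headers and the body are ignored.

```python
import io
import sys


def parse_request_line(text):
    """Split the first line of a raw HTTP request into its three parts."""
    request_line = text.splitlines()[0]
    method, target, version = request_line.split()
    return method, target, version


def build_environ(request_text, server_name="localhost", server_port=8888):
    """Build a minimal environ dict; a real server sets many more keys."""
    method, target, version = parse_request_line(request_text)
    path, _, query = target.partition("?")
    return {
        # Variables defined by WSGI
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(b""),  # request body is ignored in this sketch
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
        # Variables inherited from CGI
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
        "SERVER_PROTOCOL": version,
    }


env = build_environ("GET /hello?x=1 HTTP/1.1\r\nHost: localhost\r\n\r\n")
```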

Many mature web servers now support WSGI, and Gunicorn is a good one. It originated from Unicorn in the Ruby community and was successfully ported to Python as a WSGI HTTP server. It has the following advantages:

  • Easy to configure

  • Multiple worker processes can be managed automatically

  • Supports a choice of worker types (sync, gevent, tornado, etc.)
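As an illustration of these points, a Gunicorn configuration file is itself a Python file. The fragment below is only a sketch: bind, workers, and worker_class are Gunicorn settings, but the values chosen here are assumptions for illustration.

```python
# gunicorn.conf.py (a sketch; the values are illustrative assumptions)
bind = "127.0.0.1:8000"   # address and port to listen on
workers = 4               # number of worker processes Gunicorn manages
worker_class = "sync"     # worker type; "gevent" and "tornado" are other options
```

With such a file in place, the server could be started with a command along the lines of gunicorn -c gunicorn.conf.py mymodule:application, where mymodule:application is a hypothetical module and callable name.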

Application side (framework)

Compared with the server side, the application side (which can also be thought of as the framework) has much less to do. It only needs to provide a callable object (usually named application) that receives the two parameters environ and start_response passed by the server side. The callable object can be not only a function, but also a class (the second example below) or an instance with a __call__ method, as long as it accepts the two parameters mentioned above and its return value can be iterated by the server.

What application needs to do is carry out the business processing based on the HTTP request information provided in environ and return an iterable object. The server side obtains the body of the HTTP response by iterating over this object. If there is no response body, it can return an empty iterable (PEP 3333 requires the return value to be an iterable, so None is not valid).

At the same time, the application calls the start_response provided by the server to set the HTTP response status code and response headers. The prototype is as follows:

def start_response(self, status, response_headers, exc_info=None):

The application needs to provide status, a string representing the HTTP response status, and response_headers, a list of tuples of the form (header_name, header_value) representing the headers of the HTTP response. exc_info is optional; the application passes it when an error has occurred and an error response needs to be returned to the browser.
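The role of exc_info is easier to see in code. The sketch below follows the error-handling pattern described in PEP 3333; guarded_app and recording_start_response are hypothetical names, and the "/boom" path exists only to trigger a simulated failure.

```python
import sys


def guarded_app(environ, start_response):
    try:
        if environ.get("PATH_INFO") == "/boom":
            raise RuntimeError("simulated failure")
        body = b"all good\n"
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]
    except Exception:
        # Inside an error handler, pass sys.exc_info() as the third argument.
        body = b"something went wrong\n"
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response("500 Internal Server Error", headers, sys.exc_info())
        return [body]


calls = []


def recording_start_response(status, response_headers, exc_info=None):
    # Stand-in for the server: record the status and whether exc_info was given.
    calls.append((status, exc_info is not None))


ok_body = guarded_app({"PATH_INFO": "/"}, recording_start_response)
error_body = guarded_app({"PATH_INFO": "/boom"}, recording_start_response)
```

A real server decides what to do with exc_info; for example, if headers have already been sent, PEP 3333 expects it to re-raise the exception.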

So far, we can implement a simple application as follows:

def simple_app(environ, start_response):
    """Simplest possible application function"""
    # Under Python 3, PEP 3333 requires the response body to be bytes
    HELLO_WORLD = b"Hello world!\n"
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]

Or it can be implemented as follows.

class AppClass:
    """Produce the same output, but using a class"""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        ...
        HELLO_WORLD = b"Hello world!\n"
        yield HELLO_WORLD

Note that the AppClass class itself is the application here. Calling (instantiating) it with environ and start_response returns an instance object, which is itself iterable and so meets the WSGI requirements for an application’s return value.

If you want to use an instance of AppClass as the application instead, you must add a __call__ method that takes environ and start_response as parameters and returns an iterable object, as shown below:

class AppClass:
    """Produce the same output, but using an object"""
    def __call__(self, environ, start_response):
        ...

This part involves some more advanced Python features, such as yield and magic methods. For background, see my summary of key points of the Python language.
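To compare the three forms side by side, here is a self-contained sketch. The names function_app, ClassApp, InstanceApp, and call_wsgi are chosen for illustration, and the bodies filled in for the class-based variants are one plausible completion, not necessarily the author's.

```python
from wsgiref.util import setup_testing_defaults

HELLO_WORLD = b"Hello world!\n"
HEADERS = [("Content-Type", "text/plain")]


def function_app(environ, start_response):
    """Form 1: a plain function."""
    start_response("200 OK", HEADERS)
    return [HELLO_WORLD]


class ClassApp:
    """Form 2: the class is the application; each instance is the iterable."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start("200 OK", HEADERS)
        yield HELLO_WORLD


class InstanceApp:
    """Form 3: an instance with __call__ is the application."""

    def __call__(self, environ, start_response):
        start_response("200 OK", HEADERS)
        return [HELLO_WORLD]


def call_wsgi(app):
    """Hypothetical driver: call an application the way a server would."""
    environ = {}
    setup_testing_defaults(environ)
    seen = []

    def start_response(status, response_headers, exc_info=None):
        seen.append(status)

    body = b"".join(app(environ, start_response))
    return seen, body


results = [call_wsgi(app) for app in (function_app, ClassApp, InstanceApp())]
```

Note that ClassApp is passed as the class itself, while InstanceApp is passed as an instance.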

WSGI in Flask

Flask is a lightweight Python web framework that conforms to the WSGI specification. Its initial version, at a little over 600 lines, is relatively easy to understand. Let’s take a look at the WSGI interface in that version.

def wsgi_app(self, environ, start_response):
    """The actual WSGI application.

    This is not implemented in `__call__` so that middlewares can be applied:
        app.wsgi_app = MyMiddleware(app.wsgi_app)
    """
    with self.request_context(environ):
        rv = self.preprocess_request()
        if rv is None:
            rv = self.dispatch_request()
        response = self.make_response(rv)
        response = self.process_response(response)
        return response(environ, start_response)


def __call__(self, environ, start_response):
    """Shortcut for :attr:`wsgi_app`"""
    return self.wsgi_app(environ, start_response)       

The wsgi_app method here implements what we have been calling the application. rv is the return value produced while handling the request (from preprocess_request or, if that returns None, from dispatch_request), and response is the response object built from it, which is itself called with environ and start_response to produce the WSGI return value. We will not go further into the Flask source code here. If you are interested, you can download it from GitHub and check out the initial version.
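The split between wsgi_app and __call__ can be illustrated without Flask. In the sketch below, TinyApp is a hypothetical stand-in for a framework application object and CountingMiddleware is an illustrative middleware; the point is that wrapping app.wsgi_app leaves app itself as the same object.

```python
class TinyApp:
    """Hypothetical stand-in for a framework application object."""

    def wsgi_app(self, environ, start_response):
        body = b"hello from TinyApp\n"
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]

    def __call__(self, environ, start_response):
        # Delegating here lets wsgi_app be replaced per instance.
        return self.wsgi_app(environ, start_response)


class CountingMiddleware:
    """Illustrative middleware: counts requests, then delegates."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.requests = 0

    def __call__(self, environ, start_response):
        self.requests += 1
        return self.wrapped(environ, start_response)


app = TinyApp()
counter = CountingMiddleware(app.wsgi_app)
# The instance attribute now shadows the method; app is still a TinyApp.
app.wsgi_app = counter

body = b"".join(app({}, lambda status, headers, exc_info=None: None))
```

If the middleware wrapped app directly instead, the name app would refer to the middleware and no longer to the framework object.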

Middleware

As mentioned in the docstring of the wsgi_app method in the Flask code above, the logic is not implemented directly in __call__ so that middleware can be applied. So why use middleware? What is middleware?

Recall the application / server interface described earlier. For each HTTP request, the server always calls a single application to process it and returns the result. This is enough for common scenarios, but it is not perfect. Consider the following scenarios:

  • For different requests (such as different URLs), the server needs to call different applications, so how to choose which one to call;

  • In order to do load balancing or remote processing, applications running on other hosts on the network should be used for processing;

  • The content returned by the application needs to be processed before it can be used as an HTTP response;

What these scenarios have in common is that some necessary operations do not fit well on either the server side or the application (framework) side. From the application’s point of view, these operations should be done by the server; from the server’s point of view, they should be done by the application. Middleware was introduced to handle this situation.

Middleware is like a bridge between the application side and the server side, communicating with both. To the server, middleware behaves like an application; to the application, it behaves like a server. As shown in the figure below:

[Figure: middleware sitting between the server side and the application side]
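A minimal sketch of this dual role, corresponding to the third scenario above (post-processing the application's output): UppercaseMiddleware, hello_app, and fake_start_response are hypothetical names used only for illustration, and the middleware buffers the whole body for simplicity.

```python
class UppercaseMiddleware:
    """Hypothetical middleware: adds a header and upper-cases the body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Toward the server, the middleware looks like an application.
        def wrapped_start_response(status, response_headers, exc_info=None):
            headers = list(response_headers)
            headers.append(("X-Processed-By", "UppercaseMiddleware"))
            return start_response(status, headers, exc_info)

        # Toward the application, the middleware plays the server role.
        chunks = self.app(environ, wrapped_start_response)
        # bytes.upper() only changes ASCII letters, so Content-Length stays valid.
        return [chunk.upper() for chunk in chunks]


def hello_app(environ, start_response):
    body = b"hello, middleware\n"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


captured = {}


def fake_start_response(status, response_headers, exc_info=None):
    # Stand-in for the real server's start_response.
    captured["status"] = status
    captured["headers"] = response_headers


wrapped = UppercaseMiddleware(hello_app)
body = b"".join(wrapped({}, fake_start_response))
```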

Implementation of Middleware

Flask uses middleware in the initialization code of the Flask class:

self.wsgi_app = SharedDataMiddleware(self.wsgi_app, { self.static_path: target })

This works much like a decorator in Python: SharedDataMiddleware runs some of its own logic before and after self.wsgi_app. The SharedDataMiddleware middleware is provided by the Werkzeug library to serve static content. Werkzeug also provides DispatcherMiddleware, which supports calling different applications for different requests. In this way, the problems in scenarios 1 and 2 above can be solved.

Let’s take a look at the implementation of DispatcherMiddleware:

class DispatcherMiddleware(object):
    """Allows one to mount middlewares or applications in a WSGI application.
    This is useful if you want to combine multiple WSGI applications::
        app = DispatcherMiddleware(app, {
            '/app2':        app2,
            '/app3':        app3
        })
    """

    def __init__(self, app, mounts=None):
        self.app = app
        self.mounts = mounts or {}

    def __call__(self, environ, start_response):
        script = environ.get('PATH_INFO', '')
        path_info = ''
        while '/' in script:
            if script in self.mounts:
                app = self.mounts[script]
                break
            script, last_item = script.rsplit('/', 1)
            path_info = '/%s%s' % (last_item, path_info)
        else:
            app = self.mounts.get(script, self.app)
        original_script_name = environ.get('SCRIPT_NAME', '')
        environ['SCRIPT_NAME'] = original_script_name + script
        environ['PATH_INFO'] = path_info
        return app(environ, start_response)

When initializing middleware, you need to provide a mounts dictionary to specify the mapping relationship between different URL paths and applications. In this way, for a request, the middleware checks its path, and then selects the appropriate application for processing.
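To see the dispatching in action without installing Werkzeug, the sketch below repeats the class from above in compact form and drives it with two toy applications. make_app and call are hypothetical helpers, and the mount path /app2 is chosen only for illustration.

```python
from wsgiref.util import setup_testing_defaults


class DispatcherMiddleware(object):
    """Same logic as the class shown above, repeated so the example is self-contained."""

    def __init__(self, app, mounts=None):
        self.app = app
        self.mounts = mounts or {}

    def __call__(self, environ, start_response):
        script = environ.get('PATH_INFO', '')
        path_info = ''
        while '/' in script:
            if script in self.mounts:
                app = self.mounts[script]
                break
            # Move the last path segment from script to path_info and retry.
            script, last_item = script.rsplit('/', 1)
            path_info = '/%s%s' % (last_item, path_info)
        else:
            app = self.mounts.get(script, self.app)
        original_script_name = environ.get('SCRIPT_NAME', '')
        environ['SCRIPT_NAME'] = original_script_name + script
        environ['PATH_INFO'] = path_info
        return app(environ, start_response)


def make_app(name):
    """Hypothetical helper: build an app that reports how it was reached."""
    def app(environ, start_response):
        text = "%s: script=%s path=%s" % (name, environ['SCRIPT_NAME'], environ['PATH_INFO'])
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [text.encode('utf-8')]
    return app


dispatcher = DispatcherMiddleware(make_app('main'), {'/app2': make_app('app2')})


def call(path):
    """Hypothetical driver: send one fake request through the dispatcher."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the remaining keys
    return b"".join(dispatcher(environ, lambda status, headers, exc_info=None: None))


mounted = call('/app2/users')
fallback = call('/other')
```

The mounted application sees the mount point in SCRIPT_NAME and only the remainder of the path in PATH_INFO, while unmatched paths fall through to the default application.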

This concludes the principles of WSGI. In the next part, we will introduce basic usage of the Flask framework.

This article was published by selfboot on a personal blog under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 China Mainland license.
For non-commercial reprints, please credit the author and source. For commercial reprints, please contact the author.
The title of this article is: Build forum from scratch (2): Web Server Gateway Interface
The link of this article is: http://selfboot.cn/2016/08/07

More reading

WSGI Content
WSGI Tutorial by Clodoaldo Neto
WSGI Explorations in Python
Develop network server by yourself (2)
What is WSGI?
Write a WSGI server to run Django, tornado and other framework applications
PEP 3333 — Python Web Server Gateway Interface v1.0.1
What is a “callable” in Python?