Tornado source code analysis: how asynchronous IO is implemented



This article walks step by step through one asynchronous operation, to show how Tornado realizes asynchronous IO.
It is, in effect, a practical review of the previous article.
The focus is the asynchronous IO machinery itself, so some exception handling in the code will be ignored. It is a long read; bear with it.

Only fragments of the source will be quoted below, just enough to follow the flow. Patient readers are encouraged to open the Tornado source and trace along.

AsyncHTTPClient :

AsyncHTTPClient inherits from Configurable; from its `__new__` we can see that it behaves as a singleton.
Following Configurable's `__new__` together with AsyncHTTPClient's `configurable_base` and `configurable_default`,
instantiating it actually yields an instance of SimpleAsyncHTTPClient.
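The two ideas at work here can be condensed into a minimal sketch (this is not Tornado's actual code; `AsyncClientBase` and `SimpleClient` are invented names, and Tornado actually caches one instance per IOLoop rather than a plain class-level singleton):

```python
class AsyncClientBase(object):
    _instance = None  # simplified singleton cache (Tornado keys this per IOLoop)

    def __new__(cls):
        # Configurable-style dispatch: the base class decides which
        # implementation class to actually instantiate.
        impl = cls.configurable_default()
        if impl._instance is None:
            obj = super(AsyncClientBase, cls).__new__(impl)
            obj.initialize()
            impl._instance = obj
        return impl._instance

    @classmethod
    def configurable_default(cls):
        # Tornado consults configurable_base()/configurable_default();
        # here the "simple" implementation is hard-wired.
        return SimpleClient

    def initialize(self):
        pass


class SimpleClient(AsyncClientBase):
    def initialize(self):
        self.queue = []


client_a = AsyncClientBase()
client_b = AsyncClientBase()
```

Constructing the base class twice returns the same `SimpleClient` instance both times, which is exactly the behaviour the text describes for `AsyncHTTPClient()` producing a shared `SimpleAsyncHTTPClient`.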


def fetch(self, request, callback=None, raise_error=True, **kwargs):

        if self._closed:
            raise RuntimeError("fetch() called on closed AsyncHTTPClient")
        if not isinstance(request, HTTPRequest):
            request = HTTPRequest(url=request, **kwargs)
        # We may modify this (to add Host, Accept-Encoding, etc),
        # so make sure we don't modify the caller's object.  This is also
        # where normal dicts get converted to HTTPHeaders objects.
        request.headers = httputil.HTTPHeaders(request.headers)
        request = _RequestProxy(request, self.defaults)
        future = TracebackFuture()
        if callback is not None:
            callback = stack_context.wrap(callback)

            def handle_future(future):
                exc = future.exception()
                if isinstance(exc, HTTPError) and exc.response is not None:
                    response = exc.response
                elif exc is not None:
                    response = HTTPResponse(
                        request, 599, error=exc,
                        request_time=time.time() - request.start_time)
                else:
                    response = future.result()
                self.io_loop.add_callback(callback, response)
            future.add_done_callback(handle_future)

        # fetch_impl works together with handle_response
        def handle_response(response):
            if raise_error and response.error:
                future.set_exception(response.error)
            else:
                future.set_result(response)
        self.fetch_impl(request, handle_response)
        return future
`fetch_impl` is invoked inside `fetch`. One of `fetch_impl`'s parameters is a callback, and that callback (`handle_response` above) contains the future's `set_result`.
So when the callback fires, the outer `yield` is woken: the program runs the callback on the IOLoop and then returns to the `yield` in the original function.
Meanwhile `fetch` itself returns the future for this request, so another callback can be attached to it outside the function.
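The shape of this pattern, stripped of everything Tornado-specific, looks like the following (a hypothetical miniature; `MiniFuture`, `fake_fetch_impl`, and the `pending` list are invented for illustration): the function returns a future immediately and hands the transport layer a callback whose only job is to resolve that future later.

```python
class MiniFuture(object):
    def __init__(self):
        self._result = None
        self.done = False

    def set_result(self, result):
        self._result = result
        self.done = True

    def result(self):
        return self._result


pending = []  # stands in for the IOLoop's ready-callback list


def fake_fetch_impl(request, callback):
    # a real client would register the fd with epoll; here the "I/O
    # completion" is just queued for the fake loop to fire later
    pending.append(lambda: callback("response for " + request))


def fetch(request):
    future = MiniFuture()

    def handle_response(response):  # closes over `future`
        future.set_result(response)

    fake_fetch_impl(request, handle_response)
    return future


f = fetch("http://example.com")
resolved_before = f.done     # False: fetch returned immediately
pending.pop(0)()             # the "IOLoop" fires the I/O callback
resolved_after = f.done      # True: handle_response resolved the future
```

The caller gets the unresolved future back right away; only when the fake loop fires the queued callback does `handle_response` run `set_result`, which is the moment the outer `yield` would be resumed in real Tornado.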


def _connection_class(self):
        return _HTTPConnection

def _handle_request(self, request, release_callback, final_callback):
        self._connection_class()(
            self.io_loop, self, request, release_callback,
            final_callback, self.max_buffer_size, self.tcp_client,
            self.max_header_size, self.max_body_size)
Before returning, let's see how `fetch_impl` is handled internally. Presumably it carries the network request forward;
what is certain is that the network request must reach the epoll part of the IOLoop, with the handler set up before `fetch` returns.
Next we analyze the internals of `fetch_impl` to see where the network request is set up.
Checking the implementation, a TCPClient object is created at instantiation time — that must be the key.

From the earlier analysis, SimpleAsyncHTTPClient is a singleton, so how does it handle many HTTP requests at once?
Reading the code, it stores each request and callback in `self.queue`,
and each time `fetch_impl` runs, pending entries are popped off and processed one by one, so n requests can all be served.
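The queueing idea can be sketched as follows (simplified and with invented names — `MiniSimpleClient` is not Tornado's class; the real SimpleAsyncHTTPClient caps concurrency with `max_clients` and drains `self.queue` as slots free up):

```python
import collections


class MiniSimpleClient(object):
    def __init__(self, max_clients=2):
        self.max_clients = max_clients
        self.queue = collections.deque()   # waiting (request, callback) pairs
        self.active = {}                   # requests currently in flight
        self.log = []                      # records start order for inspection

    def fetch_impl(self, request, callback):
        self.queue.append((request, callback))
        self._process_queue()

    def _process_queue(self):
        # start as many waiting requests as free slots allow
        while self.queue and len(self.active) < self.max_clients:
            request, callback = self.queue.popleft()
            self.active[request] = callback
            self.log.append("start " + request)

    def _release(self, request):
        # a finished request frees a slot, so the queue is drained again
        callback = self.active.pop(request)
        callback(request + " done")
        self._process_queue()


client = MiniSimpleClient(max_clients=2)
for url in ("a", "b", "c"):
    client.fetch_impl(url, lambda resp: None)
# only two run at once; "c" waits in the queue until a slot is released
```

This is how a single shared client instance can juggle n requests: the queue holds the overflow, and every completion triggers another pass over it.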

Tracing step by step into `_handle_request`, we find that it ends in the instantiation of `_HTTPConnection`,
and among the constructor's parameters is the earlier callback, the one carrying the future.
That is what guarantees the `yield` can eventually return to where it started. OK, keep going.


class _HTTPConnection(httputil.HTTPMessageDelegate):

    def __init__(self, io_loop, client, request, release_callback,
                 final_callback, max_buffer_size, tcp_client,
                 max_header_size, max_body_size):
        self.start_time = io_loop.time()
        self.io_loop = io_loop
        self.client = client
        self.request = request
        self.release_callback = release_callback
        self.final_callback = final_callback
        self.max_buffer_size = max_buffer_size
        self.tcp_client = tcp_client
        self.max_header_size = max_header_size
        self.max_body_size = max_body_size
        self.code = None
        self.headers = None
        self.chunks = []
        self._decompressor = None
        # Timeout handle returned by IOLoop.add_timeout
        self._timeout = None
        self._sockaddr = None
        with stack_context.ExceptionStackContext(self._handle_exception):
            self.parsed = urlparse.urlsplit(_unicode(self.request.url))
            if self.parsed.scheme not in ("http", "https"):
                raise ValueError("Unsupported url scheme: %s" %
                                 self.request.url)
            # urlsplit results have hostname and port results, but they
            # didn't support ipv6 literals until python 2.7.
            netloc = self.parsed.netloc
            if "@" in netloc:
                userpass, _, netloc = netloc.rpartition("@")
            host, port = httputil.split_host_and_port(netloc)
            if port is None:
                port = 443 if self.parsed.scheme == "https" else 80
            if re.match(r'^\[.*\]$', host):
                # raw ipv6 addresses in urls are enclosed in brackets
                host = host[1:-1]
            self.parsed_hostname = host  # save final host for _on_connect

            if request.allow_ipv6 is False:
                af = socket.AF_INET
            else:
                af = socket.AF_UNSPEC

            ssl_options = self._get_ssl_options(self.parsed.scheme)

            timeout = min(self.request.connect_timeout, self.request.request_timeout)
            if timeout:
                self._timeout = self.io_loop.add_timeout(
                    self.start_time + timeout,
                    stack_context.wrap(self._on_timeout))
            self.tcp_client.connect(host, port, af=af,
                                    ssl_options=ssl_options,
                                    max_buffer_size=self.max_buffer_size,
                                    callback=self._on_connect)
`_HTTPConnection.__init__` assigns a pile of member variables, which is a bit dizzying.
Never mind most of them; keep an eye on our callback and on `tcp_client`.

The next lines initialize the host and port. HTTP and HTTPS differ here, so of course both are handled.

Finally comes `tcp_client.connect`. From its arguments we can see `callback=self._on_connect`.
Skipping past the string handling inside `_on_connect`, we find `self.connection.write_headers(start_line, self.request.headers)`,
which should be the operation that sends the HTTP headers — this is a network request after all, so once the URL is processed and the connection made, the headers go out.

Let's back up and look at how `connect` works, because it is the key to the asynchrony. Understand this and the rest follows the same pattern.


Go into TCPClient's code and look at its instantiation and its `connect` method. There is still some way to go.
TCPClient's initializer is very short: it creates a Resolver object, which we can set aside for now.


    @gen.coroutine
    def connect(self, host, port, af=socket.AF_UNSPEC, ssl_options=None,
                max_buffer_size=None):
        """Connect to the given host and port.

        Asynchronously returns an `.IOStream` (or `.SSLIOStream` if
        ``ssl_options`` is not None).
        """
        addrinfo = yield self.resolver.resolve(host, port, af)
        connector = _Connector(
            addrinfo, self.io_loop,
            functools.partial(self._create_stream, max_buffer_size))
        af, addr, stream = yield connector.start()
        # TODO: For better performance we could cache the (af, addr)
        # information here and re-use it on subsequent connections to
        # the same host. (http://tools.ietf.org/html/rfc6555#section-4.2)
        if ssl_options is not None:
            stream = yield stream.start_tls(False, ssl_options=ssl_options,
                                            server_hostname=host)
        raise gen.Return(stream)
In `connect` we find the `gen.coroutine` decorator, and the caller passed `callback=self._on_connect`.
So when the coroutine's future is resolved, `self._on_connect` gets scheduled.
You can also see that `_on_connect`'s parameter is the stream, i.e. exactly what `raise gen.Return(stream)` carries
(in gen.coroutine's implementation, after the value is sent in, execution continues until `gen.Return`,
which lands in `set_result` inside gen.coroutine).

The first `yield`: on the right is `self.resolver.resolve`, on the left `addrinfo`, the address information.
This "asynchronous" step resolves the URL's address. Tornado uses a blocking implementation for it by default, so skip it for now;
it will be covered in a later chapter, mainly the workings of the `run_on_executor` decorator.
Here it actually returns synchronously, because the default is BlockingResolver. Go straight to the next `yield`.
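Before following that second `yield`, it helps to see the gen.coroutine mechanics in miniature. Below is a stripped-down runner in the spirit of `gen.coroutine` (Python 3, invented names, nowhere near the real `gen.py`): when a yielded future resolves, its result is sent back into the generator; when the generator returns, the coroutine's own outer future is resolved — the moral equivalent of `raise gen.Return(...)`.

```python
class MiniFuture(object):
    def __init__(self):
        self._result, self._done, self._callbacks = None, False, []

    def set_result(self, result):
        self._result, self._done = result, True
        for cb in self._callbacks:
            cb(self)

    def result(self):
        return self._result

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._callbacks.append(cb)


def coroutine(gen_func):
    def wrapper(*args, **kwargs):
        outer = MiniFuture()
        gen = gen_func(*args, **kwargs)

        def step(value):
            try:
                yielded = gen.send(value)
            except StopIteration as e:
                # generator returned: resolve the coroutine's own future
                outer.set_result(getattr(e, "value", None))
                return
            # park until the yielded future resolves, then resume
            yielded.add_done_callback(lambda f: step(f.result()))

        step(None)
        return outer
    return wrapper


def resolve():
    f = MiniFuture()
    f.set_result("1.2.3.4")  # pretend the resolver answered instantly
    return f


@coroutine
def connect_flow():
    addrinfo = yield resolve()   # execution parks here until resolved
    return "stream@" + addrinfo  # stands in for raise gen.Return(stream)


result_future = connect_flow()
```

Because the resolver here answers synchronously (like BlockingResolver), the coroutine runs straight through and `result_future` is already resolved when `connect_flow()` returns — the interesting asynchronous case is when the yielded future resolves later, as with `connector.start()` below.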


    def __init__(self, addrinfo, io_loop, connect):
        self.io_loop = io_loop
        self.connect = connect

        self.future = Future()
        self.timeout = None
        self.last_error = None
        self.remaining = len(addrinfo)
        self.primary_addrs, self.secondary_addrs = self.split(addrinfo)
`_Connector` is instantiated with a connect callback, and `self.connect` is set to the callback passed in,
which here is `functools.partial(self._create_stream, max_buffer_size)` — i.e. TCPClient's `_create_stream`.
Among the member variables is a Future instance; keep a close eye on the future and the callback.

After instantiation, the `start` method is invoked; inside it, `try_connect` and `set_timout` are called (the misspelling is Tornado's own).
As the names suggest, these are the connect operation and the timeout-setting operation. Finally `start` returns the future created at instantiation time.


    def start(self, timeout=_INITIAL_CONNECT_TIMEOUT):
        self.try_connect(iter(self.primary_addrs))
        self.set_timout(timeout)
        return self.future

    def try_connect(self, addrs):
        try:
            af, addr = next(addrs)
        except StopIteration:
            # We've reached the end of our queue, but the other queue
            # might still be working.  Send a final error on the future
            # only when both queues are finished.
            if self.remaining == 0 and not self.future.done():
                self.future.set_exception(self.last_error or
                                          IOError("connection failed"))
            return
        future = self.connect(af, addr)
        future.add_done_callback(functools.partial(self.on_connect_done,
                                                   addrs, af, addr))
`future = self.connect(af, addr)` executes TCPClient's `_create_stream` method,
which returns a future, and that future's done-callback is set to `on_connect_done`.
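The address-walking shape of `_Connector` can be sketched like this (invented names, heavily simplified — the real class splits addresses into primary/secondary queues and reports failure via `set_exception`; here a failed attempt just resolves to `None`):

```python
class MiniFuture(object):
    def __init__(self):
        self._result, self._done, self._callbacks = None, False, []

    def set_result(self, result):
        self._result, self._done = result, True
        for cb in self._callbacks:
            cb(self)

    def result(self):
        return self._result

    def done(self):
        return self._done

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._callbacks.append(cb)


class MiniConnector(object):
    def __init__(self, addrs, connect):
        self.connect = connect      # like functools.partial(_create_stream, ...)
        self.future = MiniFuture()  # resolved once any attempt succeeds
        self.addrs = iter(addrs)

    def start(self):
        self.try_connect()
        return self.future

    def try_connect(self):
        try:
            addr = next(self.addrs)
        except StopIteration:
            # real code set_exception()s once both queues are exhausted
            return
        attempt = self.connect(addr)
        attempt.add_done_callback(lambda f: self.on_connect_done(addr, f))

    def on_connect_done(self, addr, f):
        if f.result() is None:      # this attempt failed: try the next one
            self.try_connect()
        elif not self.future.done():
            self.future.set_result((addr, f.result()))


def fake_connect(addr):
    # stands in for _create_stream/stream.connect: fails except for "good"
    f = MiniFuture()
    f.set_result("stream" if addr == "good" else None)
    return f


result = MiniConnector(["bad", "good"], fake_connect).start()
```

Note the chaining: each attempt's done-callback either hands the stream up via the outer future or kicks off the next attempt, which is exactly the role `on_connect_done` plays in the real source.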


    def _create_stream(self, max_buffer_size, af, addr):
        # Always connect in plaintext; we'll convert to ssl if necessary
        # after one connection has completed.
        stream = IOStream(socket.socket(af),
                          io_loop=self.io_loop,
                          max_buffer_size=max_buffer_size)
        return stream.connect(addr)
It instantiates an IOStream, then calls and returns `stream.connect`. The future returned by `stream.connect` is the future seen in `try_connect`,
so let's go inside and see how `stream.connect` resolves that future.



def connect(self, address, callback=None, server_hostname=None):
        self._connecting = True
        if callback is not None:
            self._connect_callback = stack_context.wrap(callback)
            future = None
        else:
            future = self._connect_future = TracebackFuture()
        try:
            self.socket.connect(address)
        except socket.error as e:
            # In non-blocking mode we expect connect() to raise an
            # exception with EINPROGRESS or EWOULDBLOCK.
            if (errno_from_exception(e) not in _ERRNO_INPROGRESS and
                    errno_from_exception(e) not in _ERRNO_WOULDBLOCK):
                if future is None:
                    gen_log.warning("Connect error on fd %s: %s",
                                    self.socket.fileno(), e)
                self.close(exc_info=True)
                return future
        self._add_io_state(self.io_loop.WRITE)
        return future
`self._connecting = True` marks this instance as connecting; it is set back to False once the connection completes.
If no callback was passed in, a future is created and recorded in the member variable `self._connect_future`.
Then the socket's `connect` is performed. Because the socket is set to non-blocking,
it returns immediately rather than blocking. When the connection succeeds, the fd becomes writable; when it fails, readable and writable. That's basic non-blocking-socket knowledge; look it up if unfamiliar.
Then `self._add_io_state` is called and the future is returned.
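The non-blocking-connect behaviour the text relies on is easy to demonstrate directly (this is plain POSIX socket behaviour, not Tornado code; on loopback the connect may even complete instantly, hence the `0` in the accepted return codes):

```python
import errno
import select
import socket

# a local listener so the connect has somewhere to go
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.setblocking(False)
# connect_ex returns the errno instead of raising; on a non-blocking
# socket we expect EINPROGRESS (or 0 if loopback completed instantly)
code = client.connect_ex(server.getsockname())

# the fd becomes writable once the connection is established
_, writable, _ = select.select([], [client], [], 5.0)
# SO_ERROR delivers the final verdict: 0 means connected
so_error = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

client.close()
server.close()
```

This is the whole trick `IOStream.connect` exploits: fire the connect, register interest in writability, and read `SO_ERROR` when the event arrives — which is exactly what `_handle_connect` does later.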


    def _add_io_state(self, state):
        if self.closed():
            # connection has been closed, so there can be no future events
            return
        if self._state is None:
            self._state = ioloop.IOLoop.ERROR | state
            with stack_context.NullContext():
                self.io_loop.add_handler(
                    self.fileno(), self._handle_events, self._state)
        elif not self._state & state:
            self._state = self._state | state
            self.io_loop.update_handler(self.fileno(), self._state)
At last we reach epoll!!! From the instantiation code we know `self._state` starts as None,
so we follow `io_loop.add_handler`: per my previous [article][2], this registers the current fd, this instance's `_handle_events`, and the WRITE and ERROR events with epoll.
And then!!!!! The setup for the whole yield flow is finally complete!!!!!
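What `add_handler` amounts to can be shown with a Linux-only sketch using `select.epoll` directly (the pipe, the `handlers` dict, and `add_handler` here are invented stand-ins; Tornado's PollIOLoop wraps essentially this registration-and-dispatch pattern):

```python
import os
import select

r_fd, w_fd = os.pipe()   # the write end of a fresh pipe is immediately writable

handlers = {}
epoll = select.epoll()


def add_handler(fd, handler, events):
    # what IOLoop.add_handler boils down to: remember the handler,
    # register the fd and event mask with epoll
    handlers[fd] = handler
    epoll.register(fd, events)


fired = []
add_handler(w_fd, lambda fd, events: fired.append((fd, events)),
            select.EPOLLOUT)

# one iteration of the event loop: poll, then dispatch to the handler
for fd, events in epoll.poll(1.0):
    handlers[fd](fd, events)

epoll.close()
os.close(r_fd)
os.close(w_fd)
```

The pipe's write end is writable right away, so the single `poll()` call reports EPOLLOUT and the registered handler runs — the same way `_handle_events` will run when the connecting socket becomes writable.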


Be sure to keep straight how each future is handed off and which callback each future carries:
  In _HTTPConnection, `tcp_client.connect` yields a future with `callback=self._on_connect`;
  that callback is added to the IOLoop for execution at `raise gen.Return(stream)`.
  Inside TCPClient.connect there is the future from `connector.start()`,
  whose callback is `on_connect_done`; it is added to the IOLoop when epoll detects the write event.


    def start(self):
        if self._running:
            raise RuntimeError("IOLoop is already running")
        if self._stopped:
            self._stopped = False
            return
        old_current = getattr(IOLoop._current, "instance", None)
        IOLoop._current.instance = self
        self._thread_ident = thread.get_ident()
        self._running = True

        old_wakeup_fd = None
        if hasattr(signal, 'set_wakeup_fd') and os.name == 'posix':
            try:
                old_wakeup_fd = signal.set_wakeup_fd(self._waker.write_fileno())
                if old_wakeup_fd != -1:
                    # Already set, restore previous value.
                    signal.set_wakeup_fd(old_wakeup_fd)
                    old_wakeup_fd = None
            except ValueError:
                old_wakeup_fd = None

        try:
            while True:
                # Prevent IO event starvation by delaying new callbacks
                # to the next iteration of the event loop.
                with self._callback_lock:
                    callbacks = self._callbacks
                    self._callbacks = []

                # Add any timeouts that have come due to the callback list.
                due_timeouts = []
                if self._timeouts:
                    now = self.time()
                    while self._timeouts:
                        if self._timeouts[0].callback is None:
                            # The timeout was cancelled.
                            heapq.heappop(self._timeouts)
                            self._cancellations -= 1
                        elif self._timeouts[0].deadline <= now:
                            due_timeouts.append(heapq.heappop(self._timeouts))
                        else:
                            break
                    if (self._cancellations > 512
                            and self._cancellations > (len(self._timeouts) >> 1)):
                        # Clean up the timeout queue when it gets large and
                        # it's more than half cancellations.
                        self._cancellations = 0
                        self._timeouts = [x for x in self._timeouts
                                          if x.callback is not None]
                        heapq.heapify(self._timeouts)

                for callback in callbacks:
                    self._run_callback(callback)
                for timeout in due_timeouts:
                    if timeout.callback is not None:
                        self._run_callback(timeout.callback)
                # Closures may be holding on to a lot of memory, so allow
                # them to be freed before we go into our poll wait.
                callbacks = callback = due_timeouts = timeout = None

                if self._callbacks:
                    # If any callbacks or timeouts called add_callback,
                    # we don't want to wait in poll() before we run them.
                    poll_timeout = 0.0
                elif self._timeouts:
                    # If there are any timeouts, schedule the first one.
                    poll_timeout = self._timeouts[0].deadline - self.time()
                    poll_timeout = max(0, min(poll_timeout, _POLL_TIMEOUT))
                else:
                    # No timeouts and no callbacks, so use the default.
                    poll_timeout = _POLL_TIMEOUT

                if not self._running:
                    break

                if self._blocking_signal_threshold is not None:
                    # clear alarm so it doesn't fire while poll is waiting
                    # for events.
                    signal.setitimer(signal.ITIMER_REAL, 0, 0)

                try:
                    event_pairs = self._impl.poll(poll_timeout)
                except Exception as e:
                    if errno_from_exception(e) == errno.EINTR:
                        continue
                    else:
                        raise

                if self._blocking_signal_threshold is not None:
                    signal.setitimer(signal.ITIMER_REAL,
                                     self._blocking_signal_threshold, 0)

                # Pop one fd at a time from the set of pending fds and run
                # its handler.
                self._events.update(event_pairs)
                while self._events:
                    fd, events = self._events.popitem()
                    try:
                        fd_obj, handler_func = self._handlers[fd]
                        handler_func(fd_obj, events)
                    except (OSError, IOError) as e:
                        if errno_from_exception(e) == errno.EPIPE:
                            # Happens when the client closes the connection
                            pass
                        else:
                            self.handle_callback_exception(
                                self._handlers.get(fd))
                    except Exception:
                        self.handle_callback_exception(self._handlers.get(fd))
                fd_obj = handler_func = None

        finally:
            # reset the stopped flag so another start/stop pair can be issued
            self._stopped = False
            if self._blocking_signal_threshold is not None:
                signal.setitimer(signal.ITIMER_REAL, 0, 0)
            IOLoop._current.instance = old_current
            if old_wakeup_fd is not None:
                signal.set_wakeup_fd(old_wakeup_fd)
So Tornado finally comes back to the IOLoop code (tears of joy)!! When the connection succeeds, the fd's buffer becomes writable,
epoll receives the write notification for the fd, the loop processes it, and then! We are back in the registered `_handle_events`!
Note that this `_handle_events` is the one on the IOStream instance, so it has all the state we just set up~

Next, look at the code of `_handle_events` to see whether it resolves the future.


    def _handle_events(self, fd, events):
        if self.closed():
            gen_log.warning("Got events for closed stream %s", fd)
            return
        try:
            if self._connecting:
                # Most IOLoops will report a write failed connect
                # with the WRITE event, but SelectIOLoop reports a
                # READ as well so we must check for connecting before
                # either.
                self._handle_connect()
            if self.closed():
                return
            if events & self.io_loop.READ:
                self._handle_read()
            if self.closed():
                return
            if events & self.io_loop.WRITE:
                self._handle_write()
            if self.closed():
                return
            if events & self.io_loop.ERROR:
                self.error = self.get_fd_error()
                # We may have queued up a user callback in _handle_read or
                # _handle_write, so don't close the IOStream until those
                # callbacks have had a chance to run.
                self.io_loop.add_callback(self.close)
                return
            state = self.io_loop.ERROR
            if self.reading():
                state |= self.io_loop.READ
            if self.writing():
                state |= self.io_loop.WRITE
            if state == self.io_loop.ERROR and self._read_buffer_size == 0:
                # If the connection is idle, listen for reads too so
                # we can tell if the connection is closed.  If there is
                # data in the read buffer we won't run the close callback
                # yet anyway, so we don't need to listen in this case.
                state |= self.io_loop.READ
            if state != self._state:
                assert self._state is not None, \
                    "shouldn't happen: _handle_events without self._state"
                self._state = state
                self.io_loop.update_handler(self.fileno(), self._state)
        except UnsatisfiableReadError as e:
            gen_log.info("Unsatisfiable read, closing connection: %s" % e)
            self.close(exc_info=True)
        except Exception:
            gen_log.error("Uncaught exception, closing connection.",
                          exc_info=True)
            self.close(exc_info=True)
            raise

    def _handle_connect(self):
        err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            self.error = socket.error(err, os.strerror(err))
            # IOLoop implementations may vary: some of them return
            # an error state before the socket becomes writable, so
            # in that case a connection failure would be handled by the
            # error path in _handle_events instead of here.
            if self._connect_future is None:
                gen_log.warning("Connect error on fd %s: %s",
                                self.socket.fileno(), errno.errorcode[err])
            self.close()
            return
        if self._connect_callback is not None:
            callback = self._connect_callback
            self._connect_callback = None
            self._run_callback(callback)
        if self._connect_future is not None:
            future = self._connect_future
            self._connect_future = None
            future.set_result(self)
        self._connecting = False
First it checks whether a connect is in progress — the `_connecting` flag stressed just now —
and then enters `_handle_connect`, which checks whether the connect succeeded.
On success it resolves the connect future: `future.set_result(self)`, and the `self` being set is the IOStream itself!
Then the connect future's callbacks are digested by the IOLoop on its next iteration!!

Following along, we can see that this future is exactly the future returned on the right side of the `yield` operation,
so the callback set earlier in `connector.try_connect`, namely `on_connect_done`, will execute as an IOLoop callback.
Per the gen.coroutine source covered in the previous [article][3], this future also carries the coroutine runner's callback~
So, in `run`, the value is sent back into the generator,
and finally!! The program returns to the `yield` it left from!!!!!
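The whole dance the article just traced can be condensed into a toy end-to-end run (invented names throughout; `MiniLoop`, `MiniFuture`, and the runner are drastic simplifications of IOLoop, Future, and gen.coroutine). Note how the future's callbacks are routed through the loop, so the coroutine resumes on the *next* loop iteration, exactly as described above:

```python
class MiniLoop(object):
    def __init__(self):
        self.ready = []

    def add_callback(self, cb, *args):
        self.ready.append((cb, args))

    def run(self):
        while self.ready:
            cb, args = self.ready.pop(0)
            cb(*args)


loop = MiniLoop()
trace = []


class MiniFuture(object):
    def __init__(self):
        self._done, self._result, self._cbs = False, None, []

    def set_result(self, result):
        self._done, self._result = True, result
        for cb in self._cbs:
            loop.add_callback(cb, self)   # digested on the next iteration

    def result(self):
        return self._result

    def add_done_callback(self, cb):
        if self._done:
            loop.add_callback(cb, self)
        else:
            self._cbs.append(cb)


def coroutine(fn):
    def wrapper(*args):
        gen = fn(*args)

        def step(value):
            try:
                fut = gen.send(value)
            except StopIteration:
                return
            fut.add_done_callback(lambda f: step(f.result()))

        step(None)
    return wrapper


connect_future = MiniFuture()


@coroutine
def request():
    trace.append("yield")
    stream = yield connect_future          # parks here, handler registered
    trace.append("resumed with " + stream)


request()
# "epoll reports the fd writable" -> the I/O handler resolves the future
loop.add_callback(connect_future.set_result, "stream")
loop.run()
```

The trace shows both halves: the coroutine runs up to `yield` and parks; only after the loop fires the "I/O handler" and the resulting done-callback does execution come back to the `yield` — IOLoop, Future, and gen.coroutine in one breath.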

This is how Tornado implements asynchronous IO.

It isn't realistic to narrate the entire flow end to end; trace the rest yourself — the reasoning is the same as in this walkthrough.
The right side of a `yield` must produce a future (older versions seem to have used YieldPoint; I haven't read them, so I'm not sure).
Before that future is returned, the fd's handler and the other parsing work are set up, and epoll waits for the IO events of interest;
the IO handler resolves the future, and execution returns to the `yield`~ The core is the trio of IOLoop, Future, and gen.coroutine
cooperating to complete the asynchronous operation. Trace and digest it a few times and you'll be able to write Tornado extensions.

I wish you all the best in martial arts
