Tornado source code analysis: how asynchronous IO is implemented

Time: 2019-11-8

Preface

This article walks step by step through one asynchronous operation, to show how Tornado implements asynchronous IO.
In fact, this article is a hands-on review of the previous one.
The focus is the implementation of asynchronous IO itself, so some exception handling in the code is skipped. The text is long; please bear with it.

In what follows, only parts of the source code are shown to aid understanding. I encourage patient readers to open the Tornado source and trace along.

AsyncHTTPClient :

AsyncHTTPClient inherits from Configurable. From Configurable.__new__ we can see that it is a singleton (one cached instance per IOLoop).
Following Configurable.__new__ together with AsyncHTTPClient's configurable_base and configurable_default,
the object you get after instantiation is in fact a SimpleAsyncHTTPClient instance.
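The per-IOLoop singleton behavior can be sketched with a stand-in class. This is an illustrative simplification, not Tornado's actual Configurable code: the real `__new__` also dispatches to the configured implementation class, but the caching idea is the same.

```python
# Illustrative sketch of AsyncHTTPClient's singleton-per-IOLoop caching.
# FakeIOLoop / FakeAsyncHTTPClient are stand-in names, not Tornado classes.
class FakeIOLoop(object):
    pass

class FakeAsyncHTTPClient(object):
    _instance_cache = {}

    def __new__(cls, io_loop=None, force_instance=False):
        key = id(io_loop) if io_loop is not None else "default"
        if not force_instance and key in cls._instance_cache:
            # Same IOLoop -> return the cached client instead of a new one.
            return cls._instance_cache[key]
        instance = super(FakeAsyncHTTPClient, cls).__new__(cls)
        if not force_instance:
            cls._instance_cache[key] = instance
        return instance

loop = FakeIOLoop()
a = FakeAsyncHTTPClient(loop)
b = FakeAsyncHTTPClient(loop)
assert a is b  # repeated construction yields the same object
```

Constructing the client twice against the same loop returns the same object; `force_instance=True` (also a real Tornado option) bypasses the cache.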

fetch

def fetch(self, request, callback=None, raise_error=True, **kwargs):

        if self._closed:
            raise RuntimeError("fetch() called on closed AsyncHTTPClient")
        if not isinstance(request, HTTPRequest):
            request = HTTPRequest(url=request, **kwargs)
        # We may modify this (to add Host, Accept-Encoding, etc),
        # so make sure we don't modify the caller's object.  This is also
        # where normal dicts get converted to HTTPHeaders objects.
        request.headers = httputil.HTTPHeaders(request.headers)
        request = _RequestProxy(request, self.defaults)
        future = TracebackFuture()
        if callback is not None:
            callback = stack_context.wrap(callback)

            def handle_future(future):
                exc = future.exception()
                if isinstance(exc, HTTPError) and exc.response is not None:
                    response = exc.response
                elif exc is not None:
                    response = HTTPResponse(
                        request, 599, error=exc,
                        request_time=time.time() - request.start_time)
                else:
                    response = future.result()
                self.io_loop.add_callback(callback, response)
            future.add_done_callback(handle_future)
        # fetch_impl is invoked with handle_response as its callback
        def handle_response(response):
            if raise_error and response.error:
                future.set_exception(response.error)
            else:
                future.set_result(response)
        self.fetch_impl(request, handle_response)
        return future
fetch calls fetch_impl, passing handle_response as its callback; that callback contains the future's set_result.
So when the callback fires, the outer yield can resume: the program runs the callback on the IOLoop and then returns to the yield in the original function.
Meanwhile fetch returns the future for this request, so the caller can attach further callbacks to it outside the function.
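The wiring described above can be modeled without Tornado at all. This is a minimal sketch with a stand-in Future class (not Tornado's TracebackFuture): `fetch` creates a future, hands a `handle_response` closure down to `fetch_impl`, and the closure's only job is to resolve the future.

```python
# Minimal model of fetch() handing a future-resolving callback to fetch_impl.
# All names here are illustrative stand-ins for the Tornado originals.
class Future(object):
    def __init__(self):
        self._result = None
        self._done = False
        self._callbacks = []

    def set_result(self, result):
        self._result = result
        self._done = True
        for cb in self._callbacks:
            cb(self)

    def result(self):
        return self._result

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._callbacks.append(cb)

def fetch(request, fetch_impl):
    future = Future()

    def handle_response(response):
        future.set_result(response)   # this is what lets the outer yield resume

    fetch_impl(request, handle_response)  # callback travels down the stack
    return future

# A fake fetch_impl that "completes" the request immediately:
f = fetch("http://example.com", lambda req, cb: cb("fake-response"))
```

Here the fake `fetch_impl` completes synchronously; in the real client the callback fires much later, from the IOLoop, which is exactly why `fetch` must return the future rather than the response.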

fetch_impl

def _connection_class(self):
        return _HTTPConnection

def _handle_request(self, request, release_callback, final_callback):
        self._connection_class()(
            self.io_loop, self, request, release_callback,
            final_callback, self.max_buffer_size, self.tcp_client,
            self.max_header_size, self.max_body_size)
Before returning, let's see how fetch_impl works internally. Presumably it continues processing the network request;
what is certain is that the request will be handed to the epoll part of the IOLoop, with the handler set up before fetch returns.
Next we analyze the internals of fetch_impl to see how the network request is set up.
Looking at SimpleAsyncHTTPClient, its initialization creates a TCPClient object — that must be the key.

From the earlier analysis, SimpleAsyncHTTPClient is a singleton, so how does it handle many HTTP requests?
Reading the code, it stores each (request, callback) pair in self.queue;
on every fetch_impl it pops entries off one by one, which is how N requests get processed.
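That queueing idea can be sketched as follows. This is a simplified stand-in, not SimpleAsyncHTTPClient itself: the real class also tracks active connections against a `max_clients` limit and releases slots when responses arrive, but the enqueue-then-drain shape is the same.

```python
import collections

# Illustrative sketch of SimpleAsyncHTTPClient's request queue (simplified).
class QueuedClient(object):
    def __init__(self, max_clients=2):
        self.max_clients = max_clients
        self.queue = collections.deque()
        self.active = 0
        self.handled = []

    def fetch_impl(self, request, callback):
        # Every request is queued first, then the queue is drained.
        self.queue.append((request, callback))
        self._process_queue()

    def _process_queue(self):
        while self.queue and self.active < self.max_clients:
            request, callback = self.queue.popleft()
            self.active += 1
            self._handle_request(request, callback)

    def _handle_request(self, request, callback):
        # Pretend the network round-trip finishes instantly; the real client
        # would only decrement `active` when the response comes back.
        self.handled.append(request)
        callback("response for %s" % request)
        self.active -= 1
        self._process_queue()

client = QueuedClient()
results = []
for url in ["a", "b", "c"]:
    client.fetch_impl(url, results.append)
```

Because one singleton owns the queue, any number of callers can issue fetches without stepping on each other's requests.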

Tracing step by step into _handle_request, we find it ends in the instantiation of _HTTPConnection,
whose constructor parameters include the earlier callback that carries the future.
That is how the yield operation can eventually return to where it started. OK, keep going.

_HTTPConnection

class _HTTPConnection(httputil.HTTPMessageDelegate):
    _SUPPORTED_METHODS = set(["GET", "HEAD", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"])

    def __init__(self, io_loop, client, request, release_callback,
                 final_callback, max_buffer_size, tcp_client,
                 max_header_size, max_body_size):
        self.start_time = io_loop.time()
        self.io_loop = io_loop
        self.client = client
        self.request = request
        self.release_callback = release_callback
        self.final_callback = final_callback
        self.max_buffer_size = max_buffer_size
        self.tcp_client = tcp_client
        self.max_header_size = max_header_size
        self.max_body_size = max_body_size
        self.code = None
        self.headers = None
        self.chunks = []
        self._decompressor = None
        # Timeout handle returned by IOLoop.add_timeout
        self._timeout = None
        self._sockaddr = None
        with stack_context.ExceptionStackContext(self._handle_exception):
            self.parsed = urlparse.urlsplit(_unicode(self.request.url))
            if self.parsed.scheme not in ("http", "https"):
                raise ValueError("Unsupported url scheme: %s" %
                                 self.request.url)
            # urlsplit results have hostname and port results, but they
            # didn't support ipv6 literals until python 2.7.
            netloc = self.parsed.netloc
            if "@" in netloc:
                userpass, _, netloc = netloc.rpartition("@")
            host, port = httputil.split_host_and_port(netloc)
            if port is None:
                port = 443 if self.parsed.scheme == "https" else 80
            if re.match(r'^\[.*\]$', host):
                # raw ipv6 addresses in urls are enclosed in brackets
                host = host[1:-1]
            self.parsed_hostname = host  # save final host for _on_connect

            if request.allow_ipv6 is False:
                af = socket.AF_INET
            else:
                af = socket.AF_UNSPEC

            ssl_options = self._get_ssl_options(self.parsed.scheme)

            timeout = min(self.request.connect_timeout, self.request.request_timeout)
            if timeout:
                self._timeout = self.io_loop.add_timeout(
                    self.start_time + timeout,
                    stack_context.wrap(self._on_timeout))
            self.tcp_client.connect(host, port, af=af,
                                    ssl_options=ssl_options,
                                    max_buffer_size=self.max_buffer_size,
                                    callback=self._on_connect)
_HTTPConnection's constructor sets a pile of member variables, which is a bit dizzying.
Never mind most of them; focus on our callback and the tcp_client.

Reading down, the next lines initialize host and port; HTTP and HTTPS differ, so of course both cases have to be handled.

Finally comes tcp_client.connect. From its arguments we can see callback=self._on_connect, so _on_connect must be an important method.
Skipping past its string processing, we find self.connection.write_headers(start_line, self.request.headers):
this is the operation that sends the HTTP headers. So after the URL is processed and the connection made, _on_connect sends the request headers.

Let's go back and see how connect works, because it is the key to the asynchrony. Once you understand this, the rest follows the same pattern.

TCPClient

Open TCPClient's code and look at its instantiation and connect method. It seems there is still a long way to go.
The instantiation code is very short: it creates a Resolver object, which we can ignore for now.

connect

    @gen.coroutine
    def connect(self, host, port, af=socket.AF_UNSPEC, ssl_options=None,
                max_buffer_size=None):
        """Connect to the given host and port.

        Asynchronously returns an `.IOStream` (or `.SSLIOStream` if
        ``ssl_options`` is not None).
        """
        addrinfo = yield self.resolver.resolve(host, port, af)
        connector = _Connector(
            addrinfo, self.io_loop,
            functools.partial(self._create_stream, max_buffer_size))
        af, addr, stream = yield connector.start()
        # TODO: For better performance we could cache the (af, addr)
        # information here and re-use it on subsequent connections to
        # the same host. (http://tools.ietf.org/html/rfc6555#section-4.2)
        if ssl_options is not None:
            stream = yield stream.start_tls(False, ssl_options=ssl_options,
                                            server_hostname=host)
        raise gen.Return(stream)
Inside connect we find the gen.coroutine decorator. The call site passed callback=self._on_connect,
so when the coroutine's future is resolved, self._on_connect will be invoked.
You can also see that _on_connect's parameter is the stream, i.e. what raise gen.Return(stream) passes out
(in gen.coroutine's implementation, after the value is sent back into the generator,
execution continues to gen.Return, which ends up in set_result inside gen.coroutine's Runner).

The first yield has self.resolver.resolve on the right and addrinfo, the resolved address information, on the left.
This "asynchronous" step resolves the URL's address. By default Tornado uses a blocking implementation here, so don't dig into it for now;
it will be covered in a later chapter, mainly the run_on_executor decorator.
With the default BlockingResolver it actually returns synchronously, so look straight at the next yield.
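The run_on_executor idea mentioned above can be shown with the stdlib directly: the blocking call is submitted to a thread pool, and the caller immediately gets a future back. This sketch fakes the DNS answer to stay self-contained; a real resolver would call `socket.getaddrinfo(host, port)` inside the worker.

```python
import concurrent.futures

# Sketch of the run_on_executor pattern used by Tornado's ExecutorResolver:
# the blocking lookup runs on a pool thread, the caller gets a Future now.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def resolve(host, port):
    def blocking_lookup():
        # Stand-in for socket.getaddrinfo(host, port) to avoid real DNS here.
        return [(host, port)]
    return executor.submit(blocking_lookup)  # returns immediately

future = resolve("example.com", 80)
addrinfo = future.result()  # blocks only if the worker hasn't finished yet
```

By contrast, BlockingResolver simply performs the lookup inline on the IOLoop thread, which is why the first yield in connect effectively completes synchronously by default.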

_Connector

    def __init__(self, addrinfo, io_loop, connect):
        self.io_loop = io_loop
        self.connect = connect

        self.future = Future()
        self.timeout = None
        self.last_error = None
        self.remaining = len(addrinfo)
        self.primary_addrs, self.secondary_addrs = self.split(addrinfo)
_Connector is instantiated with a connect callback, which is functools.partial(self._create_stream, max_buffer_size)
from this TCPClient, and it stores that callback as self.connect.
So self.connect is effectively TCPClient._create_stream.
Among the member variables there is a Future instance; keep a close eye on that future and the callback.

After instantiation, the start method is invoked; inside start, try_connect and set_timeout are called.
As the names suggest, these perform the connect and set a timeout; finally start returns the future created at instantiation time.

try_connect

def start(self, timeout=_INITIAL_CONNECT_TIMEOUT):
        self.try_connect(iter(self.primary_addrs))
        self.set_timeout(timeout)
        return self.future

    def try_connect(self, addrs):
        try:
            af, addr = next(addrs)
        except StopIteration:
            # We've reached the end of our queue, but the other queue
            # might still be working.  Send a final error on the future
            # only when both queues are finished.
            if self.remaining == 0 and not self.future.done():
                self.future.set_exception(self.last_error or
                                          IOError("connection failed"))
            return
        future = self.connect(af, addr)
        future.add_done_callback(functools.partial(self.on_connect_done,
                                                   addrs, af, addr))
future = self.connect(af, addr) executes TCPClient's _create_stream method,
which returns a future; then on_connect_done is registered as that future's done-callback.
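The `functools.partial(self.on_connect_done, addrs, af, addr)` trick is worth a tiny illustration: partial bakes the extra arguments into the callback, so the done-callback only needs to accept the finished future. The Future class below is a stand-in, not Tornado's.

```python
import functools

# How partial pre-binds (addrs, af, addr) onto a done-callback.
calls = []

def on_connect_done(addrs, af, addr, future):
    # `future` arrives last, appended by the future machinery itself.
    calls.append((addrs, af, addr, future.result()))

class DummyFuture(object):
    def __init__(self):
        self._cbs = []

    def add_done_callback(self, cb):
        self._cbs.append(cb)

    def set_result(self, value):
        self._value = value
        for cb in self._cbs:
            cb(self)

    def result(self):
        return self._value

future = DummyFuture()
future.add_done_callback(
    functools.partial(on_connect_done, "addrs-iter", "AF_INET", ("1.2.3.4", 80)))
future.set_result("stream")
```

This is why on_connect_done can later retry the next address on failure: the remaining address iterator was captured in the partial at registration time.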

_create_stream

    def _create_stream(self, max_buffer_size, af, addr):
        # Always connect in plaintext; we'll convert to ssl if necessary
        # after one connection has completed.
        stream = IOStream(socket.socket(af),
                          io_loop=self.io_loop,
                          max_buffer_size=max_buffer_size)
        return stream.connect(addr)
_create_stream instantiates an IOStream, then calls and returns stream.connect. The future returned by stream.connect is the future seen in try_connect,
so let's go inside and see how stream.connect resolves that future.

IOStream

connect

def connect(self, address, callback=None, server_hostname=None):
        self._connecting = True
        if callback is not None:
            self._connect_callback = stack_context.wrap(callback)
            future = None
        else:
            future = self._connect_future = TracebackFuture()
        try:
            self.socket.connect(address)
        except socket.error as e:

            if (errno_from_exception(e) not in _ERRNO_INPROGRESS and
                    errno_from_exception(e) not in _ERRNO_WOULDBLOCK):
                if future is None:
                    gen_log.warning("Connect error on fd %s: %s",
                                    self.socket.fileno(), e)
                self.close(exc_info=True)
                return future
        self._add_io_state(self.io_loop.WRITE)
        return future
self._connecting = True marks this instance as in the middle of connecting; it is set back to False once the connection completes.
Since no callback was passed in, a future object is created and recorded in the member variable self._connect_future.
Then the socket's connect is performed; because the socket is set non-blocking,
it returns immediately rather than blocking. When the connection succeeds the fd becomes writable; when it fails it becomes readable and writable. This is basic non-blocking socket knowledge.
Then self._add_io_state is called and the future is returned.
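The non-blocking connect behavior is easy to demonstrate with raw sockets. On a POSIX system, connect on a non-blocking socket normally fails immediately with EINPROGRESS and the handshake completes in the background; `connect_ex` is used here so the error code comes back as a return value instead of an exception.

```python
import errno
import socket

# Demonstrate that a non-blocking connect() returns immediately.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
listener.listen(1)

client = socket.socket()
client.setblocking(False)
err = client.connect_ex(listener.getsockname())
# Typically EINPROGRESS here; 0 if the loopback connect finished at once.
assert err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK)

client.close()
listener.close()
```

Once the kernel finishes the handshake, the fd turns writable, which is exactly the event IOStream asks epoll to watch for next.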

_add_io_state

    def _add_io_state(self, state):

        if self.closed():
            # connection has been closed, so there can be no future events
            return
        if self._state is None:
            self._state = ioloop.IOLoop.ERROR | state
            with stack_context.NullContext():
                self.io_loop.add_handler(
                    self.fileno(), self._handle_events, self._state)
        elif not self._state & state:
            self._state = self._state | state
            self.io_loop.update_handler(self.fileno(), self._state)
At last we reach epoll!!! From the instantiation code we know self._state is None,
so we follow the self.io_loop.add_handler branch. As in my previous [article][2], this registers the current fd, the instance's _handle_events, and the WRITE and ERROR events with epoll.
And with that!!!!! the setup side of the yield is finally complete!!!!!!
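What add_handler boils down to on Linux can be shown with `select.epoll` directly (epoll is Linux-only; Tornado falls back to kqueue or select elsewhere). A freshly connected socket is immediately writable, so poll() reports EPOLLOUT right away:

```python
import select
import socket

# Register an fd with epoll for WRITE|ERROR, as IOStream._add_io_state does.
a, b = socket.socketpair()
fd = a.fileno()

poller = select.epoll()
poller.register(fd, select.EPOLLOUT | select.EPOLLERR)

events = poller.poll(1.0)        # [(fd, eventmask), ...]
ready = dict(events)
assert ready[fd] & select.EPOLLOUT  # fresh socket: writable at once

poller.unregister(fd)
poller.close()
a.close()
b.close()
```

In Tornado the eventmask is paired with a handler in self._handlers, so when poll() returns this fd, the IOLoop knows to call that IOStream's _handle_events.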

Summary:

Be sure to work out how each future is passed around, and what each future's registered callback does:
  In _HTTPConnection, tcp_client.connect returns a future whose callback is self._on_connect;
  it is scheduled onto the IOLoop at raise gen.Return(stream).
  Inside TCPClient.connect there is the future from _Connector.start;
  its callback is on_connect_done, which is scheduled onto the IOLoop when epoll detects the write event.

ioloop

    def start(self):
        if self._running:
            raise RuntimeError("IOLoop is already running")
        self._setup_logging()
        if self._stopped:
            self._stopped = False
            return
        old_current = getattr(IOLoop._current, "instance", None)
        IOLoop._current.instance = self
        self._thread_ident = thread.get_ident()
        self._running = True

        old_wakeup_fd = None
        if hasattr(signal, 'set_wakeup_fd') and os.name == 'posix':

            try:
                old_wakeup_fd = signal.set_wakeup_fd(self._waker.write_fileno())
                if old_wakeup_fd != -1:
                    signal.set_wakeup_fd(old_wakeup_fd)
                    old_wakeup_fd = None
            except ValueError:
                old_wakeup_fd = None

        try:
            while True:
                with self._callback_lock:
                    callbacks = self._callbacks
                    self._callbacks = []

                due_timeouts = []

                if self._timeouts:
                    now = self.time()
                    while self._timeouts:
                        if self._timeouts[0].callback is None:

                            heapq.heappop(self._timeouts)
                            self._cancellations -= 1
                        elif self._timeouts[0].deadline <= now:
                            due_timeouts.append(heapq.heappop(self._timeouts))
                        else:
                            break
                    if (self._cancellations > 512
                            and self._cancellations > (len(self._timeouts) >> 1)):

                        self._cancellations = 0
                        self._timeouts = [x for x in self._timeouts
                                          if x.callback is not None]
                        heapq.heapify(self._timeouts)

                for callback in callbacks:
                    self._run_callback(callback)
                for timeout in due_timeouts:
                    if timeout.callback is not None:
                        self._run_callback(timeout.callback)
                callbacks = callback = due_timeouts = timeout = None

                if self._callbacks:
                    poll_timeout = 0.0
                elif self._timeouts:

                    poll_timeout = self._timeouts[0].deadline - self.time()
                    poll_timeout = max(0, min(poll_timeout, _POLL_TIMEOUT))
                else:

                    poll_timeout = _POLL_TIMEOUT

                if not self._running:
                    break

                if self._blocking_signal_threshold is not None:

                    signal.setitimer(signal.ITIMER_REAL, 0, 0)

                try:
                    event_pairs = self._impl.poll(poll_timeout)
                except Exception as e:

                    if errno_from_exception(e) == errno.EINTR:
                        continue
                    else:
                        raise

                if self._blocking_signal_threshold is not None:
                    signal.setitimer(signal.ITIMER_REAL,
                                     self._blocking_signal_threshold, 0)

                self._events.update(event_pairs)
                while self._events:
                    fd, events = self._events.popitem()
                    try:
                        fd_obj, handler_func = self._handlers[fd]
                        handler_func(fd_obj, events)
                    except (OSError, IOError) as e:
                        if errno_from_exception(e) == errno.EPIPE:
                            # Happens when the client closes the connection
                            pass
                        else:
                            self.handle_callback_exception(self._handlers.get(fd))
                    except Exception:
                        self.handle_callback_exception(self._handlers.get(fd))
                fd_obj = handler_func = None

        finally:
            # reset the stopped flag so another start/stop pair can be issued
            self._stopped = False
            if self._blocking_signal_threshold is not None:
                signal.setitimer(signal.ITIMER_REAL, 0, 0)
            IOLoop._current.instance = old_current
            if old_wakeup_fd is not None:
                signal.set_wakeup_fd(old_wakeup_fd)
Then Tornado finally returns to the IOLoop code (tears of joy)!! When the connection succeeds, the fd's buffer becomes writable;
epoll receives the write-event notification for the fd, the loop processes it, and then! we are back in the registered _handle_events!
Note that this _handle_events belongs to the IOStream instance, which carries all the state we just set up.

Next, look at the code of _handle_events to see whether it resolves the future.

IOStream._handle_events

    def _handle_events(self, fd, events):
        if self.closed():
            gen_log.warning("Got events for closed stream %s", fd)
            return
        try:
            if self._connecting:
                # Most IOLoops will report a write failed connect
                # with the WRITE event, but SelectIOLoop reports a
                # READ as well so we must check for connecting before
                # either.
                self._handle_connect()
            if self.closed():
                return
            if events & self.io_loop.READ:
                self._handle_read()
            if self.closed():
                return
            if events & self.io_loop.WRITE:
                self._handle_write()
            if self.closed():
                return
            if events & self.io_loop.ERROR:
                self.error = self.get_fd_error()
                # We may have queued up a user callback in _handle_read or
                # _handle_write, so don't close the IOStream until those
                # callbacks have had a chance to run.
                self.io_loop.add_callback(self.close)
                return
            state = self.io_loop.ERROR
            if self.reading():
                state |= self.io_loop.READ
            if self.writing():
                state |= self.io_loop.WRITE
            if state == self.io_loop.ERROR and self._read_buffer_size == 0:
                # If the connection is idle, listen for reads too so
                # we can tell if the connection is closed.  If there is
                # data in the read buffer we won't run the close callback
                # yet anyway, so we don't need to listen in this case.
                state |= self.io_loop.READ
            if state != self._state:
                assert self._state is not None, \
                    "shouldn't happen: _handle_events without self._state"
                self._state = state
                self.io_loop.update_handler(self.fileno(), self._state)
        except UnsatisfiableReadError as e:
            gen_log.info("Unsatisfiable read, closing connection: %s" % e)
            self.close(exc_info=True)
        except Exception:
            gen_log.error("Uncaught exception, closing connection.",
                          exc_info=True)
            self.close(exc_info=True)
            raise

    def _handle_connect(self):
        err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            self.error = socket.error(err, os.strerror(err))
            # IOLoop implementations may vary: some of them return
            # an error state before the socket becomes writable, so
            # in that case a connection failure would be handled by the
            # error path in _handle_events instead of here.
            if self._connect_future is None:
                gen_log.warning("Connect error on fd %s: %s",
                                self.socket.fileno(), errno.errorcode[err])
            self.close()
            return
        if self._connect_callback is not None:
            callback = self._connect_callback
            self._connect_callback = None
            self._run_callback(callback)
        if self._connect_future is not None:
            future = self._connect_future
            self._connect_future = None
            future.set_result(self)
        self._connecting = False
First it checks whether a connection is in progress (as stressed just now),
then enters _handle_connect, which first checks whether the connect succeeded.
On success it resolves _connect_future with set_result(self), passing self, the IOStream!
The _connect_future's callbacks will then be consumed by the IOLoop on its next iteration!!

Step by step, we can see that this future is exactly the future returned on the right side of the yield,
so the callback registered earlier by _Connector.try_connect, on_connect_done, will be executed among the IOLoop's callbacks.
Per the coroutine source covered in the previous [article][3], this future also carries a Runner.run callback,
so in run, the value is sent back into the generator.
Finally!! the program is back at the yield where it started!!!!!
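The resume mechanism can be modeled in a few lines: a stripped-down Runner that, when a yielded future resolves, send()s the result back into the generator. This is a simplified stand-in for gen.coroutine's Runner, not the real thing (the real Runner also handles exceptions, multiple yields, and gen.Return).

```python
# Minimal model of how a resolved future resumes a coroutine at its yield.
class SimpleFuture(object):
    def __init__(self):
        self._cbs = []
        self._done = False

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._cbs.append(cb)

    def set_result(self, value):
        self._value = value
        self._done = True
        for cb in self._cbs:
            cb(self)

    def result(self):
        return self._value

def run(gen):
    # Advance the generator to its first yield; wire the future to send().
    try:
        future = next(gen)
    except StopIteration:
        return
    future.add_done_callback(lambda f: _resume(gen, f))

def _resume(gen, future):
    # The done-callback: push the result back in, resuming after the yield.
    try:
        next_future = gen.send(future.result())
    except StopIteration:
        return
    next_future.add_done_callback(lambda f: _resume(gen, f))

log = []
connect_future = SimpleFuture()

def coro():
    stream = yield connect_future   # like `stream = yield connector.start()`
    log.append(stream)

run(coro())
assert log == []                    # still suspended at the yield
connect_future.set_result("iostream")
assert log == ["iostream"]          # resolving the future resumed the coroutine
```

Before set_result, the coroutine is parked at the yield; the moment the IO handler resolves the future, the done-callback fires and execution continues from the line after the yield, which is the whole trick.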

This is how tornado implements asynchronous IO

I don't think it is realistic to narrate the whole flow end to end; follow the rest yourself, the reasoning is the same as this process.
The right side of a yield must return a future (older versions apparently used YieldPoint, but I haven't read them, so I'm not sure).
Before the future is returned, the fd's handler and other bookkeeping are set up, and then we wait for epoll to detect the IO event of interest.
The IO handler resolves the future, which brings us back to the yield. The core is the interplay of three parts: IOLoop, Future, and gen.coroutine,
cooperating to complete the asynchronous operation. After tracing and digesting this a few times, you will be able to write Tornado extensions.

I wish you all the best on your journey.
