Native asynchronous file operations on Linux: trying out io_uring

Time:2019-7-30

History of Linux Asynchronous IO

Asynchronous IO has always been a pain point on Linux. Linux has long had POSIX AIO, but that implementation is emulated with threads in user space and is very inefficient. Later, Linux 2.6 introduced asynchronous IO with real kernel-level support (Linux aio), but it only supports Direct IO, only supports reading and writing disk files, and has limitations on file size. In a word, all kinds of trouble. To this day (May 2019), libuv still implements asynchronous file IO with pthread + preadv.

With the release of Linux 5.1, Linux finally has its own easy-to-use asynchronous IO implementation, and it supports most file types (disk files, sockets, pipes, etc.). This is the subject of this article: io_uring.

IOCP

Unlike the IO multiplexing model of epoll, the idea behind io_uring is closer to IOCP on Windows. Take express delivery as an example.

In the synchronous model, you place an order on an e-commerce platform, then wait downstairs until the courier delivers the goods, and then carry them upstairs yourself.

With epoll, you place the order and carry on with your life. When the courier arrives downstairs, they notify you, and you go down and bring the goods up. You still have to go downstairs to pick up the goods (there is still a period of synchronous reading or writing), but because you no longer wait while the parcel is on the road, efficiency improves greatly. However, epoll is not suitable for disk IO, because disk files are always reported as readable.

IOCP goes all the way: the goods are delivered straight to your door, so you do not even need to go downstairs. The whole process is completely non-blocking.

Simple use of io_uring

io_uring is a set of system call interfaces. Although there are only three system calls in total, using them directly is very complex. This article therefore goes straight to liburing, a wrapper library that makes io_uring much friendlier to use.

Before trying it out, first confirm that your Linux kernel version is 5.1 or later (check with uname -r). liburing currently has to be compiled by hand (it may later be packaged by the major Linux distributions): after git clone, just run ./configure && sudo make install.

io_uring structure initialization

liburing provides its own core structure, struct io_uring, which wraps io_uring's file descriptor (fd) and the other variables needed to communicate with the kernel.

struct io_uring {
    struct io_uring_sq sq;
    struct io_uring_cq cq;
    int ring_fd;
};

The structure must be initialized with io_uring_queue_init before use.

extern int io_uring_queue_init(unsigned entries, struct io_uring *ring,
    unsigned flags);

As the function name suggests, io_uring is a ring buffer. The first parameter, entries, is the queue size (the space actually allocated may be larger than what the user specified); the second parameter, ring, is a pointer to the io_uring structure to initialize; the third parameter, flags, is a set of flags. Pass 0 unless you have special needs. For example:

#include <liburing.h>
struct io_uring ring;
io_uring_queue_init(32, &ring, 0);

Submit read and write requests

First, we use io_uring_get_sqe to get the SQE structure.

extern struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring);

An SQE (submission queue entry) represents one IO request and occupies one slot in the ring. When the queue is full, io_uring_get_sqe returns NULL, so remember to handle that case. Note that the queue here only holds requests that have not been submitted yet; requests that have been submitted (but not yet completed) do not occupy a slot.

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

Then use io_uring_prep_readv or io_uring_prep_writev to initialize the SQE structure.

static inline void io_uring_prep_readv(struct io_uring_sqe *sqe, int fd,
                       const struct iovec *iovecs,
                       unsigned nr_vecs, off_t offset);
static inline void io_uring_prep_writev(struct io_uring_sqe *sqe, int fd,
                    const struct iovec *iovecs,
                    unsigned nr_vecs, off_t offset);

The first parameter, sqe, is the SQE structure pointer obtained earlier; fd is the file descriptor to read from or write to, which can be a disk file or a socket; iovecs is an array of iovec structures (see readv and writev for how to use it); nr_vecs is the number of elements in the iovecs array; offset is the file offset of the operation.

As you can see, these two functions are modeled exactly on preadv and pwritev and have the same semantics, which makes them easy to pick up. Note that if you need to read or write a file sequentially, the program has to keep track of the offset itself.

struct iovec iov = {
    .iov_base = "Hello world",
    .iov_len = strlen("Hello world"),
};
io_uring_prep_writev(sqe, fd, &iov, 1, 0);

After initializing the SQE, you can use io_uring_sqe_set_data to attach your own data to it, usually a pointer obtained from malloc; in C++ a pointer can be passed directly as well.

static inline void io_uring_sqe_set_data(struct io_uring_sqe *sqe, void *data);

Be careful: prep_* calls memset(0) on the SQE internally, so you must call prep_* first and set_data afterwards. I struggled with this for two hours.

Once SQE is ready, you can submit the request using io_uring_submit.

extern int io_uring_submit(struct io_uring *ring);

You can initialize multiple SQEs and then submit them all with a single call:

io_uring_submit(&ring);

Completing IO requests

io_uring_submit is fully asynchronous and does not block the current thread. So how do you know when a submitted operation has completed? liburing provides two functions, io_uring_peek_cqe and io_uring_wait_cqe, to retrieve the IO operations that have completed so far.

extern int io_uring_peek_cqe(struct io_uring *ring,
    struct io_uring_cqe **cqe_ptr);
extern int io_uring_wait_cqe(struct io_uring *ring,
    struct io_uring_cqe **cqe_ptr);

The first parameter is the io_uring structure pointer; the second parameter, cqe_ptr, is an output parameter: the address of a CQE pointer variable.

A CQE (completion queue entry) marks one completed IO operation and also carries the user data passed in earlier. Each CQE corresponds to a previously submitted SQE.

The two functions differ as follows: io_uring_peek_cqe returns immediately if no IO operation has completed, and sets cqe_ptr to NULL; io_uring_wait_cqe blocks the thread until an IO operation completes.

for (;;) {
    io_uring_peek_cqe(&ring, &cqe);
    if (!cqe) {
        puts("Waiting...");
        // Accept new connections, do other things
    } else {
        puts("Finished.");
        break;
    }
}

For simplicity, the example uses busy waiting. In a real application this would be an event loop. Browsers and nodejs hide the event loop implementation from us, while in C/C++ we have to write it ourselves.

User data previously set for SQE can be obtained by io_uring_cqe_get_data.

static inline void *io_uring_cqe_get_data(struct io_uring_cqe *cqe);

By default, IO completion events are not removed from the queue, so io_uring_peek_cqe keeps returning the same event. Use io_uring_cqe_seen to mark an event as processed:

static inline void io_uring_cqe_seen(struct io_uring *ring,
                     struct io_uring_cqe *cqe);
io_uring_cqe_seen(&ring, cqe);

Clean up io_uring and release resources

Use io_uring_queue_exit to clean up the io_uring structure.

extern void io_uring_queue_exit(struct io_uring *ring);
io_uring_queue_exit(&ring);

Wrapping up

The complete code is listed below. It creates the file /home/carter/test.txt and writes the string Hello world to it.

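#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    struct io_uring ring;
    // Returns 0 on success, a negative errno value on failure
    if (io_uring_queue_init(32, &ring, 0) < 0) {
        fputs("io_uring_queue_init failed\n", stderr);
        return 1;
    }

    // O_CREAT requires the third (mode) argument
    int fd = open("/home/carter/test.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        io_uring_queue_exit(&ring);
        return 1;
    }

    // The ring was just created, so a free SQE is available
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    struct iovec iov = {
        .iov_base = "Hello world",
        .iov_len = strlen("Hello world"),
    };
    io_uring_prep_writev(sqe, fd, &iov, 1, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe = NULL;

    for (;;) {
        io_uring_peek_cqe(&ring, &cqe);
        if (!cqe) {
            puts("Waiting...");
            // Accept new connections, do other things
        } else {
            puts("Finished.");
            break;
        }
    }
    // cqe->res holds the number of bytes written, or a negative errno value
    if (cqe->res < 0)
        fprintf(stderr, "write failed: %s\n", strerror(-cqe->res));
    io_uring_cqe_seen(&ring, cqe);
    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}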

As you can see, asynchronous operations in C are much more complicated than synchronous ones. libuv (the underlying IO library of nodejs) has indicated that it will adopt io_uring. If you want to use io_uring directly, you will really want a coroutine library to simplify the asynchronous operations.

Here is a simple file server demo that I implemented using my own Cxx-yield library. As you can see, after a thin layer of encapsulation, asynchronous file reading and writing can be reduced to one line: https://github.com/CarterLi/C…. It feels much like writing async and await in JavaScript.
