From Kafka to NIO


Before talking about NIO, let's briefly review kernel mode and user mode.

Kernel space is where the Linux kernel runs, while user space is where user programs run. The two are isolated to keep the kernel safe: even if a user program crashes, the kernel is not affected.
Code running in kernel space can execute any instruction and use every resource of the system. Code running in user space can only perform restricted operations and cannot use system resources directly (I/O, process resources, memory allocation, peripherals, timers, network communication, and so on). It can only ask the kernel to act on its behalf through the system interface, also known as system calls.


When a user process accesses system resources through a system call, the CPU has to switch to kernel mode, which has its own stack and memory environment that must be set up before the call runs. At the end of the system call, the CPU switches back from kernel mode to user mode, and the stack must be restored to the context of the user process. This switching is relatively expensive.

Process buffer

When a program reads a file, it first allocates a block of memory called a buffer. Each call to read then fetches a fixed number of bytes and writes them into the buffer, so the buffer is filled with a small number of calls. After that, the program takes its data from the buffer, and only when the buffer is used up does the next call refill it. The purpose of this buffer is to reduce the number of system calls caused by frequent I/O operations, and so reduce the time the operating system spends switching between user mode and kernel mode.
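The effect of a process buffer can be sketched in Java. This is only an illustration under stated assumptions: the `CountingStream` class and the `countSourceReads` helper are hypothetical names, and an in-memory `ByteArrayInputStream` stands in for a real file, so the counter measures calls that reach the underlying source rather than actual system calls.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ProcessBufferDemo {

    /** Hypothetical wrapper that counts how often the underlying source is asked for data. */
    static final class CountingStream extends InputStream {
        private final InputStream in;
        int calls = 0;

        CountingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Reads totalBytes one byte at a time; returns how many calls reached the source. */
    static int countSourceReads(int totalBytes, boolean buffered) {
        CountingStream source = new CountingStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = buffered ? new BufferedInputStream(source, 8192) : source) {
            while (in.read() != -1) {
                // consume one byte per call, as a naive program would
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countSourceReads(100_000, false));
        System.out.println("buffered:   " + countSourceReads(100_000, true));
    }
}
```

Without the buffer, every single-byte read reaches the source. With an 8 KB buffer, only a handful of bulk reads do, which is the saving the paragraph above describes.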

Kernel buffer

In addition to designing buffers in the process, the kernel also has its own buffers.

When a user process wants to read data from the disk, the kernel does not go to the disk straight away. If the data is already in the kernel buffer, the kernel copies it from there to the process buffer.

If the data is not in the kernel buffer, the kernel adds the read to the request queue and suspends the process so that it can serve other processes.

Once the data has been read into the kernel buffer, it is copied into the process buffer and the user process is notified. Of course, different I/O models schedule and use the kernel buffer in different ways.

You can think of read as copying data from the kernel buffer to the process buffer, and write as copying data from the process buffer to the kernel buffer.

Of course, write does not necessarily cause the kernel to write to disk. For example, the OS may accumulate a certain amount of data in the kernel buffer and write it out later in one go. That's why power outages sometimes lead to data loss.
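When losing buffered writes is not acceptable, a program can ask the OS to flush explicitly. The following is a minimal sketch using `FileChannel.force`; the class name, the helper names and the temporary file are illustrative assumptions, not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableWriteDemo {

    /** Writes text to the file, then asks the OS to flush its buffers to the storage device. */
    static void writeDurably(Path file, String text) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                // write() hands the bytes to the kernel; they may still sit in the kernel buffer
                channel.write(buffer);
            }
            // force(true) requests that file content and metadata reach the device
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes to a temporary file, reads it back, and returns what was read. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("durable-write", ".txt");
            try {
                writeDurably(file, text);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Forcing on every write gives up most of the benefit of the kernel buffer, so it is usually reserved for data that must survive a crash.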

An I/O request therefore goes through these steps: the user process initiates the request by making a system call; the kernel receives the request, and the process switches from user mode to kernel mode; the kernel fetches the data from the I/O device into the kernel buffer; the kernel copies the data from the kernel buffer into the address space of the user process; the user process gets the data and then responds to the client.

I/O multiplexing model

Java NIO uses the I/O multiplexing model.


In this model, we block in the select call, waiting for the datagram socket to become readable. When select reports that the socket is readable, we call recvfrom to copy the received data from the kernel buffer to the application process buffer.

So how does the kernel judge whether an I/O stream is readable or writable?
It looks at the state of the socket's read and write buffers: the socket is readable when its read buffer holds data, and writable when its write buffer has free space.

Since 1.5, Java on Linux has used epoll instead of the earlier select. epoll is an enhanced version of select, and its distinguishing feature is that it supports two modes: level trigger (the epoll default) and edge trigger.

Epoll is more efficient than select in two aspects(…

  1. It reduces the copying of file handles between user mode and kernel mode
  2. It reduces the traversal needed to find the readable and writable file handles

The corresponding sequence of operations for epoll and NIO is as follows:

  1. epoll_ctl registers the events
  2. epoll_wait polls all the sockets
  3. Handle the corresponding events

Of epoll's features, level trigger (LT) and edge trigger (ET) are the more interesting ones.

Level trigger (conditional trigger): as long as the read buffer is not empty, the read event keeps firing; as long as the write buffer is not full (data is sent out faster than it is written in), the write event keeps firing. This matches ordinary programming habits and is also the default mode of epoll.

Edge trigger (state trigger): the read event fires once when the read buffer changes from empty to non-empty; the write event fires once when the write buffer changes from full to not full. For example, if you send a large file and fill the write buffer, then once the buffer can accept data again there is a switch from full to not full.
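Java's Selector does not expose a way to choose between the two modes, and in practice it behaves like a level-triggered API. The sketch below illustrates that with a `Pipe`: as long as unread data remains, every poll reports the channel as readable. The class and method names are made up for this illustration, and the 200 ms timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class LevelTriggerDemo {

    /** Polls once (waiting up to 200 ms) and reports whether the key was selected as readable. */
    static boolean pollReadable(Selector selector, SelectionKey key) throws IOException {
        selector.select(200);
        boolean ready = selector.selectedKeys().contains(key) && key.isReadable();
        selector.selectedKeys().clear(); // selected keys must be removed by hand after each pass
        return ready;
    }

    /** Returns the results of three polls: two with unread data, one after draining. */
    static boolean[] run() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));

                boolean first = pollReadable(selector, key);  // data is waiting
                boolean second = pollReadable(selector, key); // still unread, so reported again

                ByteBuffer buffer = ByteBuffer.allocate(16);
                while (source.read(buffer) > 0) {
                    buffer.clear(); // discard the bytes and keep draining
                }
                boolean third = pollReadable(selector, key);  // nothing left to read

                return new boolean[] {first, second, third};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Under edge triggering, the second poll would stay silent because the buffer did not change state between the two polls.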

From this analysis we can see that:
In LT mode, we have to avoid the "write loop" problem. The write buffer is very rarely full, so the "writable" condition is almost always met. If you register a write event while there is no data to write, it fires over and over. So in LT mode you must cancel the write event once the data has been written.

In ET mode, we have to avoid the "short read" problem. For example, if 100 bytes arrive, the event fires once. If you read only 50 bytes and leave the other 50 unread, the event does not fire again and the socket is stuck. So in ET mode the read buffer must be read until it is empty.
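The usual defence against short reads is to keep reading after a notification until the channel has nothing more to give. Java's Selector is not edge-triggered, so the following is only a sketch of that drain loop, using a `Pipe` as a stand-in for a socket; `drain` and `run` are hypothetical helper names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class DrainReadDemo {

    /** Reads from a non-blocking channel until no data is left; returns the total bytes read. */
    static int drain(ReadableByteChannel channel, int chunkSize) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
        int total = 0;
        int n;
        // read() returns 0 when nothing is available right now and -1 at end of stream
        while ((n = channel.read(chunk)) > 0) {
            total += n;
            chunk.clear(); // a real handler would process the bytes before reusing the buffer
        }
        return total;
    }

    /** Writes 100 bytes, waits for one readiness notification, then drains in 50-byte chunks. */
    static int run() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap(new byte[100]));

                selector.select(1000); // a single notification for the whole 100 bytes
                selector.selectedKeys().clear();
                return drain(source, 50);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes read after one notification: " + run());
    }
}
```

A single 50-byte read would leave half of the data behind, which is exactly the situation described above.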


The full code is too long, so I only list the main section of the server side. The client side is relatively simple and is written in much the same way, so I won't list it.

Selector selector = Selector.open();

// Create the ServerSocketChannel
ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
// Set the channel to non-blocking
serverSocketChannel.configureBlocking(false);

ServerSocket serverSocket = serverSocketChannel.socket();
serverSocket.bind(new InetSocketAddress(8989));

/*
 * Register the channel with the selector for the SelectionKey.OP_ACCEPT event.
 * Once registered, selector.select() returns when the event arrives;
 * until then, selector.select() blocks.
 */
serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);

// Event loop
while (true) {
    // Returns when a registered event arrives; otherwise blocks
    selector.select();

    // Get the events that are ready
    Set<SelectionKey> selectionKeys = selector.selectedKeys();
    Iterator<SelectionKey> iterator = selectionKeys.iterator();

    // Process them one by one
    while (iterator.hasNext()) {
        // Get the event
        SelectionKey key = iterator.next();

        // Remove the event to avoid processing it twice
        iterator.remove();

        // A client is requesting a connection; accept it
        if (key.isAcceptable()) {
            ServerSocketChannel server = (ServerSocketChannel) key.channel();
            SocketChannel socketChannel = server.accept();
            // A channel must be non-blocking before it can be registered
            socketChannel.configureBlocking(false);
            // Register the write event on the channel
            socketChannel.register(selector, SelectionKey.OP_WRITE);
        } else if (key.isWritable()) {
            // Fires on every pass for as long as OP_WRITE stays registered
            System.out.println("Write");
        }
    }
}

Once a client connects, you will find that the server keeps receiving write events, and "Write" is printed endlessly. So with a condition-triggered API, an application that has nothing to write should not stay interested in the socket's writable event; otherwise every poll returns immediately with a write-ready notification. The commonly used select belongs to the conditional trigger category, and staying interested in write events for a long time drives CPU usage to 100%. Therefore, in Java NIO programming, the write event should be cancelled when there is no data to write out, and registered again when there is.

The write event can be cancelled like this: key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
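The effect of toggling the interest set can be checked with a small experiment. This is a sketch under assumptions: a `Pipe` sink stands in for a client socket, and the class name, method names and 200 ms timeout are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class WriteInterestDemo {

    /** Polls once (waiting up to 200 ms) and reports whether the key was selected as writable. */
    static boolean pollWritable(Selector selector, SelectionKey key) throws IOException {
        selector.select(200);
        boolean ready = selector.selectedKeys().contains(key) && key.isWritable();
        selector.selectedKeys().clear();
        return ready;
    }

    /** Returns poll results while interested, after cancelling, and after registering again. */
    static boolean[] run() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                sink.configureBlocking(false);
                SelectionKey key = sink.register(selector, SelectionKey.OP_WRITE);

                // The pipe is empty, so it is writable and the event fires at once
                boolean whileInterested = pollWritable(selector, key);

                // Nothing to write: drop OP_WRITE from the interest set
                key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
                boolean afterCancel = pollWritable(selector, key);

                // Data to write again: add OP_WRITE back
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                boolean afterReRegister = pollWritable(selector, key);

                return new boolean[] {whileInterested, afterCancel, afterReRegister};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

With OP_WRITE removed, the selector simply waits instead of spinning, which is what keeps the event loop from burning CPU.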

How Kafka deals with it

From the earlier analysis of Kafka's network layer, we know that it communicates with the server through NIO. The send() method of KafkaChannel contains this code:

private boolean send(Send send) throws IOException {
  send.writeTo(transportLayer);
  if (send.completed())
      transportLayer.removeInterestOps(SelectionKey.OP_WRITE);

  return send.completed();
}

Pay attention to the call transportLayer.removeInterestOps(SelectionKey.OP_WRITE) here, which removes the registered OP_WRITE event.

Since the event is cancelled here, it must be added somewhere. Before data is sent, the OP_WRITE event is registered in the setSend() method of KafkaChannel:

public void setSend(Send send) {
  if (this.send != null)
      throw new IllegalStateException("Attempt to begin a send operation with prior send operation still in progress, connection id is " + id);
  this.send = send;
  this.transportLayer.addInterestOps(SelectionKey.OP_WRITE);
}

So it comes down to the same rule: when there is no data to write out, cancel the write event, and register the write event again when there is data to write out.