FileMessageSet of the Kafka message store

Time: 2021-08-30

Abstract

Readers of the previous posts may well be asking: after all this build-up, where are the messages actually stored? As the title makes clear, this time we look at FileMessageSet, the low-level class behind storage. It is a subclass of MessageSet and handles reading and writing messages to and from files. As you might expect, this boils down to the usual create, read, update, and delete operations. There is not much to say about the code itself this time, but FileMessageSet really is an important class, so let's go through it briefly.

Functions of FileMessageSet

  • Adding, deleting, modifying, and querying messages

  • Performing necessary checks, such as verifying the message format (checking the magic value)

  • Converting between message formats

Let's expand a bit on the core functions, the add/delete/modify/query operations. First, FileMessageSet only processes the outermost messages and does not look inside nested messages; nested messages are handed off to ByteBufferMessageSet, which we covered earlier. To some extent, we can regard ByteBufferMessageSet as the representation of a nested message.

Deletion in FileMessageSet comes in two flavors: truncating from a given position, and deleting the entire file. Querying mainly means mapping a message's sequence number, that is, its offset, to its position in the file. Appending is only allowed at the tail; to insert in the middle, you must truncate first.

Let's list some of the important atomic operations:

  • read(buffer, position, length), read(position, length): FileMessageSet

  • writeTo(channel,position,size)

  • truncate(size)

  • search(offset): position

  • close

  • flush
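To make search(offset): position concrete, here is a hedged sketch (not Kafka's actual implementation) of a linear scan over a simplified entry layout of [offset: Long][size: Int][payload]; the function name and layout are mine, for illustration only:

```scala
import java.nio.ByteBuffer

// Hypothetical simplified layout: each entry is [offset: Long][size: Int][payload].
// Returns the byte position of the first entry whose offset >= targetOffset, or -1.
def search(buf: ByteBuffer, targetOffset: Long): Int = {
  val headerSize = 12 // 8-byte offset + 4-byte size
  var pos = 0
  while (pos + headerSize <= buf.limit) {
    val offset = buf.getLong(pos)     // absolute read: does not move the cursor
    val size = buf.getInt(pos + 8)
    if (offset >= targetOffset) return pos
    pos += headerSize + size          // skip over the payload to the next entry
  }
  -1
}
```

The real class scans the file through its channel in the same spirit: read a header, compare the offset, skip the payload.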

Design of FileMessageSet

FileMessageSet reads and writes through a FileChannel. Every operation depends on a position, so it must locate first. FileMessageSet also allows slicing, that is, taking a view over part of the file by specifying a start and an end. The downside is that every boundary check then has to take end into account.

The first thing to note is that the channel cursor should always sit at the tail of the set, so that writes are sequential; accordingly, the cursor is moved to the tail during initialization.
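As a minimal sketch of this initialization step (the helper name is mine, not Kafka's), opening a log file and parking the cursor at the tail looks like:

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Hypothetical helper: open a log file and park the cursor at the tail,
// so that every subsequent relative write(buffer) appends sequentially.
def openForAppend(path: String): FileChannel = {
  val ch = FileChannel.open(Paths.get(path),
    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)
  ch.position(ch.size) // cursor starts at the end of the existing data
  ch
}
```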

The second point is that closing the channel first flushes and then truncates. This may not be obvious at first. Suppose we are working with a slice and a new message is written beyond the position end: everything after end must be discarded, because messages have to stay in order. This, too, is a feature that guarantees sequential writes.


def close() {
  flush()
  trim()
  channel.close()
}

The third point is the iteration logic; nearly all the atomic operations are built on traversal. Traversal requires a number of checks, mainly the following.

  • If the message size just read is smaller than the minimum message header size, the message is corrupt

  • If the message size just read is larger than the remaining capacity, the last message is incomplete

  • If the remaining capacity is less than offsetSize + messageSizeLength, there are no more messages

Note that "capacity" here must account for both the specified end and the end of the channel. Take the iterator's makeNext as an example.

override def makeNext(): MessageAndOffset = {
  // the last message starts at or beyond end
  if (location + sizeOffsetLength >= end)
    return allDone()

  // read the size of the item
  sizeOffsetBuffer.rewind()
  channel.read(sizeOffsetBuffer, location)

  // the last message continues in the next file
  if (sizeOffsetBuffer.hasRemaining)
    return allDone()

  sizeOffsetBuffer.rewind()
  val offset = sizeOffsetBuffer.getLong()
  val size = sizeOffsetBuffer.getInt()

  // the last message was truncated, or its size is wrong
  if (size < Message.MinMessageOverhead || location + sizeOffsetLength + size > end)
    return allDone()

  // the message is too large
  if (size > maxMessageSize)
    throw new CorruptRecordException("Message size exceeds the largest allowable message size (%d).".format(maxMessageSize))

  // read the item itself
  val buffer = ByteBuffer.allocate(size)
  channel.read(buffer, location + sizeOffsetLength)

  // the last message was truncated by the end of the file
  if (buffer.hasRemaining)
    return allDone()
  buffer.rewind()

  // increment the location and return the item
  location += size + sizeOffsetLength
  new MessageAndOffset(new Message(buffer), offset)
}

The fourth point is that appending is based on ByteBufferMessageSet, which unifies nested messages, plain messages, and batched writes behind a single method.
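A hedged sketch of the idea: whatever the message kind, it ends up as one ByteBuffer that is written at the channel's current (tail) position. The helper below is illustrative, not Kafka's actual signature:

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Illustrative append: the buffer may hold one message, a batch, or a
// compressed nested message -- the file layer does not care which.
def append(channel: FileChannel, messages: ByteBuffer): Int = {
  var written = 0
  while (messages.hasRemaining)
    written += channel.write(messages) // cursor sits at the tail, so this appends
  written
}
```

Since the cursor is kept at the tail (see the initialization note above), a relative write is always a sequential append.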

The fifth point is an interesting code detail.

def delete(): Boolean = {
  CoreUtils.swallow(channel.close())
  file.delete()
}

def swallow(log: (Object, Throwable) => Unit, action: => Unit) {
  try {
    action
  } catch {
    case e: Throwable => log(e.getMessage(), e)
  }
}

Here the code block is wrapped in a try/catch by passing it as a by-name parameter. The call site stays very concise and clean, a bit like collecting exceptions with AOP, and is worth borrowing.

Message reading process

At this point, let's review the whole message store and walk through the complete process.

  1. First, FileMessageSet reads the outermost message

  2. If the message is a nested message, a ByteBufferMessageSet is built from it and decompressed, yielding the set of atomic messages

  3. Checks are performed and basic information such as the message format is obtained by calling Message's own methods

  4. The key and value objects are obtained through MessageAndMetadata plus a decoder
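Step 2 above hinges on decompression. As a hedged sketch of what "unwrap a nested message" means (using GZIP for illustration; Kafka supports several codecs, and the function name is mine):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// A nested message's payload is the compressed concatenation of its inner
// messages; unwrapping it is essentially decompressing that payload.
def unwrap(compressed: Array[Byte]): Array[Byte] = {
  val in = new GZIPInputStream(new ByteArrayInputStream(compressed))
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](4096)
  var n = in.read(buf)
  while (n >= 0) { out.write(buf, 0, n); n = in.read(buf) }
  out.toByteArray
}
```

The decompressed bytes are then parsed as an ordinary message set, which is exactly what ByteBufferMessageSet does.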

Message writing process

  1. First, MessageWriter writes the key, the value, and the message header, producing a buffer

  2. For nested messages, a ByteBufferMessageSet is built from the buffer and converted into a new ByteBufferMessageSet

  3. Then FileMessageSet appends the ByteBufferMessageSet
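Step 1 can be sketched as serializing a record into the [offset][size][payload] layout that the iterator shown earlier reads back. This is a deliberately simplified stand-in for MessageWriter, not its real interface:

```scala
import java.nio.ByteBuffer

// Hypothetical simplification of MessageWriter: pack one record as
// [offset: Long][size: Int][payload], ready to be appended to the file.
def writeRecord(offset: Long, payload: Array[Byte]): ByteBuffer = {
  val buf = ByteBuffer.allocate(12 + payload.length)
  buf.putLong(offset).putInt(payload.length).put(payload)
  buf.flip() // switch from writing to reading, so the buffer can be consumed
  buf
}
```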