MessageWriter for Kafka message storage



MessageWriter is a utility class that Kafka uses to write messages. This code has little to do with the overall system design, but seen on its own it contains many interesting details, so I am writing a short post about it.

Design intent of MessageWriter

First, let's list what can vary while a message is being written, that is, the design requirements of this class:

  • The input sources differ: primitive types (int, long, byte) as well as byte arrays (Array[Byte]) all need to be supported.

  • The size of the written data is not known in advance, so an adaptive capacity mechanism is needed.

  • Some automatic guarantee mechanism is required, such as automatically computing the CRC and filling it into the header after the data is written, and automatically computing the size and filling it into the header.

We can split these three requirements into levels. Writing most primitive types reduces to writing bytes, so multi-type writing and adaptive capacity are relatively low-level and can be implemented generically, while the third requirement is relatively high-level and can be implemented separately.

In fact, this is what Kafka does. MessageWriter extends a parent class, BufferingOutputStream, whose main job is to cache the data written from the various input sources and then write it to the buffer in one batch.

BufferingOutputStream in detail

The written message is temporarily stored in BufferingOutputStream. Capacity is managed with a linked list of byte arrays. Each byte array has a fixed length (passed in through the constructor) and its own cursor that records how many bytes have been written, and a separate reference points to the array currently being written. The basic structure is defined as follows.

protected final class Segment(size: Int) {
    val bytes = new Array[Byte](size)           // fixed-size storage
    var written = 0                             // cursor: bytes written so far
    var next: Segment = null                    // link to the following segment
    def freeSpace: Int = bytes.length - written
}

The control strategy of BufferingOutputStream is very simple: when the current segment is full, a new segment is added immediately. Note that writing is irreversible: the write cursor never moves back, so further writes always go to a new segment rather than reusing an earlier one.
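To make the growth strategy concrete, here is a minimal, self-contained sketch. Only the Segment shape follows the definition above; SegmentedBuffer, append and segmentCount are hypothetical names chosen for illustration and are not Kafka's API.

```scala
object SegmentedBufferDemo {
  // Same shape as the Segment above: a fixed-size array plus a write cursor.
  final class Segment(size: Int) {
    val bytes = new Array[Byte](size)
    var written = 0
    var next: Segment = null
    def freeSpace: Int = bytes.length - written
  }

  // Hypothetical simplified buffer: it only ever appends segments.
  final class SegmentedBuffer(segmentSize: Int) {
    private val head = new Segment(segmentSize)
    private var current = head
    private var total = 0

    // Link a fresh segment after the current one and make it current.
    private def addSegment(): Unit = {
      val seg = new Segment(segmentSize)
      current.next = seg
      current = seg
    }

    def append(b: Byte): Unit = {
      if (current.freeSpace <= 0) addSegment()
      current.bytes(current.written) = b
      current.written += 1
      total += 1
    }

    def size: Int = total

    // Walk the linked list and count the segments allocated so far.
    def segmentCount: Int = {
      var n = 0
      var s = head
      while (s != null) { n += 1; s = s.next }
      n
    }
  }

  def main(args: Array[String]): Unit = {
    val buf = new SegmentedBuffer(4)
    (1 to 10).foreach(i => buf.append(i.toByte))
    println(s"size=${buf.size}, segments=${buf.segmentCount}") // size=10, segments=3
  }
}
```

With a segment size of 4, ten bytes fill two segments and spill two bytes into a third. Nothing is ever copied or resized, which is the whole appeal of the strategy.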

So is it worth introducing a more sophisticated control strategy? Kafka keeps this simple strategy because message writing is one-shot: a writer writes one sequence of messages and is not reused afterwards. In addition, Kafka's performance is limited more by network bandwidth, so it keeps the read and write path as simple as possible instead of trying to reduce memory allocation and release (GC is not its main problem).

Basic write

Let's take writing a byte array as an example. Here is the code:

override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    if (off >= 0 && off <= b.length && len >= 0 && off + len <= b.length) {
      var remaining = len
      var offset = off
      while (remaining > 0) {
        // current segment is full: append a new one
        if (currentSegment.freeSpace <= 0) addSegment()

        // copy as much as fits into the current segment
        val amount = math.min(currentSegment.freeSpace, remaining)
        System.arraycopy(b, offset, currentSegment.bytes, currentSegment.written, amount)
        currentSegment.written += amount
        offset += amount
        remaining -= amount
      }
    } else {
      throw new IndexOutOfBoundsException()
    }
}

For primitive types, MessageWriter uses bit shifts and writes byte by byte, so that everything is built on top of byte writing. This is a neat generalization. Let's take writing a 32-bit int as an example.

private def writeInt(value: Int): Unit = {
    // big-endian: most significant byte first; write(Int) keeps only the low 8 bits
    write(value >>> 24)
    write(value >>> 16)
    write(value >>> 8)
    write(value)
}
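A 32-bit int takes four byte writes, from the most significant byte down to the least. The sketch below replays that idea against a plain java.io.ByteArrayOutputStream instead of MessageWriter; the WriteIntDemo object and its encode helper are hypothetical names used only for this illustration. It relies on the documented behaviour that OutputStream.write(int) keeps only the low-order 8 bits of its argument.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

object WriteIntDemo {
  // Writes a 32-bit int big-endian, one byte at a time.
  // Each write call stores only the low-order 8 bits of the shifted value.
  def writeInt(out: ByteArrayOutputStream, value: Int): Unit = {
    out.write(value >>> 24)
    out.write(value >>> 16)
    out.write(value >>> 8)
    out.write(value)
  }

  def encode(value: Int): Array[Byte] = {
    val out = new ByteArrayOutputStream(4)
    writeInt(out, value)
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val bytes = encode(0x12345678)
    println(bytes.map(b => f"${b & 0xff}%02x").mkString(" ")) // 12 34 56 78
    // ByteBuffer reads big-endian by default, so the value round-trips.
    println(ByteBuffer.wrap(bytes).getInt == 0x12345678)      // true
  }
}
```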

Implementation of the guarantee mechanism

Let's post the code first and discuss it afterwards.

  def write(key: Array[Byte] = null,
            codec: CompressionCodec,
            timestamp: Long,
            timestampType: TimestampType,
            magicValue: Byte)(writePayload: OutputStream => Unit): Unit = {
    withCrc32Prefix {
      // write magic value
      write(magicValue)
      // write attributes
      var attributes: Byte = 0
      if (codec.codec > 0)
        attributes = (attributes | (CompressionCodeMask & codec.codec)).toByte
      if (magicValue > MagicValue_V0)
        attributes = timestampType.updateAttributes(attributes)
      write(attributes)
      // Write timestamp
      if (magicValue > MagicValue_V0)
        writeLong(timestamp)
      // write the key (a null key is encoded as length -1)
      if (key == null) {
        writeInt(-1)
      } else {
        writeInt(key.length)
        write(key, 0, key.length)
      }
      // write the payload with length prefix
      withLengthPrefix {
        writePayload(this)
      }
    }
  }

private def withLengthPrefix(writeData: => Unit): Unit = {
    // get a writer for length value
    val lengthWriter = reserve(ValueSizeLength)
    // save current size
    val oldSize = size
    // write data
    writeData
    // write length value
    writeInt(lengthWriter, size - oldSize)
}
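The writeData: => Unit parameter is a Scala by-name parameter: the block passed at the call site is not evaluated there, but each time the method body refers to it. The small sketch below shows only that mechanism; ByNameDemo, withMarkers and run are hypothetical names and have nothing to do with Kafka.

```scala
import scala.collection.mutable.ArrayBuffer

object ByNameDemo {
  // `body: => Unit` is a by-name parameter: the caller's block runs
  // at the point where `body` is referenced, not at the call site.
  def withMarkers(log: ArrayBuffer[String])(body: => Unit): Unit = {
    log += "before"
    body
    log += "after"
  }

  // Runs a block between the two markers and returns the recorded order.
  def run(): List[String] = {
    val log = ArrayBuffer.empty[String]
    withMarkers(log) {
      log += "payload"
    }
    log.toList
  }

  def main(args: Array[String]): Unit = {
    println(run().mkString(", ")) // before, payload, after
  }
}
```

This is what lets a helper do some work before the block (reserve a slot, remember the size) and some work after it (back-fill the slot) while the call site still reads like a built-in control structure.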

From the above code, we can see that the withXxx helpers look a lot like Python's with statement (context managers). They are not a Scala keyword, though: they are ordinary methods, and the code block is passed in as a by-name parameter (writeData: => Unit), that is, as a block that takes no arguments and is evaluated where the method refers to it. But I want to ask another question: how is the earlier position remembered? We said that writing is irreversible and the written cursor cannot go back, yet the CRC can only be written after the data, so we need something similar to the mark and reset mechanism of ByteBuffer.

However, we can't simply move the cursor the way ByteBuffer does, because the next message has to be written smoothly, and moving the cursor and resetting it is too expensive. Could we instead carve out this small piece of memory in advance, hand it to a separate reference, and later write through that reference, independently of the main write path? This is exactly what MessageWriter does. Let's introduce this small piece of reserved memory.

protected class ReservedOutput(seg: Segment, offset: Int, length: Int) extends OutputStream {
    private[this] var cur = seg
    private[this] var off = offset
    private[this] var len = length // reserved memory size

    override def write(value: Int) = {
      if (len <= 0) throw new IndexOutOfBoundsException()
      // the reserved region may span a segment boundary
      if (cur.bytes.length <= off) {
        cur = cur.next
        off = 0
      }
      cur.bytes(off) = value.toByte
      off += 1
      len -= 1
    }
}

But you have probably spotted a problem: wouldn't this write overwrite existing data? To avoid that, the memory has to be reserved at the time the data is written, which means the main write cursor has to skip over that region.

private def skip(len: Int): Unit = {
    if (len >= 0) {
      var remaining = len
      while (remaining > 0) {
        if (currentSegment.freeSpace <= 0) addSegment()

        // advance the cursor without writing anything
        val amount = math.min(currentSegment.freeSpace, remaining)
        currentSegment.written += amount
        remaining -= amount
      }
    } else {
      throw new IndexOutOfBoundsException()
    }
}

The reserve operation is here:

def reserve(len: Int): ReservedOutput = {
    // capture the current position, then skip over the reserved bytes
    val out = new ReservedOutput(currentSegment, currentSegment.written, len)
    skip(len)
    out
}

Let's go back to withLengthPrefix. It first reserves ValueSizeLength bytes, then writes the data, and finally computes the change in total size, that is, the size of the data written, and writes it into the reserved memory. Isn't that neat?
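The reserve-then-back-fill idea can be tried out in isolation. The sketch below is a deliberately simplified stand-in: it uses a single growable array instead of linked segments, and TinyWriter, writeIntAt and encode are hypothetical names, not part of Kafka. It assumes a 4-byte length field.

```scala
import scala.collection.mutable.ArrayBuffer

object LengthPrefixDemo {
  // Hypothetical single-array stand-in for the segmented buffer.
  final class TinyWriter {
    private val buf = ArrayBuffer.empty[Byte]

    def size: Int = buf.length
    def write(b: Int): Unit = buf += b.toByte
    def toArray: Array[Byte] = buf.toArray

    // Reserve `len` bytes: remember where the slot starts, then skip over it.
    private def reserve(len: Int): Int = {
      val start = buf.length
      (0 until len).foreach(_ => buf += 0.toByte)
      start
    }

    // Back-fill a big-endian int into a previously reserved slot.
    private def writeIntAt(pos: Int, value: Int): Unit = {
      buf(pos) = (value >>> 24).toByte
      buf(pos + 1) = (value >>> 16).toByte
      buf(pos + 2) = (value >>> 8).toByte
      buf(pos + 3) = value.toByte
    }

    def withLengthPrefix(writeData: => Unit): Unit = {
      val slot = reserve(4)            // slot for the length value
      val oldSize = size               // size before the payload
      writeData                        // payload is appended after the slot
      writeIntAt(slot, size - oldSize) // fill in the payload length
    }
  }

  def encode(payload: Array[Byte]): Array[Byte] = {
    val w = new TinyWriter
    w.withLengthPrefix { payload.foreach(b => w.write(b)) }
    w.toArray
  }

  def main(args: Array[String]): Unit = {
    println(encode("abc".getBytes("UTF-8")).mkString(" ")) // 0 0 0 3 97 98 99
  }
}
```

The main write cursor only ever moves forward; the length is patched in through the saved slot position, which plays the role of ReservedOutput here.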