Kafka message store overview



As a message middleware system, Kafka’s first problem is how to persist messages and how to read, write and parse them conveniently. This article starts with Kafka’s message storage and then walks through the important parts of the code one by one. First, the concept: a "message" here is not a message in the network-transmission sense but something closer to a record, that is, the actual object that producers and consumers care about. A message is the smallest unit Kafka works with, and its content cannot be changed once it has been created. In essence, a message is a key-value pair whose key is optional.

Message format

Taking Kafka version 0.10.0 as an example, the message format is:
CRC + magic + attributes + timestamp (optional) + key (length + content) + payload (length + content)

Each part is described in turn below:

 1. 4-byte CRC32 checksum
 2. 1-byte "magic" identifier, which indicates whether the message format has changed. Its value is 0 or 1 (it can also be regarded as a version number)
 3. 1-byte "attributes" identifier, with the following contents:
    Bit 0 ~ 2: compression codec
        0 : no compression
        1 : gzip
        2 : snappy
        3 : lz4
    Bit 3: timestamp type
        0 : create time
        1 : log append time
    Bit 4 ~ 7: reserved
 4. (optional) 8-byte timestamp, present only when magic is 1
 5. 4-byte key length, which specifies the length K of the key
 6. K-byte key
 7. 4-byte payload length, which specifies the length V of the value
 8. V-byte payload
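
As a rough illustration of this layout, the following Java sketch writes and reads a magic-1 message with java.nio.ByteBuffer. The class MessageLayoutSketch and its methods are hypothetical and are not Kafka's own code. It assumes that the CRC covers every byte after the CRC field and that a length of -1 marks a missing key or payload.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Kafka: builds and reads a magic-1 message.
final class MessageLayoutSketch {
    static final int CRC_OFFSET = 0;         // 4-byte CRC32
    static final int MAGIC_OFFSET = 4;       // 1-byte magic
    static final int ATTRIBUTES_OFFSET = 5;  // 1-byte attributes
    static final int TIMESTAMP_OFFSET = 6;   // 8-byte timestamp (magic 1 only)
    static final int KEY_SIZE_OFFSET = 14;   // 4-byte key length, then the key

    static ByteBuffer write(byte attributes, long timestamp, byte[] key, byte[] payload) {
        int keyLen = key == null ? 0 : key.length;
        int payloadLen = payload == null ? 0 : payload.length;
        ByteBuffer buf = ByteBuffer.allocate(KEY_SIZE_OFFSET + 4 + keyLen + 4 + payloadLen);
        buf.position(MAGIC_OFFSET);          // leave room for the CRC
        buf.put((byte) 1).put(attributes).putLong(timestamp);
        buf.putInt(key == null ? -1 : keyLen);
        if (key != null) buf.put(key);
        buf.putInt(payload == null ? -1 : payloadLen);
        if (payload != null) buf.put(payload);
        buf.putInt(CRC_OFFSET, (int) checksum(buf)); // fill in the CRC last
        buf.rewind();
        return buf;
    }

    // CRC32 over everything after the CRC field itself (assumption, see lead-in).
    static long checksum(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        crc.update(buf.array(), buf.arrayOffset() + MAGIC_OFFSET, buf.limit() - MAGIC_OFFSET);
        return crc.getValue();
    }

    static byte[] readKey(ByteBuffer buf) {
        int keyLen = buf.getInt(KEY_SIZE_OFFSET);
        if (keyLen < 0) return null;         // -1 means "no key"
        byte[] key = new byte[keyLen];
        ByteBuffer view = buf.duplicate();   // do not disturb the caller's position
        view.position(KEY_SIZE_OFFSET + 4);
        view.get(key);
        return key;
    }

    static byte[] readPayload(ByteBuffer buf) {
        int keyLen = Math.max(buf.getInt(KEY_SIZE_OFFSET), 0);
        int payloadSizeOffset = KEY_SIZE_OFFSET + 4 + keyLen;
        int payloadLen = buf.getInt(payloadSizeOffset);
        if (payloadLen < 0) return null;     // -1 means "no payload"
        byte[] payload = new byte[payloadLen];
        ByteBuffer view = buf.duplicate();
        view.position(payloadSizeOffset + 4);
        view.get(payload);
        return payload;
    }
}
```

Every field is found by adding fixed widths to a known offset, which is why the position of the key length field matters so much: everything before it is fixed-size header, everything after it is variable-size body.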

Kafka’s message format is designed to allow nesting, which is achieved through compression. Imagine a message whose key is empty and whose value is a compressed message set: after decompressing and reading it, you get a set of key-value pairs, somewhat like nested JSON. In practice, however, Kafka allows only two levels of nesting (a wrapper message and the messages inside it). This limit does not come from the message format itself but from the complexity of parsing deeper nesting when messages are read. Nesting makes it possible for Kafka to carry more complex objects, but for performance reasons heavy object serialization and overly complex data formats do not suit a messaging system; Kafka strikes a reasonable compromise between performance and expressiveness.
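
The nesting idea can be sketched in a few lines of Java. This is a simplified illustration, not Kafka's implementation: the class NestedPayloadSketch is hypothetical, inner messages are reduced to length-prefixed byte arrays, and gzip is assumed as the codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not Kafka code: inner messages are reduced to
// length-prefixed byte arrays to keep the focus on the nesting itself.
final class NestedPayloadSketch {
    // Compress a list of inner messages into the value of one wrapper message.
    static byte[] wrap(List<byte[]> innerMessages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            for (byte[] inner : innerMessages) {
                out.writeInt(inner.length);
                out.write(inner);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompress the wrapper value and read the inner messages back in order.
    static List<byte[]> unwrap(byte[] wrapperValue) {
        List<byte[]> inner = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(wrapperValue)))) {
            while (true) {
                int size;
                try {
                    size = in.readInt();
                } catch (EOFException endOfSet) {
                    break; // no more inner messages
                }
                byte[] message = new byte[size];
                in.readFully(message);
                inner.add(message);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inner;
    }
}
```

The reader only ever has to handle one extra layer: it decompresses the wrapper's value once and then iterates over plain messages, which is the parsing simplicity the two-level limit buys.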

We also need to pay attention to timestamps. The timestamp type has three possible values: -1 means no timestamp type, 0 means the creation time of the message, and 1 means its log append time (the time at which Kafka received and persisted it). For nested messages, if the timestamp type is log append time, the timestamp of each inner (compressed) message is the same as that of its outer message; if the timestamp type is creation time, the timestamp is read from the inner message’s own bytes. If the magic value is 0, the timestamp is treated as -1 and the timestamp type as CREATE_TIME regardless.
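
The rules for choosing an inner message's timestamp can be written down as a small decision function. The sketch below is illustrative only: NestedTimestampSketch and effectiveTimestamp are hypothetical names, and the function covers just the timestamp value, not the timestamp type.

```java
// Hypothetical sketch, not Kafka code: resolves the timestamp a reader
// should report for a message nested inside a compressed wrapper.
final class NestedTimestampSketch {
    enum TimestampType {
        NO_TIMESTAMP_TYPE(-1), CREATE_TIME(0), LOG_APPEND_TIME(1);

        final int id;

        TimestampType(int id) {
            this.id = id;
        }
    }

    static final long NO_TIMESTAMP = -1L;

    static long effectiveTimestamp(byte magic, TimestampType wrapperType,
                                   long wrapperTimestamp, long innerTimestamp) {
        if (magic == 0) {
            return NO_TIMESTAMP;        // the old format carries no timestamp
        }
        if (wrapperType == TimestampType.LOG_APPEND_TIME) {
            return wrapperTimestamp;    // inherit the outer message's timestamp
        }
        return innerTimestamp;          // create time: read from the inner message
    }
}
```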

Code design of message class

Although the message format looks fairly simple, the code is relatively complex. The two main difficulties are: 1. staying compatible with messages whose magic is 0; 2. leaving room for extension in later versions. Let’s first consider what the message class needs to do. Its functions fall roughly into the following parts:

Predefined variables

Since the class operates on raw bytes and many values have to be extracted with bit operations, we should predefine some helper constants for bit manipulation and some important offsets. Two kinds of predefined values deserve emphasis:

  • The offset of the key length field, because it marks the boundary between the header and the body

  • Helper constants for reading the timestamp type and the compression codec from the attributes byte

The code is as follows:

  // Scala: constants in the Message object

  /**
   * Specifies the mask for the compression code. 3 bits to hold the compression codec.
   * 0 is reserved to indicate no compression
   */
  val CompressionCodeMask: Int = 0x07

  /**
   * Specifies the mask for timestamp type. 1 bit at the 4th least significant bit.
   * 0 for CreateTime, 1 for LogAppendTime
   */
  val TimestampTypeMask: Byte = 0x08
  val TimestampTypeAttributeBitOffset: Int = 3

  // Java: methods of the TimestampType enum

    // Clear bit 3 for CREATE_TIME, set it for LOG_APPEND_TIME.
    public byte updateAttributes(byte attributes) {
        return this == CREATE_TIME ?
            (byte) (attributes & ~Record.TIMESTAMP_TYPE_MASK) : (byte) (attributes | Record.TIMESTAMP_TYPE_MASK);
    }

    // Extract bit 3 of the attributes byte and map it to a timestamp type.
    public static TimestampType forAttributes(byte attributes) {
        int timestampType = (attributes & Record.TIMESTAMP_TYPE_MASK) >> Record.TIMESTAMP_TYPE_ATTRIBUTE_OFFSET;
        return timestampType == 0 ? CREATE_TIME : LOG_APPEND_TIME;
    }

  // Scala: reading the compression codec in the Message class

  def compressionCodec: CompressionCodec =
    CompressionCodec.getCompressionCodec(buffer.get(AttributesOffset) & CompressionCodeMask)
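
To see what these masks do to a concrete attributes byte, here is a small self-contained Java sketch. The class AttributeBits and its method names are hypothetical. Only the mask and offset values (0x07, 0x08, 3) are taken from the code above.

```java
// Hypothetical helper, not Kafka code: bit operations on the attributes byte.
final class AttributeBits {
    static final int COMPRESSION_CODEC_MASK = 0x07;        // bits 0 ~ 2
    static final byte TIMESTAMP_TYPE_MASK = 0x08;          // bit 3
    static final int TIMESTAMP_TYPE_ATTRIBUTE_OFFSET = 3;

    // Returns 0 (none), 1 (gzip), 2 (snappy) or 3 (lz4).
    static int compressionCodec(byte attributes) {
        return attributes & COMPRESSION_CODEC_MASK;
    }

    // Returns 0 (create time) or 1 (log append time).
    static int timestampType(byte attributes) {
        return (attributes & TIMESTAMP_TYPE_MASK) >> TIMESTAMP_TYPE_ATTRIBUTE_OFFSET;
    }

    // Sets or clears bit 3 and leaves every other bit untouched.
    static byte withTimestampType(byte attributes, boolean logAppendTime) {
        return logAppendTime
            ? (byte) (attributes | TIMESTAMP_TYPE_MASK)
            : (byte) (attributes & ~TIMESTAMP_TYPE_MASK);
    }

    // Replaces bits 0 ~ 2 with the given codec id.
    static byte withCompressionCodec(byte attributes, int codec) {
        return (byte) ((attributes & ~COMPRESSION_CODEC_MASK) | (codec & COMPRESSION_CODEC_MASK));
    }
}
```

For example, the attributes byte 0x0B (binary 0000 1011) decodes to compression codec 3 (lz4) and timestamp type 1 (log append time); clearing the timestamp bit turns it into 0x03.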

Getter methods for each field

The getters themselves need little explanation. The point to note is that the offsets of some fields depend on the magic value, so it is worth writing static helper methods up front that return the offset of a field for a given magic.
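
A minimal sketch of such offset helpers might look like the following. The class MessageOffsets and its method names are hypothetical. The field widths follow the format described earlier (4-byte CRC, 1-byte magic, 1-byte attributes, 8-byte timestamp for magic 1, 4-byte lengths).

```java
// Hypothetical helper, not Kafka code: field offsets for each magic value.
final class MessageOffsets {
    static final int CRC_LENGTH = 4;
    static final int MAGIC_LENGTH = 1;
    static final int ATTRIBUTES_LENGTH = 1;
    static final int TIMESTAMP_LENGTH = 8;
    static final int SIZE_FIELD_LENGTH = 4;

    // Offset of the 4-byte key length, i.e. the end of the fixed header.
    static int keySizeOffset(byte magic) {
        int afterAttributes = CRC_LENGTH + MAGIC_LENGTH + ATTRIBUTES_LENGTH;
        switch (magic) {
            case 0:
                return afterAttributes;                    // no timestamp field
            case 1:
                return afterAttributes + TIMESTAMP_LENGTH; // timestamp comes first
            default:
                throw new IllegalArgumentException("Invalid magic value " + magic);
        }
    }

    // Offset of the first key byte.
    static int keyOffset(byte magic) {
        return keySizeOffset(magic) + SIZE_FIELD_LENGTH;
    }

    // Offset of the 4-byte payload length; a key size of -1 (no key) counts as 0 bytes.
    static int payloadSizeOffset(byte magic, int keySize) {
        return keyOffset(magic) + Math.max(0, keySize);
    }
}
```

With these widths the key length sits at offset 6 for magic 0 and at offset 14 for magic 1. Every later offset is derived from that one value, so the getters never need to branch on magic themselves.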

Validity checks

The main checks are:

  • Whether the combination of magic value, timestamp type and timestamp value is consistent

  • Whether the CRC check passes
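
Both checks can be sketched as follows. This is an illustration under stated assumptions rather than Kafka's code: the class MessageValidation is hypothetical, the CRC is assumed to cover every byte after the 4-byte CRC field, and the buffer is assumed to hold exactly one message with its position at 0.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper, not Kafka code: the two validity checks for a message.
final class MessageValidation {
    static final long NO_TIMESTAMP = -1L;

    // A magic-0 message must carry no timestamp; a magic-1 message may carry
    // NO_TIMESTAMP or any non-negative value.
    static boolean isTimestampConsistent(byte magic, long timestamp) {
        if (magic != 0 && magic != 1) {
            return false;                       // unknown format version
        }
        if (timestamp < 0 && timestamp != NO_TIMESTAMP) {
            return false;                       // negative timestamps are invalid
        }
        return magic != 0 || timestamp == NO_TIMESTAMP;
    }

    // Recompute the CRC over the bytes after the CRC field and compare it
    // with the stored value, read as an unsigned 32-bit integer.
    static boolean crcMatches(ByteBuffer message) {
        long stored = message.getInt(0) & 0xFFFFFFFFL;
        ByteBuffer body = message.duplicate();  // keep the caller's position intact
        body.position(4);
        CRC32 crc = new CRC32();
        crc.update(body);
        return stored == crc.getValue();
    }
}
```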

Message conversion under different magic values

1. Calculate and allocate the space required for the new message
2. Write the new magic value
3. Take the original attributes, update them with the chosen timestamp type, and write the attributes byte
4. If converting from magic 0 to magic 1, write the timestamp
5. Write the original message body
6. Calculate the new CRC value and fill it in
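
The six steps can be followed line by line in this Java sketch of a 0 to 1 conversion. It is an illustration only: the class MagicUpConversion and the method toMagic1 are hypothetical, the input buffer is assumed to contain exactly one magic-0 message with its position at 0, and the CRC is assumed to cover everything after the CRC field.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper, not Kafka code: converts a magic-0 message to magic 1.
final class MagicUpConversion {
    static final int ATTRIBUTES_OFFSET = 5;
    static final int V0_KEY_SIZE_OFFSET = 6;   // header: CRC + magic + attributes
    static final int V1_KEY_SIZE_OFFSET = 14;  // header: ... + 8-byte timestamp
    static final byte TIMESTAMP_TYPE_MASK = 0x08;

    static ByteBuffer toMagic1(ByteBuffer v0, long timestamp, boolean logAppendTime) {
        // 1. Allocate space: same body, header 8 bytes longer.
        int bodyLength = v0.limit() - V0_KEY_SIZE_OFFSET;
        ByteBuffer v1 = ByteBuffer.allocate(V1_KEY_SIZE_OFFSET + bodyLength);
        v1.position(4);                        // leave room for the CRC

        // 2. Write the new magic value.
        v1.put((byte) 1);

        // 3. Update the original attributes with the chosen timestamp type.
        byte attributes = v0.get(ATTRIBUTES_OFFSET);
        attributes = logAppendTime
            ? (byte) (attributes | TIMESTAMP_TYPE_MASK)
            : (byte) (attributes & ~TIMESTAMP_TYPE_MASK);
        v1.put(attributes);

        // 4. Write the timestamp, which magic 0 did not have.
        v1.putLong(timestamp);

        // 5. Copy the original body (key length, key, payload length, payload).
        ByteBuffer body = v0.duplicate();
        body.position(V0_KEY_SIZE_OFFSET);
        v1.put(body);

        // 6. Compute the CRC over the new bytes and fill it in.
        CRC32 crc = new CRC32();
        crc.update(v1.array(), 4, v1.limit() - 4);
        v1.putInt(0, (int) crc.getValue());
        v1.rewind();
        return v1;
    }
}
```

The CRC has to be written last because it depends on every other byte of the new message, including the magic, attributes and timestamp that were just changed.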

Constructors

There are two main construction paths:

  • The first is mainly used to construct nested messages: it takes the buffer data directly and sets the timestamp and timestamp type

  • The second is mainly used to construct atomic messages: it takes the key-value data and sets the main parameters (magic, compression codec, timestamp type, timestamp)

Main classes related to messages


The function of each class is described in turn below:

  • MessageAndOffset: attaches to a message its offset within the set

  • MessageAndMetadata: mainly wraps the decoders that decode a message into key and value objects

  • MessageSet: manages a set of messages, which are read sequentially and written in batches; in my view, the focus of its code is how it parses nested messages

  • ByteBufferMessageSet: stores sequential messages in a ByteBuffer, mainly to make reading messages convenient

  • ByteBufferBackedInputStream: exposes a buffer through a stream interface
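
To make the last item concrete, here is a minimal Java sketch of an input stream backed by a ByteBuffer. It illustrates the idea only and is not Kafka's class; the name BufferInputStreamSketch is hypothetical.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch, not Kafka code: reads a ByteBuffer through the
// InputStream interface so that stream-based decoders can consume it.
final class BufferInputStreamSketch extends InputStream {
    private final ByteBuffer buffer;

    BufferInputStreamSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Returns the next byte as an unsigned value, or -1 at the end of the buffer.
    @Override
    public int read() {
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    // Copies up to len bytes and returns how many were actually copied.
    @Override
    public int read(byte[] bytes, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1;
        }
        int count = Math.min(len, buffer.remaining());
        buffer.get(bytes, off, count);
        return count;
    }
}
```

A wrapper like this lets a decompressing stream such as java.util.zip.GZIPInputStream read directly from the buffer that holds a compressed message's payload, without first copying it into a byte array.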