Message storage structure
Each Kafka topic has multiple partitions, and the messages within a single partition are ordered. Physically, a partition consists of multiple segments, and each segment consists of two files: an index file and a log file.
Physical entities: index file and log file
Logical entities: topic > partition > segment
1. Partition storage
In Kafka's file storage, a topic has several partitions, and each partition is a directory. A partition directory is named with the topic name, a hyphen, and the partition's sequence number (for example, mytopic-0). Sequence numbers start at 0, so the largest sequence number is the number of partitions minus 1.
Each partition (directory) behaves like one large file split into multiple segment files of equal maximum size (configurable via log.segment.bytes). The number of messages in each segment file is not necessarily equal, because it depends on the message sizes. Splitting a partition into segments makes it easy to delete old data quickly.
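The sketch below illustrates the directory naming and segment splitting described above. It assumes that a new segment is started whenever the next message would push the current segment past the size limit. The names partition_dirs and split_into_segments, and the example topic name, are hypothetical and are not Kafka APIs.

```python
from typing import List


def partition_dirs(topic: str, num_partitions: int) -> List[str]:
    # One directory per partition: topic name + "-" + sequence number 0..n-1.
    return [f"{topic}-{i}" for i in range(num_partitions)]


def split_into_segments(message_sizes: List[int], segment_bytes: int) -> List[List[int]]:
    """Group message sizes into segments of at most `segment_bytes` each
    (a single oversized message still gets its own segment)."""
    segments: List[List[int]] = [[]]
    used = 0
    for size in message_sizes:
        # Roll to a new segment if this message would exceed the size cap.
        if segments[-1] and used + size > segment_bytes:
            segments.append([])
            used = 0
        segments[-1].append(size)
        used += size
    return segments
```

Take message sizes [400, 300, 200, 600, 100, 900] with a 1000-byte limit. The sketch produces segments holding 3, 2 and 1 messages, which shows why the number of messages per segment varies.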
2. Segment storage
A segment file consists of two parts, an index file and a data file. The two files correspond one-to-one, always appear in pairs, and have the suffixes .index and .log respectively. The name of the first segment file starts from 0, and each subsequent segment file is named after the offset of the last message in the previous segment file. Names are 20-digit numbers padded with leading zeros, for example 00000000000000368769.
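The .index file holds the index entries. Each entry contains two values: the message's sequence number within the segment (its offset relative to the file) and that message's physical position in the .log file. Not every message has an index entry (the index is sparse), so a lookup ends with a short sequential scan of the .log file. The .log file stores the actual message data, and each entry consists of an offset plus a message. The details are shown in the figure below: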
Message field descriptions:
8-byte offset: every message in a partition has an ordered ID number called the offset, which uniquely identifies the message's position within the partition. In other words, the offset is the message's sequence number in the partition.
4-byte message size: the size of the message.
4-byte CRC32: a CRC32 checksum used to verify the message.
1-byte “magic”: the version number of the Kafka message protocol used by this release.
1-byte “attributes”: identifies a standalone version, the compression type, or the encoding type.
4-byte key length: the length of the key. When the key length is -1, the K-byte key field is omitted.
K-byte key: optional.
Value bytes payload: the actual message data.
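The sketch below packs and unpacks the field layout listed above using Python's struct module. It makes two assumptions: integers are big-endian, and, following the list, everything after the key is treated as the payload, whose length follows from the message size. It shows the listed fields only and is not a byte-exact implementation of any particular Kafka message format version. The names encode_entry and decode_entry are hypothetical.

```python
import struct
import zlib
from typing import Optional, Tuple

# Assumed big-endian layout following the field list above.
HEADER = struct.Struct(">qi")        # 8-byte offset, 4-byte message size
BODY_PREFIX = struct.Struct(">bbi")  # 1-byte magic, 1-byte attributes, 4-byte key length


def encode_entry(offset: int, key: Optional[bytes], value: bytes,
                 magic: int = 0, attributes: int = 0) -> bytes:
    key_len = -1 if key is None else len(key)  # -1 means "no key field"
    body = BODY_PREFIX.pack(magic, attributes, key_len) + (key or b"") + value
    crc = zlib.crc32(body) & 0xFFFFFFFF        # checksum covers everything after the CRC
    message = struct.pack(">I", crc) + body
    # Message size counts the CRC and everything after it.
    return HEADER.pack(offset, len(message)) + message


def decode_entry(buf: bytes) -> Tuple[int, Optional[bytes], bytes]:
    offset, size = HEADER.unpack_from(buf, 0)
    message = buf[HEADER.size:HEADER.size + size]
    (crc,) = struct.unpack_from(">I", message, 0)
    body = message[4:]
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch: message is corrupted")
    _magic, _attributes, key_len = BODY_PREFIX.unpack_from(body, 0)
    pos = BODY_PREFIX.size
    key = None
    if key_len >= 0:
        key = body[pos:pos + key_len]
        pos += key_len
    return offset, key, body[pos:]             # remaining bytes are the payload
```

With this layout, the entry for offset 368776 with key b"k" and payload b"hello" takes 28 bytes: 8 + 4 + 4 + 1 + 1 + 4 + 1 + 5.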
Find message by offset
Taking the message with offset = 368776 in the figure above as an example, the lookup takes two steps:
Step 1: find the segment file. 00000000000000000000.index is the first file, and its starting offset is 0. The second file, 00000000000000368769.index, starts at offset 368770 = 368769 + 1. Similarly, the third file, 00000000000000737337.index, starts at offset 737338 = 737337 + 1. All later files are named and sorted the same way, so a binary search over the file list by offset quickly locates the right file. For offset = 368776, the search lands on 00000000000000368769.index|log.
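The sketch below implements step 1 as a binary search with Python's bisect module. It follows the naming rule described above: a file named N (N > 0) starts at offset N + 1, and the first file, named 0, starts at offset 0. It also assumes the requested offset has not yet been deleted. The function names are hypothetical. (In the Kafka source itself a segment is named after its base offset, which is the first offset it contains. Under that convention the search becomes "last name <= target".)

```python
import bisect
from typing import List


def locate_segment(segment_names: List[int], target: int) -> int:
    """Return the name of the segment file that holds offset `target`.

    `segment_names` must be sorted ascending. A file named N > 0 starts at N + 1,
    so the owning segment is the last one whose name is strictly less than the
    target; the first file (name 0) covers offset 0 as well.
    """
    if target < 0 or not segment_names:
        raise ValueError("offset must be >= 0 and at least one segment must exist")
    i = bisect.bisect_left(segment_names, target) - 1  # last name < target
    return segment_names[max(i, 0)]


def segment_file(name: int, suffix: str) -> str:
    # 20-digit zero-padded file name, e.g. 00000000000000368769.index
    return f"{name:020d}.{suffix}"
```

For example, `segment_file(locate_segment([0, 368769, 737337], 368776), "index")` returns `"00000000000000368769.index"`.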
Step 2: find the message within the segment file. Step 1 has already located the segment. For offset = 368776, first look up the index entry in 00000000000000368769.index to get a physical position in 00000000000000368769.log. Then scan 00000000000000368769.log sequentially from that position until reaching offset 368776.
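The sketch below shows step 2 on a hypothetical in-memory model of one segment. The sparse index is a sorted list of (relative offset, position) pairs. The log is a list of (offset, message) records, and a record's list position stands in for its byte position in the .log file. Relative offsets follow the convention above, where the absolute offset equals the file name plus the relative offset (368776 = 368769 + 7). find_in_segment is a hypothetical name.

```python
import bisect
from typing import List, Optional, Tuple

IndexEntry = Tuple[int, int]   # (relative offset, position in the .log)
LogRecord = Tuple[int, bytes]  # (absolute offset, message); list index = position


def find_in_segment(segment_name: int, index: List[IndexEntry],
                    log: List[LogRecord], target: int) -> Optional[bytes]:
    relative = target - segment_name  # e.g. 368776 - 368769 = 7
    # Last index entry at or before `relative`; inf makes ties sort before the probe.
    i = bisect.bisect_right(index, (relative, float("inf"))) - 1
    start = index[i][1] if i >= 0 else 0  # no entry yet: scan from the file start
    # Sequential scan of the log from that position until the offset matches.
    for offset, message in log[start:]:
        if offset == target:
            return message
        if offset > target:
            break
    return None
```

The scan length is bounded by the gap between index entries. This is the trade-off a sparse index makes: a smaller index in exchange for a short linear read.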
The producer can connect to any broker (it does not communicate with ZooKeeper) to obtain the topic's partition information, because every broker holds the metadata for all topics. The producer then finds the leader broker of each partition and establishes a connection to it. When sending a message, the producer can choose the target partition by round-robin (polling) or at random.
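The sketch below shows round-robin and random partition selection followed by a leader lookup. It assumes the metadata fetched from a broker is a simple mapping from partition id to leader broker id. The class name, strategy strings and broker ids are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Tuple

Metadata = Dict[int, str]  # hypothetical: partition id -> leader broker id


class SimplePartitionSelector:
    """Pick a partition by round-robin ("polling") or at random, then resolve its leader."""

    def __init__(self, metadata: Metadata, strategy: str = "round_robin", seed: int = 0):
        if strategy not in ("round_robin", "random"):
            raise ValueError("strategy must be 'round_robin' or 'random'")
        self.metadata = metadata
        self.partitions: List[int] = sorted(metadata)
        self.strategy = strategy
        self._cycle = itertools.cycle(self.partitions)
        self._rng = random.Random(seed)

    def next_target(self) -> Tuple[int, str]:
        if self.strategy == "round_robin":
            p = next(self._cycle)
        else:
            p = self._rng.choice(self.partitions)
        return p, self.metadata[p]  # (partition, leader broker to send to)
```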
Kafka supports both synchronous and asynchronous sending. Synchronous sending uses the configurable acks parameter to set the acknowledgement level. With acks = -1, the producer is told the send succeeded only after every replica in the ISR has confirmed receipt. The leader writes the message to disk first, while an ISR replica counts as having received the message once it is in memory, even if it has not yet written it to disk. With acks = 0, success is returned immediately, without any confirmation from the leader. With acks = 1, success is returned once the leader has written the message to disk.
Asynchronous sending returns success immediately. A background thread checks the queue and, once the queue reaches a certain length or a configured time has passed, sends the queued messages to the leader in a batch.
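The sketch below is a simplified model of the three acks levels and of the async flush rule described above. It only expresses when success is reported or a batch is sent, and performs no I/O. It assumes the ISR set includes the leader. The function and parameter names are hypothetical and are not Kafka configuration keys.

```python
from typing import Set


def sync_send_succeeded(acks: int, leader_persisted: bool,
                        isr: Set[str], confirmed: Set[str]) -> bool:
    """When a synchronous send is reported as successful (isr includes the leader)."""
    if acks == 0:
        return True                # no confirmation from the leader at all
    if acks == 1:
        return leader_persisted    # leader has written the message to disk
    if acks == -1:
        # Leader persisted it and every ISR member confirmed receipt
        # (a follower may hold it only in memory).
        return leader_persisted and isr <= confirmed
    raise ValueError("acks must be 0, 1 or -1")


def should_flush(queued: int, oldest_age_ms: float,
                 batch_size: int, max_wait_ms: float) -> bool:
    """Async mode: send a batch once the queue is long enough or has waited long enough."""
    return queued >= batch_size or oldest_age_ms >= max_wait_ms
```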
The basic process is as follows:
a. When a topic is created, its partition information is registered in ZooKeeper (ZK).
b. The producer obtains all partitions of the topic from a broker.
c. The producer uses a load-balancing algorithm to choose which partition each message is sent to.
d. The message is sent to the broker that is the leader of the chosen partition.
e. When the topic's partitions change, the producer fetches the new partition information from a broker again.
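The sketch below covers steps d and e. It routes (partition, message) pairs to each partition's leader broker and reloads the cached metadata when a partition is missing from it, for example after the topic's partitions change. fetch_metadata is a hypothetical callable that stands in for asking a broker. It is assumed to return the full partition-to-leader mapping.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Batch = List[Tuple[int, bytes]]  # (partition, message)


def group_by_leader(batch: Batch, metadata: Dict[int, str],
                    fetch_metadata: Callable[[], Dict[int, str]]) -> Dict[str, Batch]:
    """Group messages by the leader broker of their partition (step d),
    refreshing the cached metadata when a partition is unknown (step e)."""
    per_broker: Dict[str, Batch] = defaultdict(list)
    for partition, message in batch:
        if partition not in metadata:
            metadata.clear()                   # replace the stale cache in place
            metadata.update(fetch_metadata())
            if partition not in metadata:
                raise KeyError(f"partition {partition} unknown even after refresh")
        per_broker[metadata[partition]].append((partition, message))
    return dict(per_broker)
```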
Kafka’s message producer is implemented in Producer.scala. The client chooses one of two modes, sync or async, through the producer.type setting, and calls Producer.send to send a message.
In synchronous mode, the DefaultEventHandler.handle method serializes the message with the default encoder, which can be replaced through the producer's serializer.class setting. It then tries to send the message within the maximum number of retries (three by default) by calling the dispatchSerializedData method, which selects the partition for each message.
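The sketch below shows the synchronous flow: serialize once, then retry delivery up to a maximum number of times. It reads "three retries" as one initial attempt plus up to three retries. It also assumes only ConnectionError counts as a retriable failure. send_sync, transmit and the UTF-8 default serializer are hypothetical stand-ins for DefaultEventHandler, the network send and the configured encoder.

```python
from typing import Callable


def send_sync(message: str,
              transmit: Callable[[bytes], None],
              serializer: Callable[[str], bytes] = lambda s: s.encode("utf-8"),
              max_retries: int = 3) -> int:
    """Serialize once, then attempt delivery up to 1 + max_retries times.

    Returns the number of attempts used; re-raises the last error if all fail.
    """
    payload = serializer(message)  # stand-in for the serializer.class encoder
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(1, max_retries + 2):
        try:
            transmit(payload)
            return attempt
        except ConnectionError as err:  # assumption: only transient errors are retried
            last_error = err
    raise last_error
```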
If a message has no key and is the first such message the client sends for its topic, a partition is chosen at random and the topic-to-partition mapping is cached in sendPartitionPerTopicCache. From then on, all keyless messages for that topic are sent to that partition. sendPartitionPerTopicCache is cleared periodically (every topic.metadata.refresh.interval.ms, 600000 ms by default) so that keyless messages are not sent to the same partition forever.
If the message key is not null, the default partitioner, DefaultPartitioner.partition, is called. It hashes the key and takes the hash modulo the number of partitions to get the message's partition. Users can implement the Partitioner interface to define a custom partitioning policy, configured through the producer's partitioner.class setting.
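The sketch below combines both cases. Keyed messages go to hash(key) mod the number of partitions. Keyless messages stick to one randomly chosen partition per topic until the cache is cleared at the refresh interval. zlib.crc32 stands in for the key hash, because Python's built-in hash() of str and bytes is randomized per process. The clock is injectable so the refresh can be tested. PartitionChooser is a hypothetical name.

```python
import random
import time
import zlib
from typing import Callable, Dict, Optional


class PartitionChooser:
    """Key -> hash mod partitions; no key -> cached random partition per topic."""

    def __init__(self, refresh_interval_ms: int = 600_000,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self.refresh_interval_s = refresh_interval_ms / 1000
        self.clock = clock
        self.rng = rng or random.Random()
        self.cache: Dict[str, int] = {}  # plays the role of sendPartitionPerTopicCache
        self.last_refresh = clock()

    def partition(self, topic: str, key: Optional[bytes], num_partitions: int) -> int:
        if key is not None:
            # Keyed message: deterministic hash of the key modulo partition count.
            return zlib.crc32(key) % num_partitions
        if self.clock() - self.last_refresh >= self.refresh_interval_s:
            self.cache.clear()  # periodic reset spreads keyless traffic over time
            self.last_refresh = self.clock()
        if topic not in self.cache:
            self.cache[topic] = self.rng.randrange(num_partitions)
        return self.cache[topic]
```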
How to avoid losing messages
After a message arrives at the broker, the leader first writes it to disk. If acks = -1, the leader waits for the replicas in the ISR to copy the message and returns success only after all of them have done so. If the wait times out, the producer is told the send failed.
To strictly guarantee that messages are not lost, configure more than two replicas for the topic and set the producer's acks to -1. Each message then needs confirmation from the replicas before success is returned.
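The sketch below simulates the acks = -1 wait with concurrent.futures. The leader is assumed to have already written the message. It then waits for every ISR follower to confirm the copy, and the send fails if any follower misses the timeout. Followers are simulated by sleeping threads. replicate and its parameters are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Dict


def replicate(isr_delays_s: Dict[str, float], timeout_s: float) -> bool:
    """Return True only if every ISR follower confirms within `timeout_s`."""

    def follower_copy(delay: float) -> bool:
        time.sleep(delay)  # stand-in for the follower fetching and storing the message
        return True

    with ThreadPoolExecutor(max_workers=max(1, len(isr_delays_s))) as pool:
        futures = [pool.submit(follower_copy, d) for d in isr_delays_s.values()]
        _done, not_done = wait(futures, timeout=timeout_s)  # waits for ALL_COMPLETED
    return not not_done  # any follower still pending means the send failed
```

Leaving the `with` block still waits for slow threads to finish. The outcome, however, was decided when `wait` timed out.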
The message sending flow chart is as follows:
High availability and high scalability of the broker