Message persistence of Kafka


1. Overview of Kafka message persistence

Kafka relies on the file system to store and cache messages. Hard disks are traditionally seen as slow, so can an architecture built on the file system really perform well? In fact, how fast a disk is depends largely on how it is accessed. Kafka also avoids caching messages in JVM heap memory, because doing so has the following drawbacks:

  • Java objects carry a large memory overhead, often twice the size of the data being stored or more
  • As the amount of data on the heap grows, garbage collection becomes slower and slower

Sequential disk writes perform far better than writes to random locations. The operating system heavily optimizes sequential reads and writes (with techniques such as read-ahead and write-behind), so sequential disk access can in some cases be faster than random memory access. Therefore, instead of the common design of caching data in memory and later flushing it to disk, Kafka writes data directly to a log in the file system (in practice, into the operating system's page cache, which the OS then flushes to disk):

  • Write: data is appended sequentially to the end of the file
  • Read: data is read from the file, starting at a given position
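The write/read model above can be illustrated with a small Python sketch. It is not Kafka's code: the AppendOnlyLog class and its length-prefixed record format are hypothetical assumptions made for illustration (real Kafka segments use a richer record-batch format). It only shows the core idea: writes always go to the end of the file, and each record can later be read back from the byte position it was written at.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Hypothetical, simplified append-only log file (not Kafka's actual format)."""

    # Assumed record format: 4-byte big-endian length prefix followed by the payload
    _HEADER = struct.Struct(">I")

    def __init__(self, path: str):
        self.path = path
        # Mode "ab" creates the file if needed; every write lands at the end of the file
        self._file = open(path, "ab")
        self._end = os.path.getsize(path)  # byte position where the next record will start

    def append(self, payload: bytes) -> int:
        """Append one record sequentially and return the byte position where it starts."""
        position = self._end
        self._file.write(self._HEADER.pack(len(payload)) + payload)
        self._file.flush()  # hand the bytes to the OS; no fsync, so they may sit in the page cache
        self._end += self._HEADER.size + len(payload)
        return position

    def read(self, position: int) -> bytes:
        """Read the record that starts at the given byte position."""
        with open(self.path, "rb") as f:
            f.seek(position)
            (length,) = self._HEADER.unpack(f.read(self._HEADER.size))
            return f.read(length)

    def close(self) -> None:
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "demo.log"))
    p0 = log.append(b"hello")  # first record starts at byte 0
    p1 = log.append(b"kafka")  # 4-byte header + 5-byte payload, so this starts at byte 9
    data = log.read(p1)
    log.close()

print(p0, p1, data)  # 0 9 b'kafka'
```

Because append never seeks backwards, the disk sees a purely sequential write stream, which is exactly the access pattern the operating system optimizes best.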

The benefits of this approach are:

  • Reads do not block writes or each other, and performance does not degrade as the amount of stored data grows
  • Disk space is far larger and cheaper than memory
  • Sequential disk access is fast, so data can be retained for longer, and data on disk survives process restarts

2. Analysis of Kafka’s persistence principle

A topic is divided into multiple partitions. At the storage level, each partition is an append-only log: messages belonging to a partition are always appended to the tail of its log file. Each message in a partition is assigned a sequential number, called its offset, that identifies its position in that partition's log.
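The relationship between topics, partitions and offsets can be modelled in a few lines. This is a minimal in-memory sketch, assuming hypothetical Topic and Partition classes; the caller picks the partition explicitly here, whereas real Kafka producers use a partitioner to choose it. The point it shows is that offsets are counted independently within each partition.

```python
class Partition:
    """One partition: an append-only sequence of messages (kept in memory for illustration)."""

    def __init__(self) -> None:
        self._messages = []  # list of message payloads, in append order

    def append(self, message: bytes) -> int:
        offset = len(self._messages)  # next offset = number of messages already stored
        self._messages.append(message)
        return offset

    def read(self, offset: int) -> bytes:
        return self._messages[offset]


class Topic:
    """A topic is just a fixed set of independent partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def send(self, partition: int, message: bytes) -> int:
        """Append to the chosen partition; return the message's offset within it."""
        return self.partitions[partition].append(message)

    def fetch(self, partition: int, offset: int) -> bytes:
        return self.partitions[partition].read(offset)


topic = Topic("mytopic1", 3)
first = topic.send(0, b"m1")   # offset 0 in partition 0
second = topic.send(0, b"m2")  # offset 1 in partition 0
other = topic.send(1, b"m3")   # offset 0 in partition 1: offsets are per partition
print(first, second, other)    # 0 1 0
```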

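As shown in the figure below, we created mytopic1 with three partitions. Each partition has its own directory under the broker's log directory, named after the topic and partition number (mytopic1-0, mytopic1-1 and mytopic1-2), which we can open to view its files.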

[Figure: contents of the log directory for mytopic1]

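A partition's log is split into segments, and each segment is stored as a pair of files with the same name: an .index file and a .log file (as shown in the figure above). The .log file holds the messages themselves, while the .index file holds metadata that maps message offsets to physical byte positions in the corresponding .log file. For example, the index entry 2,128 means that the message with relative offset 2 (counted from the first offset in that segment) starts at byte position 128 of the .log file. Each file pair is named after the offset of the first message it contains, zero-padded to 20 digits (for example, 00000000000000000000.log). To locate a message, Kafka picks the segment whose base offset is the largest one not exceeding the target offset, looks up the nearest index entry at or before the target in that segment's .index file, and then reads the .log file forward from that byte position. Because the index is sparse (an entry is written only every few kilobytes of messages), this final step is a short sequential scan.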
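The lookup described above can be sketched in Python. This is a simplified model under stated assumptions: Segment, locate and the in-memory records list are hypothetical stand-ins for reading the real .index and .log files, and the byte positions are made up for illustration. Only the file-naming rule (base offset padded to 20 digits) and the strategy of "pick the segment, binary-search the sparse index, then scan forward" are meant to mirror Kafka.

```python
import bisect


def segment_file_name(base_offset: int, suffix: str) -> str:
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


class Segment:
    """Hypothetical in-memory stand-in for one .index/.log file pair."""

    def __init__(self, base_offset, records, index):
        self.base_offset = base_offset
        # records: (absolute offset, byte position) of every message in the .log file, in file order
        self.records = records
        # index: sparse (relative offset, byte position) entries, like the .index file
        self.index = index

    def find_position(self, offset):
        relative = offset - self.base_offset
        # Binary search for the last index entry whose relative offset is <= the target
        i = bisect.bisect_right([rel for rel, _ in self.index], relative) - 1
        start = self.index[i][1] if i >= 0 else 0
        # Scan the log forward from that byte position until the target offset is found
        j = bisect.bisect_left([pos for _, pos in self.records], start)
        for rec_offset, pos in self.records[j:]:
            if rec_offset == offset:
                return pos
            if rec_offset > offset:
                break
        raise KeyError(offset)


def locate(segments, offset):
    """Choose the segment with the largest base offset <= offset, then search inside it."""
    i = bisect.bisect_right([s.base_offset for s in segments], offset) - 1
    if i < 0:
        raise KeyError(offset)
    segment = segments[i]
    return segment_file_name(segment.base_offset, ".log"), segment.find_position(offset)


# Segment 0 holds offsets 0-4 (64 bytes apart); its index has an entry every other message
seg0 = Segment(0, [(o, o * 64) for o in range(5)], [(0, 0), (2, 128), (4, 256)])
# Segment 5 holds offsets 5-7
seg5 = Segment(5, [(5, 0), (6, 70), (7, 140)], [(0, 0), (2, 140)])

print(locate([seg0, seg5], 3))  # ('00000000000000000000.log', 192)
print(locate([seg0, seg5], 6))  # ('00000000000000000005.log', 70)
```

Looking up offset 3 finds the index entry (2, 128), then scans forward one message to byte 192. The sparse index keeps the .index file small while bounding how far the scan has to go.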
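We can use Kafka's own log dump tool (for example, bin/kafka-dump-log.sh with the --files and --print-data-log options in recent releases) to view the data stored in a log file: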

[Figure: output of the log dump tool for a partition's log file]
