“I want to join a big tech company”: 11 killer Kafka questions in a row


Why not? First place in reading novels. No, I wrote a good article.

I recently tidied up my article catalogue. A reader told me long ago that he couldn’t find my older articles, and I never bothered to sort them out. Now it’s done. While doing it I found an article I had left half written, so I took the time to finish it. I probably abandoned it because it was hard to come up with this many questions.

Look here for the articles: Article catalogue

Tell me your understanding of Kafka

Kafka is a streaming data platform. It can act as a messaging system and can also process and analyze streaming data in real time, but we mostly use it as a message queue.

To make it easy to understand, it can be roughly divided into three layers:

The first layer is ZooKeeper, which acts as the registry. It is responsible for managing Kafka cluster metadata and coordinating the cluster. When each Kafka server starts, it connects to ZooKeeper and registers itself there.

The second layer is the core layer of Kafka, which contains many basic concepts of Kafka:

record: represents a message

topic: messages are organized by topic, which can be understood as a classification of messages

producer: responsible for sending messages

consumer: responsible for consuming messages

broker: Kafka server

partition: a topic is composed of multiple partitions. Messages within one partition are read in order, but ordering across different partitions is not guaranteed. Partitioning is the data sharding mechanism and mainly improves scalability: with partitions, the read and write load can be balanced across multiple nodes

Leader/Follower: replicas of a partition. To ensure high availability, each partition has several replicas. Each partition has one leader replica, which handles reads and writes. Follower replicas only keep their data synchronized with the leader and do not serve external requests

offset: each message in a partition gets a monotonically increasing sequence number in the order it was written. This sequence number is the offset

Consumer group: a group made up of multiple consumers. Within a group, each partition is consumed by only one consumer

Coordinator: coordinator, which is mainly used to allocate partitions and rebalance for consumer groups

Controller: the controller is just one of the brokers, and it coordinates and manages the entire Kafka cluster. It is responsible for partition leader election, topic management and other work. The first broker to create the ephemeral node /controller in ZooKeeper becomes the controller

The third layer is the storage layer, which is used to save the core data of Kafka. They will finally write to the disk in the form of logs.
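The relationship between a partition, its log, and offsets can be sketched in a few lines. This is only a toy in-memory model for illustration: `PartitionLog` and its methods are hypothetical names, not part of any Kafka client, and a real broker stores the log in segment files on disk.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset it was assigned."""
        offset = len(self._records)  # the next offset equals the current log length
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, value) pairs starting at offset."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
offsets = [log.append(v) for v in ("order-created", "order-paid", "order-shipped")]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, 'order-paid'), (2, 'order-shipped')]
```

Records are only ever appended, so an offset never changes once assigned, and a consumer can resume from any offset it remembers.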

Do you know the message queue models? How does Kafka support these two models?

For traditional message queuing systems, two models are supported:

  1. Point-to-point: a message can only be consumed by one consumer. After consumption, the message is deleted
  2. Publish and subscribe: equivalent to broadcast mode, messages can be consumed by all consumers

As mentioned above, Kafka actually supports both models through the consumer group.

If all consumers belong to one group and messages can only be consumed by one consumer in the same group, it is the point-to-point mode.

If each consumer is a separate group, it is the publish subscribe mode.

In fact, Kafka flexibly supports these two models by grouping consumers.
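To make the two setups concrete, here is a minimal sketch. It assumes a hypothetical helper `receivers` and a hand-written assignment table (group name to partition owner). It only models the rule that each partition has exactly one owner per group.

```python
def receivers(partition, groups):
    """Return the consumers that receive a message written to `partition`.

    `groups` maps a group name to {partition: owning consumer}.
    Every group gets the message once, via the partition's owner.
    """
    return sorted(assignment[partition] for assignment in groups.values())


# Point-to-point: both consumers are in the same group and split the partitions.
queue_style = {"g1": {0: "c1", 1: "c2"}}

# Publish/subscribe: each consumer is its own group and owns every partition.
pubsub_style = {
    "g1": {0: "c1", 1: "c1"},
    "g2": {0: "c2", 1: "c2"},
}

print(receivers(0, queue_style))   # ['c1']
print(receivers(0, pubsub_style))  # ['c1', 'c2']
```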

Can you talk about the principle of Kafka communication process?

  1. First, when a Kafka broker starts, it registers its own ID with ZooKeeper (by creating an ephemeral node). This ID can be configured or generated automatically. At the same time, it subscribes to ZooKeeper’s /brokers/ids path, so when a broker joins or leaves, it can get the current information for all brokers
  2. When the producer starts, it specifies bootstrap.servers and creates TCP connections with the brokers at these addresses (usually we don’t need to configure every broker address, otherwise the producer will establish a TCP connection with every configured broker)
  3. It connects to any broker and sends a request to obtain metadata (including topics, the partitions of each topic, the replicas of each partition, the leader replica of each partition, etc.)
  4. TCP connections to all brokers are then created
  5. Then comes the process of sending messages
  6. Consumers, like producers, specify the bootstrap.servers property, then select a broker, create a TCP connection, and send a request to find their coordinator broker
  7. Then they create a TCP connection with the coordinator broker to obtain metadata
  8. They create connections with the brokers that host the partition leaders
  9. Finally they start consuming messages

So how to select a partition when sending a message?

There are two main ways:

  1. Round-robin: send messages to the partitions in turn
  2. Random: send each message to a randomly chosen partition

If the message specifies a key, the key is hashed and the hash is taken modulo the number of partitions to decide which partition the message lands on. Therefore, messages with the same key are always sent to the same partition, which is what we call per-partition ordering.

A common scenario is that we want the order and payment messages to be in order. In this way, sending messages with the order ID as the key achieves the purpose of partition order.

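If no key is specified, the default round-robin load balancing policy is used. For example, the first message falls on P0, the second on P1, the third on P2, and so on. (Newer clients, since Kafka 2.4, default to a sticky partitioner for keyless messages, which fills a batch for one partition before switching to the next.)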

In addition, for specific business scenarios and requirements, you can implement the Partitioner interface and override the configure and partition methods to do custom partitioning.
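The two selection rules described above can be sketched as follows. This is an illustration under stated assumptions: the function and class names are hypothetical, and CRC32 is only a stand-in hash (Kafka's Java client hashes keys with murmur2), so the partition numbers here will not match a real cluster.

```python
import itertools
import zlib


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Keyed messages: hash the key, then take it modulo the partition count.

    CRC32 is a stand-in hash for this sketch, not what Kafka uses.
    """
    return zlib.crc32(key) % num_partitions


class RoundRobinPartitioner:
    """Keyless messages: hand out partitions 0, 1, 2, ... in turn."""

    def __init__(self, num_partitions: int):
        self._cycle = itertools.cycle(range(num_partitions))

    def next_partition(self) -> int:
        return next(self._cycle)


# The same order ID always maps to the same partition, so "created" and
# "paid" events for one order stay in order relative to each other.
p_created = partition_for_key(b"order-1001", 3)
p_paid = partition_for_key(b"order-1001", 3)
print(p_created == p_paid)  # True

rr = RoundRobinPartitioner(3)
sequence = [rr.next_partition() for _ in range(4)]
print(sequence)  # [0, 1, 2, 0]
```

Note that changing the number of partitions changes the result of the modulo, so the key-to-partition mapping only stays stable while the partition count does.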

OK, then why do you think partitioning is needed? What are the benefits?

This one is simple. Without partitions, all the messages we write could only be stored on one node. However powerful that server is, it would eventually fail to keep up.

In fact, every distributed system faces this problem. Either the data is split after the message is received, or it is split in advance. Kafka chose the former, and partitioning lets the data be distributed evenly across different nodes.

Partitioning brings the ability of load balancing and scale out.

When messages are sent, they can land on different Kafka server nodes according to their partitions, which improves concurrent write performance. When messages are consumed, partitions are bound to consumers, so messages can be consumed from different partitions on different nodes, which improves read throughput.

Partitions also introduce replicas. Redundant replicas give Kafka its high availability and durability.

Talk about consumer groups and consumer rebalancing in detail?

Generally speaking, the number of consumers in a group should match the total number of partitions across its topics (take a single topic as the example here; a group can of course subscribe to multiple topics).

When there are fewer consumers than partitions, some consumer must consume multiple partitions.

When there are more consumers than partitions, some consumers will have no partition to consume.

So the consumer group has two benefits. As mentioned above, it supports both message models. It also supports scaling out and in, by adjusting the relationship between consumers and partitions.

Once we know how consumers consume partitions, an obvious question follows. How are partitions allocated to consumers, and what happens when a new consumer joins?

The rebalancing process of the old version is mainly triggered by ZK listener, and each consumer client executes the partition allocation algorithm by itself.

The new version does this through the coordinator. Every time a new consumer joins, it sends a request to the coordinator to obtain its partition assignment, and the partition allocation process is driven by the coordinator.

Rebalancing is what happens when, for example, new consumers join. At the beginning only consumer A is consuming messages. After a while consumers B and C join, and the partitions need to be redistributed. This is rebalancing. The process is very similar to STW (stop-the-world) in GC: the whole consumer group stops working, and messages cannot be consumed during the rebalance.

This is not the only trigger, because consumers are bound to the total number of partitions. As mentioned above, the number of consumers should match the total number of partitions of all topics.

So a change in any of the number of consumers, the number of topics (for example, with regex topic subscriptions), or the number of partitions will trigger a rebalance.

Let’s walk through the rebalancing process.

Rebalancing relies on the heartbeat between the consumer and the coordinator. Each consumer has a separate thread that sends heartbeats to the coordinator at regular intervals, and the interval is controlled by the heartbeat.interval.ms parameter.

  1. When a consumer joins the group for the first time, it sends a JoinGroup request to the coordinator. The first consumer to send this request becomes the “group leader”, and the coordinator returns the group member list to the group leader

  2. The group leader runs the partition allocation strategy, then sends the allocation result to the coordinator in a SyncGroup request, and the coordinator receives the partition allocation result

  3. The other members of the group also send SyncGroup requests to the coordinator, and the coordinator responds to each consumer with its own partition allocation
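The three steps above can be modelled with a toy coordinator. Everything here is an assumption made for illustration: `ToyCoordinator`, `assign_evenly`, and the simplified request methods are hypothetical. The real protocol also carries generation IDs, timeouts, and heartbeats, which are left out.

```python
class ToyCoordinator:
    """Minimal in-memory model of the JoinGroup / SyncGroup exchange."""

    def __init__(self):
        self.members = []
        self.plan = {}

    def join_group(self, member_id):
        """JoinGroup: register the member; the first one to join is the leader."""
        self.members.append(member_id)
        return self.members[0] == member_id

    def member_list(self):
        """Returned to the group leader so it can compute the assignment."""
        return list(self.members)

    def sync_group(self, member_id, plan=None):
        """SyncGroup: the leader uploads the full plan; each caller gets its share."""
        if plan is not None:
            self.plan = plan
        return self.plan.get(member_id, [])


def assign_evenly(members, partitions):
    """Placeholder strategy for this sketch: deal partitions out in turn."""
    plan = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        plan[members[i % len(members)]].append(partition)
    return plan


coordinator = ToyCoordinator()
leader = None
for member in ("A", "B"):
    if coordinator.join_group(member):
        leader = member  # step 1: first JoinGroup sender becomes group leader

plan = assign_evenly(coordinator.member_list(), ["P0", "P1", "P2"])  # step 2
a_share = coordinator.sync_group(leader, plan)  # leader uploads the plan
b_share = coordinator.sync_group("B")           # step 3: follower fetches its share

print(leader, a_share, b_share)  # A ['P0', 'P2'] ['P1']
```

The point of the split is that the coordinator only relays the plan, so the allocation strategy lives in the client.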

Can you tell me more about the partition allocation strategy?

There are three main allocation strategies:


Range

I don’t know how best to translate this one. It is the default strategy. The general idea is to sort the partitions and the consumers, and consumers ranked higher may be allocated more partitions.

For example, with three partitions, consumer A ranks higher, so it gets two partitions, P0 and P1, and consumer B gets only one, P2.

If there are four partitions, each consumer gets two.

However, there will be some small problems with this allocation strategy. It is allocated according to topics. Therefore, if the consumer group subscribes to multiple topics, it may lead to uneven partition allocation.

For example, with two topics of three partitions each, P0 and P1 of both topics are assigned to A, so A has four partitions and B has only two. The more such topics there are, the worse the imbalance becomes.
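The imbalance can be reproduced with a small simulation of the range idea. This is a sketch of the per-topic logic described above, not Kafka's actual assignor code. `range_assign` is a hypothetical name, and it assumes every consumer subscribes to every topic.

```python
def range_assign(consumers, topics):
    """Assign partitions topic by topic.

    `topics` maps a topic name to its partition count. Within each topic,
    the first (partitions % consumers) consumers get one extra partition.
    """
    consumers = sorted(consumers)
    result = {c: [] for c in consumers}
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(consumers))
        start = 0
        for i, consumer in enumerate(consumers):
            count = per_consumer + (1 if i < extra else 0)
            result[consumer].extend((topic, p) for p in range(start, start + count))
            start += count
    return result


range_plan = range_assign(["A", "B"], {"T1": 3, "T2": 3})
print(range_plan["A"])  # [('T1', 0), ('T1', 1), ('T2', 0), ('T2', 1)]
print(range_plan["B"])  # [('T1', 2), ('T2', 2)]
```

Because the extra partition of every topic goes to the same top-ranked consumer, the gap grows by one for each additional topic with an odd split.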


RoundRobin

This is what we usually call round-robin. It is simple enough to understand without a drawing.

It deals partitions out in turn across all topics together, so it avoids the problem in Range where more topics can mean a more uneven allocation.

P0->A, P1->B, P2->A, and so on
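For comparison, here is the same two-topic scenario under a round-robin sketch. `round_robin_assign` is a hypothetical name, and the sketch assumes every consumer subscribes to every topic.

```python
from itertools import cycle


def round_robin_assign(consumers, topics):
    """Deal all partitions of all topics out to the consumers in turn."""
    consumers = sorted(consumers)
    result = {c: [] for c in consumers}
    all_partitions = [
        (topic, p)
        for topic, num_partitions in sorted(topics.items())
        for p in range(num_partitions)
    ]
    for topic_partition, consumer in zip(all_partitions, cycle(consumers)):
        result[consumer].append(topic_partition)
    return result


rr_plan = round_robin_assign(["A", "B"], {"T1": 3, "T2": 3})
print(rr_plan["A"])  # [('T1', 0), ('T1', 2), ('T2', 1)]
print(rr_plan["B"])  # [('T1', 1), ('T2', 0), ('T2', 2)]
```

Each consumer ends up with three partitions, because the turn order carries over from one topic to the next instead of restarting.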


Sticky

This literally means a sticky strategy, and that is roughly what it does. The main goal is to change the partition allocation as little as possible while still keeping it balanced.

For example, if P0 and P1 were previously assigned to consumer A, the next allocation tries to assign them to consumer A again.

The advantage of this is that the connection can be reused. To consume messages, you always have to connect with the broker. If you can maintain the last allocated partition, you don’t have to destroy and create connections frequently.

Come on! How to ensure message reliability?

Basically, we should elaborate on the guarantee of message reliability from three aspects (this is more comprehensive and impeccable)

Message loss on the producer side

Kafka supports three ways of sending messages, which are also the three conventional ways: fire-and-forget, synchronous sending, and asynchronous sending. Basically all message queues work this way.

  1. Fire-and-forget. Call the send method directly and ignore the result. Automatic retry can be enabled, but messages may still be lost
  2. Synchronous sending. The send method returns a future object, and by waiting on it we learn the result of the send and can handle it
  3. Asynchronous sending. Send the message and specify a callback function, which handles the result

To be on the safe side, we usually send messages asynchronously with a callback, and set the parameters so that sending is retried when it fails.

acks=all, this parameter can be configured with 0 | 1 | all.

0 means that the producer writes a message. Regardless of the response of the server, the message may still be in the network buffer. The server does not receive the message at all. Of course, the message will be lost.

1 means the write is considered successful once at least one replica has received the message, and that replica is necessarily the partition’s leader replica. However, if the leader’s node goes down before the followers have synchronized the message, the message is still lost.

all means the write succeeds only after every replica in the ISR has written it. The message is lost only if all replicas in the ISR go down.

retries=N, set a very large value so that the producer keeps retrying after a send fails

Message loss inside Kafka itself

Because Kafka writes messages to disk asynchronously through the page cache, messages can still be lost.

So, to guard against loss inside Kafka itself, set these parameters:

replication.factor=N, set a value greater than 1 so that there are at least 2 replicas.

min.insync.replicas=N, the minimum number of in-sync replicas that must have the message before a write counts as successful. Set it greater than 1 so that the message is written to more than one replica.

unclean.leader.election.enable=false, this setting means that a replica that is not fully synchronized cannot become the leader. If it is true, there is a risk of message loss after an out-of-sync replica becomes the leader.

Message loss on the consumer side

Loss on the consumer side is relatively simple to handle. Just turn off auto commit and commit offsets manually after the business processing succeeds.

When a rebalance occurs, consumers resume from the last committed offset. Auto commit runs once every 5 seconds by default, which can lead to repeated consumption or lost messages.

enable.auto.commit=false, set to manual commit.

There is another parameter that we may also need to take into account:

auto.offset.reset=earliest, this parameter decides what the consumer does when it has no committed offset or the offset does not exist on the broker. earliest means reading from the beginning of the partition. Messages may be read repeatedly, but they will not be lost, and consumers generally have to ensure idempotence themselves anyway. The other value, latest, means reading from the end of the partition, which carries some chance of losing messages.

With all these parameters set together, we can ensure that messages are not lost and delivery is reliable.
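The parameters from the three parts above can be collected in one place. The keys are the parameter names used in this article, while the concrete numbers (3, 2, and the retry count) are example values chosen for this sketch. `check_durability` is a hypothetical helper, not part of Kafka.

```python
# Producer side
producer_config = {
    "acks": "all",
    "retries": 2147483647,  # "a very large value"; this number is just an example
}

# Broker/topic side (parameter names as written in the article)
topic_config = {
    "replication.factor": 3,   # example value
    "min.insync.replicas": 2,  # example value
    "unclean.leader.election.enable": False,
}

# Consumer side
consumer_config = {
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
}


def check_durability(cfg):
    """Return a list of warnings for settings that weaken the guarantees."""
    warnings = []
    replicas = cfg["replication.factor"]
    min_isr = cfg["min.insync.replicas"]
    if replicas < 2:
        warnings.append("replication.factor should be at least 2")
    if min_isr < 2:
        warnings.append("min.insync.replicas should be greater than 1")
    if min_isr >= replicas:
        # With acks=all, losing a single replica would then block all writes.
        warnings.append("min.insync.replicas should be below replication.factor")
    if cfg["unclean.leader.election.enable"]:
        warnings.append("unclean leader election risks message loss")
    return warnings


print(check_durability(topic_config))  # []
```

Keeping min.insync.replicas one below replication.factor is a common compromise: writes still need more than one copy, but the partition stays writable when a single broker is down.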

OK, let’s talk about replica and its synchronization principle?

As mentioned earlier, Kafka replicas are divided into leader and follower replicas, that is, master and slave replicas. Unlike systems such as MySQL, only the leader replica serves external requests in Kafka. Follower replicas only keep their data synchronized with the leader, for redundancy and disaster recovery.

In Kafka, the set of all replicas is called the AR (Assigned Replicas), and the set of replicas that are in sync with the leader replica is called the ISR (In-Sync Replicas).

The ISR is a dynamic set. It is maintained through the replica.lag.time.max.ms parameter, which is the maximum time a follower may lag behind the leader replica. The default is 10 seconds (30 seconds since Kafka 2.5). So as long as a follower has not lagged behind the leader for more than this time, it is considered in sync with the leader (put simply, it is the allowed synchronization time gap).
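The membership rule can be sketched as a simple time check. This is a simplified model with hypothetical names: `in_sync_replicas` and the `last_caught_up_ms` bookkeeping are assumptions for illustration, and the 10-second default follows the figure used in this article.

```python
def in_sync_replicas(leader, last_caught_up_ms, now_ms, max_lag_ms=10_000):
    """Return the ISR: the leader plus every follower that caught up recently.

    `last_caught_up_ms` maps a follower to the time it last caught up
    with the leader's log end; `max_lag_ms` plays the role of
    replica.lag.time.max.ms.
    """
    isr = {leader}
    for follower, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(follower)
    return isr


isr = in_sync_replicas("b1", {"b2": 95_000, "b3": 80_000}, now_ms=100_000)
print(sorted(isr))  # ['b1', 'b2']
```

Here b2 is 5 seconds behind and stays in the ISR, while b3 is 20 seconds behind and is dropped until it catches up again.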

There are also two key concepts for synchronization between replicas:

HW (High Watermark): the high watermark, also known as the replication point, marks the position up to which replicas are synchronized. As shown in the figure below, offsets 0 to 4 (green) are committed messages. They have been synchronized between replicas, and consumers can see and consume them. Offsets 4 to 6 (yellow) are uncommitted messages, which may not yet be synchronized between replicas. These messages are invisible to consumers.

LEO (Log End Offset): the offset of the next message to be written


[Figure: hw]

Synchronization between replicas depends on updates to the HW and LEO. Below, the changes in their values illustrate how replicas synchronize messages. Green represents the leader replica and yellow represents the follower replicas.

First, the producer keeps writing data to the leader. At this point the leader’s LEO may have reached 10, but its HW is still 0. The two followers request data from the leader, and both of their values are 0.

Then messages continue to be written, and the leader’s LEO changes again. The two followers pull their messages and update their own LEO, but the leader’s HW remains unchanged at this point.

Next, the followers pull data from the leader again. This time the leader updates its HW, setting it to the minimum LEO among the followers.

After that, the leader returns its own HW to the followers, and they update their HW. Because messages are pulled again, the LEO is updated again, and so on.
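The rounds described above can be traced with a toy model. It is a simplification under stated assumptions: `Leader` and `Follower` are hypothetical classes, each fetch copies everything up to the leader's LEO, and both followers are assumed to stay in the ISR.

```python
class Leader:
    def __init__(self, followers):
        self.leo = 0
        self.hw = 0
        # LEO each follower reported in its most recent fetch request
        self.remote_leo = {name: 0 for name in followers}

    def produce(self, count):
        self.leo += count

    def handle_fetch(self, follower, fetch_offset):
        """Record the follower's LEO, advance HW, reply with (LEO, HW)."""
        self.remote_leo[follower] = fetch_offset
        lowest = min([self.leo, *self.remote_leo.values()])
        self.hw = max(self.hw, lowest)  # HW never moves backwards
        return self.leo, self.hw


class Follower:
    def __init__(self, name):
        self.name = name
        self.leo = 0
        self.hw = 0

    def fetch(self, leader):
        leader_leo, leader_hw = leader.handle_fetch(self.name, self.leo)
        self.leo = leader_leo               # append the fetched messages
        self.hw = min(leader_hw, self.leo)  # adopt the leader's HW


leader = Leader(["f1", "f2"])
f1, f2 = Follower("f1"), Follower("f2")

leader.produce(10)
f1.fetch(leader)
f2.fetch(leader)
round_one = (leader.hw, f1.leo, f2.leo)
print(round_one)  # (0, 10, 10): followers have the data, HW has not moved

f1.fetch(leader)
f2.fetch(leader)
round_two = (leader.hw, f1.hw, f2.hw)
print(round_two)  # (10, 0, 10): f1 fetched before the leader's HW advanced

f1.fetch(leader)
print(f1.hw)  # 10
```

The trace shows why the HW lags: the leader only learns a follower's new LEO on that follower's next fetch, and a follower only learns the new HW on the fetch after that.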

Do you know why the new version of Kafka abandoned zookeeper?

I think we can answer this question from two aspects:

First, in terms of operational complexity, Kafka itself is a distributed system and is already complex to operate. On top of that it depends heavily on a second distributed system, ZK, which adds a lot of cost and complexity.

Second, there are performance problems. For example, committed offsets used to be stored in ZK, but ZK is not suited to such high-frequency read, write and update operations, which seriously hurts the performance of the ZK cluster. For this reason, newer versions of Kafka commit and store offsets as messages (in the internal __consumer_offsets topic).

In addition, Kafka relies heavily on ZK for metadata management and cluster coordination. If the cluster is large and has many topics and partitions, the ZK cluster holds too much metadata and comes under too much pressure, which directly causes many watches to be delayed or lost.

OK, the last question we all ask, why is Kafka fast?

Hey, I know this one. I’ve recited it many times! There are three main aspects:

Sequential IO

Kafka writes messages to a partition by appending, that is, by writing to the disk sequentially rather than randomly. This is much faster than ordinary random IO and is almost comparable to the speed of network IO.

Page cache and zero copy

Kafka writes message data through MMAP memory mapping. Instead of writing to the disk immediately, it relies on the operating system’s file cache, the page cache, and flushes asynchronously, which improves write performance. In addition, when messages are consumed, Kafka uses sendfile to achieve zero copy.

I have written about MMAP and sendfile zero copy before. You can see it here: Alibaba second-round interview: what is MMAP?

Batch processing and compression

Kafka does not send messages one by one. It combines multiple messages into a batch for processing and sending. The same goes for consuming: a consumer pulls one batch of messages at a time.

Moreover, the producer, broker and consumer all use optimized compression algorithms. Compressing messages when they are sent saves network transmission overhead, and compressing them in broker storage reduces disk usage.
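Why batching and compression work well together can be seen with a small experiment. The sketch uses Python's standard gzip module (gzip is one of the codecs Kafka supports). The message contents are made up for the example, and joining them with newlines is only a stand-in for Kafka's real batch format.

```python
import gzip
import json

# 100 small, similar messages, as is typical for one topic
messages = [
    json.dumps({"order_id": i, "status": "PAID", "currency": "USD"}).encode()
    for i in range(100)
]

raw_size = sum(len(m) for m in messages)

# Compressing each message on its own pays the codec overhead every time
one_by_one_size = sum(len(gzip.compress(m)) for m in messages)

# Compressing the whole batch lets the codec exploit repetition across messages
batched_size = len(gzip.compress(b"\n".join(messages)))

print(batched_size < one_by_one_size)  # True
print(batched_size < raw_size)         # True
```

Messages in the same topic tend to share field names and values, so compressing a whole batch finds far more redundancy than compressing each message separately.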