Summary of Kafka knowledge points


1. What is Kafka

Kafka is a distributed message queue based on the publish/subscribe model, with support for multiple partitions and multiple replicas. Kafka now positions itself as a distributed streaming platform. It is widely used because of its high throughput, persistence, horizontal scalability and support for stream processing.

2. Kafka architecture

Kafka’s overall architecture diagram contains several concepts:
(1) ZooKeeper: ZooKeeper stores the broker cluster’s metadata and is used to elect the controller.
(2) Producer: the message producer is the client that sends messages to Kafka broker.
(3) Broker: a single Kafka server is called a broker, and a cluster consists of multiple brokers. A broker can host multiple topics. It receives messages from producers, assigns offsets to them and stores them on disk. It also serves consumers: it responds to requests to read partitions and returns messages that have been committed to disk.
(4) Consumer: message consumer, the client that fetches messages from Kafka broker.
(5) Consumer group: a consumer group contains one or more consumers. Each consumer in the group is responsible for a different set of partitions, and within a group a partition can be consumed by only one consumer. Consumer groups do not affect each other. Every consumer belongs to some consumer group; that is, the consumer group is the logical subscriber. Combining multiple partitions with multiple consumers greatly increases downstream processing speed. Consumers in the same group do not consume the same message twice, and consumers in different groups consume independently of each other. Kafka uses consumer groups to implement both point-to-point (P2P) and broadcast messaging.
(6) Topic: messages in Kafka are divided by topic, which can be understood as a queue. The producer sends the message to a specific topic, and the consumer is responsible for subscribing to and consuming the message of the topic.
(7) Partition: to achieve scalability, a very large topic can be spread across multiple brokers (servers). A topic is divided into one or more partitions, and each partition is an ordered queue. Different partitions of the same topic contain different messages. At the storage level, a partition can be viewed as an append-only log file; each message appended to it is assigned a specific offset.
(8) Offset: each message in the partition will be assigned an ordered ID, that is, offset. Offset does not span partitions, that is, Kafka guarantees partition ordering rather than topic ordering.
(9) Replica: so that a partition’s data is not lost and Kafka keeps working when a node in the cluster fails, Kafka provides a replication mechanism. Each partition of a topic has several replicas: one leader and several followers. Normally only the leader replica serves reads and writes. When the broker hosting the leader crashes or becomes unreachable, a new leader replica is elected under the management of the controller and takes over serving reads and writes.
(10) Record: the message record actually written to Kafka and can be read. Each record contains key, value and timestamp.
(11) Leader: the “master” replica among a partition’s replicas. Producers send data to the leader, and consumers consume data from the leader.
(12) Follower: a “slave” replica of a partition. It continuously replicates data from the leader and stays in sync with it. When the leader fails, a follower becomes the new leader.
(13) ISR (in-sync replicas): the set of replicas that are in sync with the leader, including the leader itself. If a follower fails to keep up with the leader for too long, it is removed from the ISR. If the leader fails, the new leader is elected from the ISR.
(14) OSR (out-of-sync replicas): the replicas that have been removed from the ISR because they lag too far behind the leader.
(15) AR (assigned replicas): the set of all replicas of a partition, that is, AR = ISR + OSR.
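
The relationship between AR, ISR and OSR is plain set arithmetic. The following Java sketch assumes broker IDs are plain integers; the ReplicaSets class and its methods are hypothetical and are not part of Kafka’s API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a partition's replica sets; broker IDs are plain integers. */
class ReplicaSets {
    /** OSR = AR minus ISR, keeping the AR order. */
    static Set<Integer> outOfSync(List<Integer> assignedReplicas, Set<Integer> inSyncReplicas) {
        Set<Integer> osr = new LinkedHashSet<>(assignedReplicas);
        osr.removeAll(inSyncReplicas);
        return osr;
    }

    /** The ISR must be a subset of the AR and must contain the leader. */
    static boolean isConsistent(List<Integer> assignedReplicas, Set<Integer> inSyncReplicas, int leader) {
        return assignedReplicas.containsAll(inSyncReplicas) && inSyncReplicas.contains(leader);
    }
}
```

For example, with AR = [1, 2, 3] and ISR = {1, 3}, the OSR is {2}.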

3. With so many publish/subscribe messaging systems available, why choose Kafka? (Kafka features)

(1) Multiple producers
Kafka seamlessly supports multiple producers, whether they write to one topic or to many. This makes Kafka well suited to collecting data from many front-end systems and exposing it in a unified format.

(2) Multiple consumers
Kafka lets multiple consumers read the same message stream without affecting each other. This differs from many other queuing systems, in which a message, once read by one client, is no longer available to other clients. Multiple consumers can also form a consumer group; the group shares a message stream and guarantees that each message is processed only once by the group as a whole.

(3) Disk based data storage (persistence, reliability)
Because Kafka commits messages to disk and keeps them according to configurable retention rules, consumers do not have to read messages in real time, and a consumer that falls behind or goes offline temporarily does not lose messages.

(4) Scalability
Kafka scales horizontally: users can start with a single broker and later expand to a cluster of multiple brokers.

(5) High performance (high throughput, low latency)
Kafka can easily handle millions of messages while keeping message latency below one second. It provides message persistence in O(1) time, so access performance stays constant even with terabytes of stored data. Even on inexpensive commodity machines, a single node can transmit more than 100K messages per second. Kafka writes to disk sequentially, which is very efficient; sequential disk writes have been observed to outperform random writes to memory in some cases, and this is a key reason for Kafka’s high throughput.

The comparison is shown in the figure:

4. How does Kafka achieve high throughput / high performance?

Kafka achieves high throughput and performance mainly through the following points:

1. Page caching technology
Kafka writes files through the operating system’s page cache. The page cache is an in-memory cache managed by the operating system itself, also called the OS cache. When Kafka writes to a disk file, it actually writes only into the OS cache, that is, into memory, and the operating system decides when the data in the OS cache is flushed to the disk file. This greatly improves disk write performance, because each write is effectively a memory write rather than a disk write.

2. Disk sequential write
The other key technique is that Kafka writes data to disk sequentially: data is only ever appended to the end of the log file, never modified at a random position in the file. On the same disk, sequential writes can reach about 600 MB/s, while random writes reach only about 100K/s. The difference comes from the mechanics of a hard disk: sequential writes avoid most of the time spent moving the head to a new position (seek time).

Together, these two techniques give Kafka its very high write performance.
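
The following Java sketch illustrates the two ideas above using only the JDK: every write is appended to the end of the file, and the write lands in the OS page cache, with flushing left to the operating system unless force() is called. The AppendOnlyLog class is hypothetical and far simpler than Kafka’s real log implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only log: every write goes to the end of the file (sequential I/O). */
class AppendOnlyLog implements AutoCloseable {
    private final FileChannel channel;

    AppendOnlyLog(Path file) throws IOException {
        // APPEND: each write lands at the current end of the file, never at a random position
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one newline-terminated entry and returns the byte position where it starts. */
    long append(String entry) throws IOException {
        long position = channel.size();
        ByteBuffer buf = ByteBuffer.wrap((entry + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // copied into the OS page cache; the OS writes it to disk later
        }
        return position;
    }

    /** Optional explicit flush; without it, the OS decides when dirty pages reach the disk. */
    void flush() throws IOException {
        channel.force(false);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The returned byte position is the kind of value a segment index records for a message.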

3. Zero copy
Data in Kafka is consumed constantly. Consuming means reading data from Kafka’s disk files and sending it to downstream consumers, as shown in the figure below:
If data were read from disk into the application and then sent to consumers in the conventional way, two unnecessary copies would be involved, as shown in the following figure:
First, the data is copied from the OS cache into the application process’s buffer; then it is copied from the application buffer back into the operating system’s socket buffer. These two copies also require several context switches, with execution alternating between the application and the operating system. Reading data this way is therefore expensive.

In order to solve this problem, Kafka introduces zero copy technology when reading data.

That is, data in the OS cache is sent directly to the network card and on to downstream consumers, skipping both copies. Only a descriptor is copied into the socket buffer; the data itself is not, as shown in the following figure:
With zero copy, data no longer has to be copied from the OS cache into the application buffer and then from the application buffer into the socket buffer. Both copies are eliminated, hence the name “zero copy”. The socket buffer receives only a descriptor of the data, and the data is sent directly from the OS cache to the network card. This greatly improves the performance of reading file data during consumption. In addition, when Kafka reads data from disk it first checks whether the data is already in the OS cache; if it is, the data is actually read from memory. In a well-tuned Kafka cluster, data is written to the OS cache and consumers read it back from the OS cache, so Kafka effectively serves both writes and reads from memory, and overall performance is very high.
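
In Java, zero copy is exposed through FileChannel.transferTo, which on Linux can be implemented with the sendfile system call; Kafka’s broker relies on this call when sending log data to consumers. The sketch below (the ZeroCopySender class is hypothetical) sends a byte range of a file to any WritableByteChannel. In a broker the target would be a SocketChannel; with other targets, such as an in-memory channel, the JDK may fall back to an ordinary copy.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: sends part of a file to a channel without copying it through a Java buffer. */
class ZeroCopySender {
    static long sendFile(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so keep going until done
            while (sent < count) {
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file, or a non-blocking target that cannot accept more right now
                }
                sent += n;
            }
            return sent;
        }
    }
}
```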

5. Relationship between Kafka and ZooKeeper

Kafka uses ZooKeeper to store the cluster’s metadata and, in older versions, consumer information (offsets). In ZooKeeper-based deployments Kafka cannot work without ZooKeeper (newer releases can instead run in KRaft mode without it). ZooKeeper contains a node dedicated to recording the list of broker servers; its path is /brokers/ids.

When a broker server starts, it registers with ZooKeeper by creating a node /brokers/ids/[0-n] and writing its IP address, port and other information into it. The node is ephemeral, so when the broker goes offline (its session ends) the node is deleted automatically. The availability of each broker can therefore be tracked dynamically through changes to its node in ZooKeeper.

6. Execution flow of the producer sending messages to Kafka

As shown in the figure below:

(1) To send a message to Kafka, the producer first creates a ProducerRecord, for example:

ProducerRecord<String, String> record = new ProducerRecord<>("CustomerCountry", "Precision Products", "France");
try {
      producer.send(record); // producer: an existing KafkaProducer<String, String>
} catch (Exception e) {
      e.printStackTrace();
}

(2) A ProducerRecord contains the target topic and the value, and optionally a partition and a key. When sending a ProducerRecord, the producer first serializes the key and value objects into byte arrays so they can be transmitted over the network.

(3) On its way to a topic, each message passes through the interceptors, the serializers and the partitioner.

(4) If the ProducerRecord does not specify a partition, the partitioner chooses one, typically based on the key. The partitioner’s job is to assign each message to a partition.

If no partition is specified and the message key is not null, the murmur2 hash algorithm (a fast, non-cryptographic hash function with a low collision rate) is applied to the key to choose the partition.

If no partition is specified and the message key is null, a partition is chosen in round-robin fashion.

(5) Once a partition is chosen, the message is added to a record batch; all messages in a batch go to the same topic and partition. A separate sender thread then sends these record batches to the appropriate brokers.

(6) After receiving the messages, the partition leader writes them to its local log. If the write succeeds, the broker returns a RecordMetadata object containing the topic, the partition and the offset of the record within the partition.

(7) If the write fails, the broker returns an error. On receiving the error, the producer retries sending the message; if it still fails after several retries, the error is returned to the application.

(8) Followers pull the messages from the leader, write them to their local logs and acknowledge them to the leader. Once all replicas in the ISR have the messages, the leader advances the high watermark (HW) and, when acks=all, sends the ACK to the producer.
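
Step (5) can be pictured with a toy accumulator that groups records into per-partition batches, which a sender thread then drains. This is a simplified, hypothetical model: the real producer sizes batches in bytes (batch.size) and also sends partially filled batches once linger.ms expires, which this sketch omits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy version of the producer's batching step: records grouped per partition. */
class ToyAccumulator {
    private final int batchSize; // max records per batch (the real producer limits by bytes)
    private final Map<Integer, Deque<List<String>>> batches = new HashMap<>();

    ToyAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Appends a record to the open batch of its partition, opening a new batch when full. */
    void append(int partition, String value) {
        Deque<List<String>> queue = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        List<String> last = queue.peekLast();
        if (last == null || last.size() >= batchSize) {
            last = new ArrayList<>();
            queue.addLast(last);
        }
        last.add(value);
    }

    /** What a sender thread would do: take every full batch, leaving the open one to keep filling. */
    List<List<String>> drainFull(int partition) {
        List<List<String>> ready = new ArrayList<>();
        Deque<List<String>> queue = batches.getOrDefault(partition, new ArrayDeque<>());
        while (!queue.isEmpty() && queue.peekFirst().size() >= batchSize) {
            ready.add(queue.pollFirst());
        }
        return ready;
    }
}
```

Batching amortizes the per-request network cost over many messages, which is one reason the producer achieves high throughput.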

7. How does Kafka ensure that messages of corresponding types are written to the same partition?

Through the message key and the partitioner. The partitioner computes a hash of the key, takes that hash modulo the number of partitions in the topic, and uses the result as the message’s partition. This guarantees that messages with the same key are written to the same partition (as long as the number of partitions does not change).

If the ProducerRecord does not specify a partition and the message key is not null, the key’s hash is used to compute the partition.

If the ProducerRecord does not specify a partition and the message key is null, a partition is chosen in round-robin fashion.
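
The sketch below shows key-based partition selection: hash the serialized key, clear the sign bit, and take the result modulo the number of partitions. The hash function follows the murmur2 variant found in Kafka’s org.apache.kafka.common.utils.Utils as we understand it; the KeyPartitioner class is hypothetical, and exact partition numbers should be checked against your client version before relying on them.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of key-based partition selection modeled on Kafka's default partitioner. */
class KeyPartitioner {
    /** 32-bit murmur2 hash (assumed to match the variant used by Kafka's client). */
    static int murmur2(byte[] data) {
        int length = data.length;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = 0x9747b28c ^ length; // seed XOR length
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // mix in the remaining 1-3 bytes (deliberate fall-through)
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    /** Clears the sign bit so the modulo result is never negative, then picks the partition. */
    static int partitionFor(String key, int numPartitions) {
        int hash = murmur2(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Because the result depends only on the key bytes and the partition count, adding partitions to a topic changes where existing keys land.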

8. Kafka file storage mechanism

In Kafka, a topic is divided into multiple partitions, and each partition is further divided into smaller segments. On the server, each partition is a directory. Each partition directory contains multiple segments (a segment is a logical grouping, not a separate directory), and each segment corresponds to three files: a .log file, an .index file and a .timeindex file. A topic is a logical concept, while a partition is a physical one. Each partition corresponds to multiple log files, which store the data written by producers. New data is continuously appended to the end of the active log file, and each message has its own offset. Each consumer in a consumer group keeps track of the offset it has consumed up to, so that after a failure it can resume from where it left off.

The log.segment.bytes setting (default 1 GB) determines the maximum size of a single segment (.log) file. When the active segment reaches this size, a new segment is created.
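
Segment files are named after the first offset they contain, zero-padded to 20 digits. The following hypothetical sketch shows that naming scheme together with a size-based roll decision; the maxSegmentBytes field stands in for log.segment.bytes, and real brokers also roll segments for other reasons (for example, time-based rolling).

```java
/** Hypothetical sketch of segment naming and size-based rolling; sizes are tracked by hand. */
class SegmentRoller {
    private final long maxSegmentBytes; // plays the role of log.segment.bytes
    private long baseOffset = 0;        // first offset of the active segment
    private long activeSize = 0;        // bytes written to the active segment so far

    SegmentRoller(long maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
    }

    /** Segment files are named after their base offset, zero-padded to 20 digits. */
    static String fileName(long baseOffset, String suffix) {
        return String.format("%020d%s", baseOffset, suffix);
    }

    /** Records a message at the given offset, rolling first if it would not fit; returns the .log name. */
    String append(long offset, int sizeBytes) {
        if (activeSize > 0 && activeSize + sizeBytes > maxSegmentBytes) {
            baseOffset = offset; // the new segment starts at this message's offset
            activeSize = 0;
        }
        activeSize += sizeBytes;
        return fileName(baseOffset, ".log");
    }
}
```

For example, SegmentRoller.fileName(32, ".log") yields 00000000000000000032.log.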

9. How to find the corresponding message based on offset?

Each entry in the offset index (.index file) occupies 8 bytes and consists of two parts:
(1) relativeOffset: the offset of the message relative to baseOffset (relativeOffset = offset - baseOffset); it takes 4 bytes. The index file is named after its baseOffset.

For example, if a log segment’s baseOffset is 32, its file name is 00000000000000000032.log, and the message with offset = 35 has relativeOffset 35 - 32 = 3 in the index file.

(2) position: the physical position of the message in the log segment file; it takes 4 bytes.

(1) First, find the segment that contains the message with offset = 3, using binary search over the .index file names (base offsets). If a file’s base offset is less than 3, keep comparing against the next .index file’s base offset; if it is greater than 3, go back to the last .index file whose base offset is not greater than 3. Here that is the first segment file.

(2) In the .index file of that segment (00000000000000000000.index), compute the relative offset by subtracting the file’s base offset from the target offset (relativeOffset = offset - baseOffset). The message with offset 3 has relative offset 3 in this index file. The index is sparse: rather than indexing every message, Kafka adds an index entry roughly every 4 KB of log data (log.index.interval.bytes), which keeps the index file small. The downside is that an offset without its own index entry cannot be located in a single step; a short sequential scan is needed, but the scan range is very small.

(3) The index entry for relative offset 3 gives the message’s physical position in the .log file: 756.

(4) Using that physical position, read the corresponding message from the .log file.

Similarly, what if I want to find the message data corresponding to offset = 8?

(1) First, use binary search to find the segment whose index file is 00000000000000000006.index.

(2) Look up offset = 8 (relative offset 8 - 6 = 2) in that index file. The entry gives physical position 326, and message-8 is found at that position in the 00000000000000000006.log file.

Kafka’s message storage achieves its efficiency through partitioning, sequential disk reads and writes, segmentation and sparse indexes. Since version 0.9, consumer offsets are stored in the Kafka cluster itself, in the internal __consumer_offsets topic.
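
The two-step lookup described above (binary search over segment base offsets, then binary search over the sparse index followed by a short scan) can be sketched with in-memory arrays. The OffsetLookup class and its array layout are hypothetical; Kafka’s real index classes work on memory-mapped files.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of the offset lookup (not Kafka's real index classes). */
class OffsetLookup {
    /** Returns the index of the segment whose base offset is the largest one <= target. */
    static int findSegment(long[] baseOffsets, long target) {
        int pos = Arrays.binarySearch(baseOffsets, target);
        if (pos >= 0) {
            return pos; // exact hit: the segment starts with the target offset
        }
        // binarySearch returned -(insertionPoint) - 1; the floor entry is insertionPoint - 1
        int insertionPoint = -pos - 1;
        if (insertionPoint == 0) {
            throw new IllegalArgumentException("offset precedes the first segment");
        }
        return insertionPoint - 1;
    }

    /**
     * Sparse index lookup: relativeOffsets[i] maps to positions[i] (byte position in the .log file).
     * Returns the position of the last entry whose relative offset <= relativeTarget; the caller
     * then scans the .log file forward from there until it reaches the wanted message.
     */
    static int floorPosition(int[] relativeOffsets, int[] positions, int relativeTarget) {
        int lo = 0, hi = relativeOffsets.length - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (relativeOffsets[mid] <= relativeTarget) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? 0 : positions[best]; // no entry yet: scan from the segment start
    }
}
```

With segments starting at offsets 0, 6 and 12, findSegment for offset 8 selects the segment starting at 6; in an index whose entries are relative offsets 0, 2, 4 at positions 0, 326, 700, relative offset 2 resolves to position 326 directly, and relative offset 3 resolves to 326 followed by a short scan.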

10. What information is contained in a message sent by producer?

A message consists of variable-length headers, a variable-length opaque key byte array, and a variable-length opaque value byte array.

A RecordBatch is Kafka’s unit of data storage. A RecordBatch contains multiple records, each of which is one message. The meaning of each field in a RecordBatch is as follows:

A RecordBatch can contain multiple messages (the Records in the figure above), and each message can carry multiple headers in the form of key-value pairs.

11. How Kafka implements message ordering

Producer side: the partition’s leader replica appends messages in the order they arrive (first in, first out), which preserves message order within the partition.

Consumer side: within a consumer group, each partition is consumed by only one consumer, which keeps consumption within a partition ordered.

Kafka messages in each partition are ordered when written. During consumption, each partition can only be consumed by one consumer in each consumer group, which ensures that consumption is also orderly.

Kafka does not guarantee ordering across a whole topic. To obtain global ordering, use one producer, one partition and one consumer.

12. What partitioning algorithms does Kafka have?

Kafka contains three partitioning algorithms:

(1) Polling policy

Also known as the round-robin policy, it assigns messages sequentially. For example, if a topic has three partitions, the first message goes to partition 0, the second to partition 1 and the third to partition 2; the fourth message starts over at partition 0.

Round-robin is the default strategy of the Kafka Java producer for messages without a key (clients since Kafka 2.4 use a “sticky” variant that fills a batch for one partition before moving to the next). It provides excellent load balancing, distributing messages as evenly as possible across all partitions, which makes it a sensible default and one of the most commonly used strategies.

(2) Random strategy

Also known as randomness policy. Random means that we randomly place messages on any partition, as shown in the following figure:

(3) Key-based strategy

Kafka lets you define a key for each message. Once messages have keys, all messages with the same key are guaranteed to go to the same partition. Since messages within a partition are processed in order, this preserves the order of messages that share a key, as shown in the following figure:
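
The three strategies can be sketched as follows. The PartitionStrategies class is hypothetical and does not implement Kafka’s Partitioner interface; for brevity, byKey uses String.hashCode where Kafka’s default partitioner uses murmur2.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustrations of the three partitioning strategies. */
class PartitionStrategies {
    private final AtomicInteger counter = new AtomicInteger();

    /** Round-robin: 0, 1, 2, 0, 1, 2, ... (floorMod keeps it valid after the counter overflows). */
    int roundRobin(int numPartitions) {
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    /** Random: any partition with equal probability. */
    static int random(int numPartitions) {
        return ThreadLocalRandom.current().nextInt(numPartitions);
    }

    /** By key: equal keys always map to the same partition (String.hashCode stands in for murmur2). */
    static int byKey(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```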

13. Kafka’s default message retention policy

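The broker’s default message retention is controlled in two ways:
Time-based, through log.retention.hours (default 168 hours, i.e. 7 days)
Size-based, through log.retention.bytes (default -1, i.e. no size limit; applied per partition)
Retention is enforced a whole segment at a time; segment size is set by log.segment.bytes (default 1 GB).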
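
A retention pass can be pictured as walking closed segments from oldest to newest and deleting each one that is past the time limit, or whose removal still leaves the partition at or above the size limit. This is a simplified, hypothetical model (the Segment fields and the deletable method are invented for illustration); the real broker applies time and size retention as separate checks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of which closed segments a retention pass could delete (oldest first). */
class RetentionCheck {
    /** Hypothetical view of one closed segment: newest timestamp it contains and its size. */
    static final class Segment {
        final long baseOffset, maxTimestampMs, sizeBytes;

        Segment(long baseOffset, long maxTimestampMs, long sizeBytes) {
            this.baseOffset = baseOffset;
            this.maxTimestampMs = maxTimestampMs;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * @param segments       closed segments ordered oldest first (the active segment excluded)
     * @param retentionMs    time limit (e.g. 7 days); negative disables it
     * @param retentionBytes partition size limit; -1 (the default) disables it
     * @param activeBytes    size of the active segment, counted in the total but never deleted here
     * @return base offsets of the segments that may be deleted
     */
    static List<Long> deletable(List<Segment> segments, long nowMs, long retentionMs,
                                long retentionBytes, long activeBytes) {
        long total = activeBytes;
        for (Segment s : segments) {
            total += s.sizeBytes;
        }
        List<Long> result = new ArrayList<>();
        for (Segment s : segments) {
            boolean expired = retentionMs >= 0 && nowMs - s.maxTimestampMs > retentionMs;
            boolean overSize = retentionBytes >= 0 && total - s.sizeBytes >= retentionBytes;
            if (!expired && !overSize) {
                break; // stop at the first segment that must be kept
            }
            result.add(s.baseOffset);
            total -= s.sizeBytes;
        }
        return result;
    }
}
```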

14. How does Kafka replicate messages between clusters?

Kafka’s replication mechanism works only within a single cluster, not across clusters.

For cross-cluster replication, Kafka provides a tool called MirrorMaker, which consists of a consumer and a producer connected through a queue: the consumer reads messages from one cluster, and the producer sends them to another cluster.

15. Kafka message acknowledgement (ACK response) mechanism

To ensure that data sent by the producer reliably reaches the specified topic, the producer provides a message acknowledgement mechanism. When sending messages to a topic on the broker, the producer can be configured with how many replicas must receive a message before the send is considered successful. This is set with the acks parameter when the producer is defined, which supports the following three values:

(1) acks=0: the producer does not wait for any response from the broker.

Features: low latency, high throughput, data may be lost.

If there is a problem and the broker does not receive the message, the producer will not know, and the message will be lost.

(2) acks=1 (the default before Kafka 3.0): the producer receives a success response from the server as soon as the partition’s leader has received the message.

If the leader fails before the follower synchronizes, data will be lost.

In this mode, throughput depends mainly on whether messages are sent synchronously or asynchronously. It is also limited by the number of messages in flight, that is, how many messages the producer can send before receiving a response from the broker.

(3) acks=-1 (equivalently acks=all): the producer receives a success response from the server only after all in-sync replicas (the ISR) have received the message.

This mode is the safest. It can ensure that more than one server receives messages. Even if a server crashes, the whole cluster can still run.

Select and set different acks according to the actual application scenario to ensure the reliability of data.

In addition, the producer can send messages synchronously or asynchronously. Asynchronous sending greatly improves throughput but increases the risk of data loss. To ensure message reliability, set producer.type to sync. (producer.type belongs to the legacy Scala producer; with the Java producer, send() is always asynchronous, and calling get() on the returned Future makes the send synchronous.)

#Synchronous mode
producer.type=sync
#Asynchronous mode
producer.type=async
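
For the Java producer, acks is an ordinary configuration entry. The sketch below builds the configuration as plain java.util.Properties so it compiles without the Kafka client library; the ProducerSettings class is hypothetical, while bootstrap.servers, acks, key.serializer and value.serializer are standard producer settings. The resulting Properties would be passed to new KafkaProducer<>(props).

```java
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper that builds producer settings as plain Properties. */
class ProducerSettings {
    private static final Set<String> VALID_ACKS = Set.of("0", "1", "all", "-1");

    static Properties withAcks(String bootstrapServers, String acks) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException("acks must be one of " + VALID_ACKS);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", acks); // "0", "1", or "all" (same as "-1")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

With the Java client, a synchronous send is simply producer.send(record).get(), which blocks until the acknowledgement required by acks arrives.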

16. What is a replica?

To prevent data loss, Kafka introduced partition replication in version 0.8.0. The replication factor is specified when a topic is created (the broker setting default.replication.factor defaults to 1; 3 is a common choice in production).

Replicas are defined per partition. A partition has one or more replicas: one leader replica, with the rest being follower replicas. Each replica resides on a different broker.

All reads and writes go through the leader, and the followers periodically copy data from the leader. When the leader fails, one of the followers becomes the new leader. Partition replicas add data redundancy and thereby improve Kafka’s data reliability.

Kafka’s partitioned multi replica architecture is the core of Kafka’s reliability assurance. Writing messages to multiple replicas can ensure the persistence of messages in the event of a crash.

17. Kafka’s ISR mechanism

Within a partition, all replicas are collectively called the AR, and the leader maintains a dynamic set of in-sync replicas (ISR): the replicas that are keeping up with the leader. The leader replica itself is also a member of this set.

Once the followers in the ISR have replicated a message, the leader sends the ACK to the producer. If a follower fails to catch up with the leader for too long, it is removed from the ISR; the time threshold is set by the replica.lag.time.max.ms parameter. When the leader fails, a new leader is elected from the ISR.
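
ISR maintenance boils down to a time comparison per follower. In the hypothetical sketch below, lastCaughtUpMs records when each follower was last fully caught up with the leader, and lagTimeMaxMs plays the role of replica.lag.time.max.ms; the broker’s real bookkeeping is more involved.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of ISR maintenance: followers that lag too long drop out. */
class IsrTracker {
    /**
     * @param leader         broker ID of the leader (always stays in the ISR)
     * @param lastCaughtUpMs follower broker ID -> last time it was fully caught up with the leader
     * @param lagTimeMaxMs   plays the role of replica.lag.time.max.ms
     */
    static Set<Integer> computeIsr(int leader, Map<Integer, Long> lastCaughtUpMs,
                                   long nowMs, long lagTimeMaxMs) {
        Set<Integer> isr = new LinkedHashSet<>();
        isr.add(leader);
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            if (nowMs - e.getValue() <= lagTimeMaxMs) {
                isr.add(e.getKey()); // caught up recently enough to stay in sync
            }
        }
        return isr;
    }
}
```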

18. What do LEO, HW, LSO and LW stand for?

LEO: short for LogEndOffset, the offset of the next message to be written to a replica’s log (the last message’s offset plus 1).

HW: the high watermark. The term “watermark” is also used in stream processing (Flink, Spark), where it tracks progress in event time. In Kafka, however, the HW is defined per partition and has nothing to do with time: it is a position, that is, an offset. The HW is the smallest LEO among the replicas in the partition’s ISR, and consumers can only read messages before the HW.

LSO: short for LastStableOffset. When there are incomplete transactions, the LSO equals the offset of the first message of the earliest open transaction (firstUnstableOffset); when all transactions are complete, it equals the HW.

LW: the low watermark, the smallest logStartOffset among the replicas in the AR set.
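
HW and LW follow directly from the definitions above. The sketch assumes per-replica offsets are already known and keyed by broker ID; the Watermarks class is hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: derive a partition's HW and LW from per-replica offsets. */
class Watermarks {
    /** HW = smallest LEO among the replicas currently in the ISR. */
    static long highWatermark(Map<Integer, Long> leoByReplica, Set<Integer> isr) {
        return isr.stream()
                .mapToLong(id -> leoByReplica.get(id))
                .min()
                .orElseThrow(() -> new IllegalArgumentException("empty ISR"));
    }

    /** LW = smallest logStartOffset among all assigned replicas (AR). */
    static long lowWatermark(Map<Integer, Long> logStartOffsetByReplica) {
        return logStartOffsetByReplica.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("empty AR"));
    }
}
```

For example, with LEOs of 10, 8 and 5 on replicas 1, 2 and 3 and an ISR of {1, 2}, the HW is 8: replica 3 is out of sync and does not hold back the HW.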

19. Partition reassignment (partition management)

When a node in the cluster suddenly goes offline, partitions on it that have only a single replica become unavailable, and their data cannot be accessed until the node recovers. If the partitions have multiple replicas, the leader role of any leader replicas on that node moves to follower replicas on other nodes. Either way, the replicas on the failed node are out of service, and Kafka does not automatically migrate them to the remaining available brokers. Left unaddressed, this hurts not only the load balance of the cluster but also the availability and reliability of the service as a whole.

When a node in the cluster needs to be offline in a planned way, in order to ensure the reasonable allocation of partitions and replicas, we also hope to migrate the partition replicas on this node to other available nodes in some way.

When a broker node is added in the cluster, only the newly created topic partition may be assigned to this node, while the previous topic partition will not be automatically assigned to the newly added node, because there is no new node when they are created, so there is a serious imbalance between the load of the new node and the load of the original node.

To solve these problems, partition replicas need to be reassigned in a balanced way, which is called partition reassignment. Kafka provides the kafka-reassign-partitions.sh script for this; it can migrate partitions when the cluster is expanded or when a broker fails.

Using kafka-reassign-partitions.sh involves three steps:
(1) First, you need to create a JSON file containing the topic list;
(2) Next, generate a reassignment plan from the topic list and the broker list;
(3) Finally, specific redistribution actions are carried out according to this scheme.
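
Step (2) can be pictured with a naive round-robin planner that spreads each partition’s replicas over the target brokers and renders the plan in the JSON shape accepted by the script’s --reassignment-json-file option ({"version":1,"partitions":[...]}). This hypothetical ReassignmentPlanner is only an illustration; the plan produced by the script’s --generate option may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical round-robin reassignment planner. */
class ReassignmentPlanner {
    /**
     * Gives each partition `replicationFactor` distinct brokers, rotating the starting broker
     * so that leaders (the first replica in each list) are spread evenly.
     */
    static List<List<Integer>> plan(int numPartitions, int replicationFactor, List<Integer> brokers) {
        if (replicationFactor > brokers.size()) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add(brokers.get((p + r) % brokers.size()));
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    /** Renders the plan as reassignment JSON for a single topic. */
    static String toJson(String topic, List<List<Integer>> assignment) {
        StringBuilder sb = new StringBuilder("{\"version\":1,\"partitions\":[");
        for (int p = 0; p < assignment.size(); p++) {
            if (p > 0) {
                sb.append(',');
            }
            sb.append("{\"topic\":\"").append(topic).append("\",\"partition\":").append(p)
              .append(",\"replicas\":").append(assignment.get(p).toString().replace(" ", ""))
              .append('}');
        }
        return sb.append("]}").toString();
    }
}
```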

Partition reassignment has a large impact on cluster performance because it consumes extra resources such as network and disk I/O. In practice, reassignment is split into several small batches to minimize the impact, similar to how preferred replica election is handled.

Note also that if you want to take a broker offline, it is best to shut down or restart that broker before reassigning its partitions. The broker is then no longer the leader of any partition, so its partitions can be assigned to other brokers in the cluster. This reduces replication traffic between brokers, improving reassignment performance and reducing the impact on the cluster.

20. How to conduct partition leader election?

Partition leader election is carried out by the controller. A leader election is needed when a partition is created (creating a topic or adding partitions both create partitions) or when a partition comes back online (for example, when the original leader replica goes offline and a new leader must be elected to serve requests). The corresponding strategy is OfflinePartitionLeaderElectionStrategy. Its basic idea is to walk the replicas in AR order and pick the first one that is alive and also in the ISR. A partition’s AR is fixed at assignment time, and its order stays the same unless the partition is reassigned, whereas the order of replicas in the ISR may change.

Note: the election follows the order of the AR, not the ISR.

If no replica in the ISR is available, the controller checks the unclean.leader.election.enable parameter (default false). If it is set to true, a leader may be elected from outside the ISR: the first surviving replica in the AR list becomes the leader.

Leader election is also needed when partitions are reassigned; the corresponding strategy is ReassignPartitionLeaderElectionStrategy. Its idea is simple: pick the first surviving replica in the reassigned AR list that is also in the current ISR.

For preferred replica election, the strategy is PreferredReplicaPartitionLeaderElectionStrategy: the preferred replica, which is the first replica in the AR list, is made the leader directly.

Leader election also happens when a broker is shut down gracefully (ControlledShutdown): the leader replicas on that broker go offline, so their partitions need new leaders. The strategy is to pick the first surviving replica in the AR list that is in the current ISR and is not on the broker being shut down.
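
The election rules above all walk the AR list in order. The sketch below expresses the offline-partition rule (with the unclean fallback) and the controlled-shutdown rule; the class and parameter names are hypothetical, with alive standing for the set of live brokers.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of AR-order leader election rules (broker IDs as integers). */
class LeaderElection {
    /** Offline-partition rule: first replica in AR order that is alive and in the ISR. */
    static Optional<Integer> offlineElection(List<Integer> ar, Set<Integer> isr, Set<Integer> alive,
                                             boolean uncleanElectionEnabled) {
        for (int replica : ar) {
            if (alive.contains(replica) && isr.contains(replica)) {
                return Optional.of(replica);
            }
        }
        if (uncleanElectionEnabled) {
            // unclean.leader.election.enable=true: fall back to the first live replica in AR order
            for (int replica : ar) {
                if (alive.contains(replica)) {
                    return Optional.of(replica);
                }
            }
        }
        return Optional.empty(); // no eligible replica: the partition stays offline
    }

    /** Controlled-shutdown rule: same walk, but skip the broker that is shutting down. */
    static Optional<Integer> controlledShutdownElection(List<Integer> ar, Set<Integer> isr,
                                                        Set<Integer> alive, int shuttingDownBroker) {
        for (int replica : ar) {
            if (replica != shuttingDownBroker && alive.contains(replica) && isr.contains(replica)) {
                return Optional.of(replica);
            }
        }
        return Optional.empty();
    }
}
```

For example, with AR = [3, 1, 2] and ISR = {1, 2}, broker 1 is elected even though broker 3 comes first in the AR, because broker 3 is not in the ISR.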

21. Kafka transactions

Kafka introduced transactions in version 0.11. Building on exactly-once semantics, transactions let production and consumption span partitions and sessions, with all operations either succeeding together or failing together.

Producer transaction

To support transactions across partitions and sessions, Kafka introduces a globally unique transactional ID and binds it to the producer ID (PID) that the producer obtains. After the producer restarts, it can recover its original PID through the same transactional ID.

To manage transactions, Kafka introduces a new component, the Transaction Coordinator. The producer obtains the state of the transaction for its transactional ID by interacting with the Transaction Coordinator. The coordinator also writes transaction state to an internal Kafka topic (__transaction_state), so even if the whole service restarts, the state of in-progress transactions can be recovered and the transactions can continue.

Consumer transaction

The transaction mechanism above mainly concerns the producer. For the consumer, the guarantees are weaker; in particular, Kafka cannot guarantee that the messages of a committed transaction are all consumed. This is because a consumer can access any message by offset, and different segment files have different lifecycles, so messages of the same transaction may already have been deleted by the time a consumer restarts.

22. What is the relationship between Kafka's consumer groups and partitions?

(1) In Kafka, consumers are managed through consumer groups. Suppose a topic contains four partitions and a consumer group has only one consumer: that consumer receives messages from all four partitions.

(2) If there are two consumers, the four partitions are divided between them according to the partition assignment strategy.

(3) If there are four consumers, the partitions are distributed evenly and each consumer consumes one partition.

(4) If there are five consumers, there are more consumers than partitions, and the extra consumer sits idle and receives no messages.
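The four cases can be reproduced with a simplified range-style assignment. This Python sketch assumes a single topic and sorted consumer IDs; Kafka's real assignors (range, round-robin, sticky and others) are pluggable, so treat `range_assign` as a hypothetical illustration rather than Kafka's code.

```python
def range_assign(num_partitions, consumer_ids):
    """Split partitions 0..num_partitions-1 into contiguous ranges.

    The first (num_partitions % consumers) consumers get one extra partition;
    consumers beyond the partition count receive an empty list (idle).
    """
    consumers = sorted(consumer_ids)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


for n in (1, 2, 4, 5):
    print(n, range_assign(4, [f"c{i}" for i in range(1, n + 1)]))
```

With five consumers, `c5` is assigned an empty list, which corresponds to case (4).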

23. How can each application be guaranteed to receive all messages in a Kafka topic, rather than only some of them?

Create a separate consumer group for each application, then add consumers to that group to scale out reading and processing. Groups do not interfere with each other when consuming messages from the same topic.
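This works because each consumer group keeps its own committed offsets. A minimal in-memory model, assuming a single partition and hypothetical group names, shows two groups each reading the full topic independently.

```python
class Topic:
    """In-memory topic with one partition and per-group committed offsets."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = {}  # group_id -> next offset to read

    def poll(self, group_id, max_records=100):
        start = self.committed.get(group_id, 0)
        batch = self.messages[start:start + max_records]
        self.committed[group_id] = start + len(batch)  # commit after reading
        return batch


topic = Topic(["m0", "m1", "m2"])
print(topic.poll("billing-app"))   # ['m0', 'm1', 'm2']
print(topic.poll("audit-app"))     # ['m0', 'm1', 'm2']: its own offset starts at 0
print(topic.poll("billing-app"))   # []: this group has already read everything
```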

24. How can Kafka consumers consume only a specified number of messages at a time?

Write a wrapper class (for example, a queue class) that holds the consumer as a field and maintains a consumption counter. When the counter reaches the specified number, close the consumer.
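A sketch of that wrapper follows. `FakeConsumer` is a hypothetical stand-in for a real client (it only needs to be iterable and have `close()`), and `CountingConsumer` is an illustrative name. Separately, the real consumer setting `max.poll.records` caps how many records a single poll returns, which may be enough when the limit applies per poll rather than per session.

```python
class FakeConsumer:
    """Stand-in for a Kafka consumer: iterable over messages, with close()."""

    def __init__(self, messages):
        self._messages = list(messages)
        self.closed = False

    def __iter__(self):
        return iter(self._messages)

    def close(self):
        self.closed = True


class CountingConsumer:
    """Consume at most `limit` messages, then close the wrapped consumer."""

    def __init__(self, consumer, limit):
        self._consumer = consumer
        self.limit = limit
        self.count = 0

    def consume(self):
        taken = []
        if self.limit > 0:
            for message in self._consumer:
                taken.append(message)
                self.count += 1
                if self.count == self.limit:  # target reached: stop reading
                    break
        self._consumer.close()
        return taken


inner = FakeConsumer(["m0", "m1", "m2", "m3"])
print(CountingConsumer(inner, limit=2).consume())  # ['m0', 'm1']
print(inner.closed)                                # True
```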

25. How does Kafka implement multi-threaded consumption?

Within a consumer group, Kafka allows one consumer to consume multiple partitions, but does not allow one partition to be consumed by more than one consumer in the group.

The steps to implement multithreading are as follows:

The producer writes data to random partitions (using a custom random partitioner).
The consumer switches from single-threaded to multi-threaded consumption. When consuming, make sure to traverse all partitions; otherwise only one partition is consumed.
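The two steps can be illustrated with an in-memory model: a random partitioner on the produce side and one consumer thread per partition on the consume side. This is a sketch assuming partitions are plain Python lists and the function names are hypothetical; with a real client, each thread would typically own its own consumer instance, since Kafka's Java consumer is not thread-safe.

```python
import random
import threading


def random_partition(num_partitions):
    """Custom 'random partitioner': pick any partition uniformly."""
    return random.randrange(num_partitions)


def produce(messages, partitions):
    for message in messages:
        partitions[random_partition(len(partitions))].append(message)


def consume_in_threads(partitions):
    """Start one thread per partition so every partition is traversed."""
    results, lock = [], threading.Lock()

    def worker(partition_id, records):
        for record in records:
            with lock:  # the shared result list needs synchronization
                results.append((partition_id, record))

    threads = [
        threading.Thread(target=worker, args=(pid, records))
        for pid, records in enumerate(partitions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


partitions = [[] for _ in range(3)]
produce([f"m{i}" for i in range(10)], partitions)
consumed = consume_in_threads(partitions)
print(sorted(record for _, record in consumed) == sorted(f"m{i}" for i in range(10)))  # True
```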

26. How many consumption modes does Kafka support?

Kafka supports three delivery semantics when consuming messages:

(1) At most once mode
At most once: the offset is committed before the message is processed. Messages may be lost but are never repeated. On the producer side, if the producer does not retry when an ACK times out or an error is returned, the message may never be written to Kafka and therefore is never delivered to the consumer. This mode is usually chosen to avoid any possibility of duplication, and the business must accept that data may be lost in transit.

(2) At least once mode
At least once: the offset is committed only after the message has been processed successfully. Messages are never lost but may be repeated. When the producer receives an ACK from the broker (with acks=all, after all in-sync replicas have the message), the message has been written to Kafka. If the ACK times out or an error is returned, however, the producer may retry on the assumption that the message was not written. If the broker failed after writing the message but before sending the ACK, the retry writes the message a second time, so it is delivered to the consumer more than once. This can lead to duplicate work and incorrect results.

(3) Exactly once mode
Exactly once: the offset is treated as the message's unique ID and is processed together with the message, with the processing guaranteed to be atomic. Each message is processed once and only once, neither lost nor repeated. Even if the producer retries, the message is delivered to the consumer only once. This semantics is ideal but hard to achieve, because the messaging system must cooperate with the applications that produce and consume the messages. For example, if a consumer's offset is rolled back after messages were consumed successfully, those messages are received again from that offset. The messaging system and the client applications must therefore be coordinated to achieve exactly once.
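The difference between the first two modes comes down to whether the offset is committed before or after processing. A small simulation, assuming a consumer that crashes once at a chosen offset and then restarts from its last committed offset (the function name is hypothetical), makes the loss and the duplication visible. Exactly once is not modeled here.

```python
def run_with_crash(messages, commit_first, crash_at):
    """Process `messages`, crash once at offset `crash_at`, then restart.

    commit_first=True  -> at most once (commit, then process)
    commit_first=False -> at least once (process, then commit)
    Returns every message that was processed, in order.
    """
    processed, committed = [], 0  # committed = next offset to read
    for offset in range(len(messages)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                break  # crashed after committing, before processing
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at:
                break  # crashed after processing, before committing
            committed = offset + 1
    # Restart: resume from the last committed offset, no further crashes.
    processed.extend(messages[committed:])
    return processed


msgs = ["m0", "m1", "m2"]
print(run_with_crash(msgs, commit_first=True, crash_at=1))   # ['m0', 'm2']: m1 lost
print(run_with_crash(msgs, commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']: m1 repeated
```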

Kafka's default mode is at least once, which can lead to repeated consumption, so the business logic must be designed to be idempotent.

When saving data in business code, use MySQL's INSERT INTO ... ON DUPLICATE KEY UPDATE syntax: the row is inserted if it does not exist and updated if it does, which makes the write naturally idempotent.
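To keep an example runnable without a MySQL server, the sketch below uses Python's built-in `sqlite3` and SQLite's equivalent upsert clause, `ON CONFLICT(...) DO UPDATE` (available in SQLite 3.24 and later), in place of MySQL's `ON DUPLICATE KEY UPDATE`. The `orders` table and `save_order` helper are hypothetical.

```python
import sqlite3


def save_order(conn, order_id, amount):
    """Idempotent write: insert if the key is new, otherwise update it."""
    conn.execute(
        "INSERT INTO orders (id, amount) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
        (order_id, amount),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")

# The same message is delivered twice (at least once), but only one row results.
for _ in range(2):
    save_order(conn, "order-42", 100)

print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())  # (1, 100)
```

The primary key is what makes this idempotent: the message's business key must map to a unique column for the upsert to detect the duplicate.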

27. How does Kafka ensure that data is not duplicated or lost (exactly once semantics)?

1. Exactly once mode

Exactly once means each message is processed once and only once, neither lost nor repeated, because the offset is treated as the message's unique ID and is processed together with the message atomically. This is hard to achieve.

Kafka's default mode is at least once, which can lead to repeated consumption, so the business logic must be designed to be idempotent.

2. Idempotency

Duplicate sends are hard to avoid during production: when a send fails or times out, the producer's retry mechanism may send the same message again. With idempotence enabled, repeated sends produce only one valid message.

How idempotence is implemented: each producer is assigned a unique PID (producer ID) at initialization. The PID is handled internally and is not exposed to the application. For a given PID, sequence numbers increase monotonically from 0 for each partition it writes to. When the producer sends data, it tags each message with a sequence number, which the broker uses to check whether the data is a duplicate. The PID is globally unique, but a producer that restarts after a failure is assigned a new PID, which is one reason idempotence cannot span sessions. Each topic partition on the broker maintains a PID → SEQ mapping and updates lastSeq on every commit. When a record batch arrives, the broker checks it before saving: if the batch's baseSeq (the sequence number of its first message) is exactly 1 greater than the lastSeq maintained by the broker, the data is saved; otherwise it is not.
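The broker-side check can be sketched as follows. This is a simplified Python model of one partition, assuming batches arrive with a PID, a base sequence number and their records; the class name is hypothetical, and the real broker also distinguishes duplicates from out-of-order batches and reports them differently, which this sketch does not.

```python
class PartitionLog:
    """One topic partition that deduplicates batches by (PID, sequence)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # PID -> sequence number of the last saved record

    def append_batch(self, pid, base_seq, records):
        """Save the batch only if base_seq == last_seq + 1 for this PID."""
        expected = self.last_seq.get(pid, -1) + 1  # a new PID starts at 0
        if base_seq != expected:
            return False  # duplicate (retry) or gap: not saved
        self.records.extend(records)
        self.last_seq[pid] = base_seq + len(records) - 1
        return True


log = PartitionLog()
print(log.append_batch(pid=7, base_seq=0, records=["a", "b"]))  # True
print(log.append_batch(pid=7, base_seq=0, records=["a", "b"]))  # False: retried batch dropped
print(log.append_batch(pid=7, base_seq=2, records=["c"]))       # True
print(log.records)                                              # ['a', 'b', 'c']
```

Because the map is keyed by PID, a producer that restarts with a new PID starts again at sequence 0 and the broker cannot link it to its earlier session, which is the cross-session limitation described above.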

3. At least once + idempotency = exactly once: together they ensure that data is neither duplicated nor lost.

28. How does Kafka clean up expired data?

Kafka persists data to disk and lets you configure policies for cleaning it up. There are two cleanup policies: delete and compact.

Data cleaning method

1. Delete

log.cleanup.policy=delete enables the delete policy.

Messages are deleted outright and cannot be recovered. Two retention rules can be configured:

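  • Time-based: delete data older than a specified time (log.retention.hours, or the finer-grained log.retention.minutes / log.retention.ms).
  • Size-based: delete the oldest messages once a partition exceeds a specified size (log.retention.bytes).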

To avoid blocking reads during deletion, Kafka uses copy-on-write: while a deletion is in progress, a read operation's binary search actually runs over a static snapshot copy of the segment list, similar to Java's CopyOnWriteArrayList.
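Both retention rules and the copy-on-write idea can be sketched together. This is an illustration under simplifying assumptions, not Kafka's log manager: each segment carries a single timestamp and size, `SegmentList` and `apply_retention` are hypothetical names, and real Kafka never deletes the active segment. Readers binary-search whatever tuple they picked up; the cleaner builds a new tuple and swaps the reference, so a reader's snapshot never changes underneath it.

```python
from bisect import bisect_right
from collections import namedtuple

Segment = namedtuple("Segment", "base_offset timestamp_ms size_bytes")


class SegmentList:
    def __init__(self, segments):
        self._segments = tuple(segments)  # immutable snapshot, oldest first

    def snapshot(self):
        return self._segments

    def find(self, offset):
        """Binary-search the current snapshot for the segment holding `offset`."""
        snap = self._segments  # a reader works only on this snapshot
        i = bisect_right([s.base_offset for s in snap], offset) - 1
        return snap[i] if i >= 0 else None

    def apply_retention(self, now_ms, retention_ms, retention_bytes):
        """Drop segments older than retention_ms, then the oldest ones until the
        total size fits retention_bytes; publish the result as a new tuple."""
        kept = [s for s in self._segments if now_ms - s.timestamp_ms <= retention_ms]
        total = sum(s.size_bytes for s in kept)
        while kept and total > retention_bytes:
            total -= kept.pop(0).size_bytes  # delete the oldest segment first
        self._segments = tuple(kept)  # copy-on-write: swap in the new list


log = SegmentList([Segment(0, 0, 100), Segment(100, 5_000, 100), Segment(200, 9_000, 100)])
old_view = log.snapshot()
log.apply_retention(now_ms=10_000, retention_ms=6_000, retention_bytes=150)
print(log.find(250))   # Segment(base_offset=200, timestamp_ms=9000, size_bytes=100)
print(len(old_view))   # 3: an earlier reader's snapshot is untouched
```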

2. Compact

Compaction keeps only the latest version of the data for each key.

First, set log.cleaner.enable=true in the broker configuration to enable the log cleaner. (It was off by default in early versions; newer versions enable it by default.)

Then set cleanup.policy=compact in the topic configuration (log.cleanup.policy is the broker-wide default) to enable the compaction policy.
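What compaction keeps can be shown in a few lines. This sketch assumes records are `(offset, key, value)` tuples in offset order and ignores details such as tombstones (null values) and the not-yet-cleaned head of the log; `compact` is a hypothetical helper, not a Kafka API.

```python
def compact(records):
    """Keep only the latest record for each key, preserving original offsets."""
    latest_offset = {}
    for offset, key, _ in records:
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [r for r in records if latest_offset[r[1]] == r[0]]


log = [(0, "user-1", "v1"), (1, "user-2", "v1"), (2, "user-1", "v2"), (3, "user-1", "v3")]
print(compact(log))  # [(1, 'user-2', 'v1'), (3, 'user-1', 'v3')]
```

The surviving records keep their original offsets; compaction leaves gaps rather than renumbering.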

29. Kafka and CAP theory

As a foundational theory of distributed systems, the CAP theorem states that a distributed system can satisfy at most two of the following three properties: consistency, availability, and partition tolerance.

1. Meaning of CAP

Consistency
Meaning: all nodes see the same, latest copy of the data at the same time.

Any read that starts after a write completes must return that value or the result of a later write. In other words, in a consistent system, once a client writes a value to any server and receives a response, it will read the newly written data from any other server.

Availability
Meaning: every request received by a non-failed node in the system must receive a response.

In an available system, if a client sends a request to a server that has not crashed, the server must eventually respond; it is not allowed to ignore the request.

Partition tolerance
Meaning: when a node or network partition in the distributed system fails, the whole system can still provide services that meet consistency and availability, that is, some failures will not affect the overall use.

In practice, when designing a distributed system we always have to account for failures caused by bugs, hardware, the network and so on. So even if some nodes or network links fail, the system as a whole must keep working. (If it does not, it effectively has only one partition, and the question of consistency and availability no longer arises.)

2. Note:
(1) C and A do not have to be traded off at all times. When there is no partition, a distributed system should provide both full data consistency and availability.

(2) The choice between C and A need not apply to the whole system; it can be made per subsystem and at different times. For example, the accounting ledger in a payment subsystem must be strongly consistent, while related subsystems such as user names, avatars and user levels can choose A.

(3) The three CAP properties are not Boolean, either/or, black-or-white; each is a matter of degree. For example, emphasizing consistency does not mean abandoning availability entirely.

(4) How to weigh CAP

CA system: focuses on consistency and availability. It requires very strict consistency protocols, such as the two-phase commit protocol (2PC). A CA system cannot tolerate network or node failures. When one occurs, the whole system rejects write requests because it cannot tell whether the other node is down or there is only a network problem; the only safe option is to become read-only.

Unfortunately, such systems hardly exist. In a distributed system, network partitions are inevitable; giving up P means giving up distribution itself, and then CAP no longer applies. P is a prerequisite of any distributed system, so this case does not really exist.

For example, general relational databases such as MySQL or Oracle guarantee consistency and availability, but they are not distributed systems. In this sense the three properties are not symmetric: we cannot gain P by sacrificing C and A. Partition tolerance can only be improved by making the infrastructure more stable; in other words, it is not a software problem.

CP system: focuses on consistency and partition tolerance. It relies on majority-based consistency protocols, such as Paxos (a quorum-style algorithm). Such a system only needs the data on a majority of nodes to be consistent; the minority of nodes that have not synchronized to the latest data become unavailable. This still provides partial availability.
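The majority rule behind quorum-style protocols is simple arithmetic: with N replicas, a write needs acknowledgments from at least N // 2 + 1 of them. The Python sketch below, with hypothetical names and no real consensus protocol (no leader election, no log repair), shows why the minority side of a network partition must refuse writes, which is exactly the availability a CP system gives up.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorum_write(replicas, reachable, value):
    """Write `value` to reachable replicas only if they form a majority.

    replicas: dict of node -> stored value; reachable: nodes on the
    client's side of the network partition.
    """
    acks = [node for node in replicas if node in reachable]
    if len(acks) < majority(len(replicas)):
        return False  # minority side: reject to stay consistent (CP)
    for node in acks:
        replicas[node] = value
    return True


cluster = {f"n{i}": 0 for i in range(5)}
print(quorum_write(cluster, {"n0", "n1", "n2"}, 1))  # True: 3 of 5 is a majority
print(quorum_write(cluster, {"n3", "n4"}, 2))        # False: 2 of 5 is not
```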

Such a system guarantees consistency and partition tolerance and gives up availability: in extreme cases, it is allowed to become inaccessible. User experience is often sacrificed, keeping users waiting until the data is consistent before service is restored.

For some systems, consistency is fundamental. For distributed storage such as HBase and Redis, data consistency is the most basic requirement; storage that is not consistent will obviously not be used.

ZooKeeper is the same. Every time you access ZK you get a consistent result; its job is to keep the services it manages synchronized and consistent, so it obviously cannot give up consistency. In extreme cases, however, ZK may drop some requests, and clients must retry to get a result.

AP system: focuses on availability and partition tolerance. Such a system cannot guarantee agreement among nodes, so data conflicts can occur, and data versions must be maintained to resolve them.

Most distributed systems are designed this way, guaranteeing high availability and partition tolerance at the expense of consistency, for example Taobao shopping and 12306 ticket booking. As mentioned earlier, Taobao can achieve an annual availability of five nines (99.999%), but data consistency cannot be guaranteed in that case.

A familiar example is buying tickets on 12306. When we click to buy, the system does not say that tickets are sold out; only after we enter the verification code and pay are we told there are no tickets left. This is because the data is not consistent at the moment we click, and the shortage is discovered only when payment is checked. The design sacrifices some user experience but guarantees high availability, so users are not locked out or left waiting for a long time. It is another trade-off.

The key to weighing the three depends on the business.

If consistency is abandoned in favor of partition tolerance, nodes may lose contact with each other, and to stay highly available each node can only serve requests from its local data, which easily leads to global inconsistency. Internet applications such as Sina and NetEase run huge numbers of widely scattered machines, and network failures are routine, so they guarantee AP and give up C. Experience shows that occasional inconsistency is acceptable for sites such as portals, whereas being inaccessible is a serious problem.

Banks, on the other hand, must guarantee strong consistency; C is mandatory, so only CA or CP is possible. With strong consistency and availability (CA), the system becomes completely unavailable when communication fails. With strong consistency and partition tolerance (CP), it retains partial availability. Which to choose depends on the business scenario: CP is not better than CA in every case, since a system that lets you view but not update information is sometimes worse than one that simply refuses service.

3. CAP mechanism in Kafka

Kafka satisfies the C and A of the CAP theorem, and uses specific mechanisms to provide as much partition tolerance as possible. Here C stands for data consistency and A for data availability.

Kafka writes data into different partitions, and each partition may have multiple replicas. Data is first written to the partition's leader replica, and all reads and writes go through the leader, which satisfies the consistency principle. Kafka then ensures data availability through the partition replica mechanism. This raises another problem: the data in a follower replica may differ from the data in the leader. Handling this difference is the partition-tolerance problem.

To address partition tolerance, Kafka uses the ISR synchronization strategy to minimize the problem.

Each leader maintains an ISR (in-sync replicas) list. The ISR list determines which follower replicas are available, that is, which ones are keeping up with the data in the leader. Two conditions determine whether a follower replica is available:

  • replica.lag.time.max.ms=10000: the maximum time a follower may lag behind the leader. If a follower has not caught up within this time, it is removed from the ISR.
  • replica.lag.max.messages=4000: if a follower falls behind the leader by more than this number of messages, the leader removes it from the ISR (this parameter was removed in later versions).
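The time-based rule can be expressed as a filter over the ISR. This is a sketch, assuming the leader records, for each follower, the last time that follower was fully caught up; `shrink_isr` is a hypothetical name, and the 10000 ms threshold is the value quoted above (the default has changed across Kafka versions).

```python
REPLICA_LAG_TIME_MAX_MS = 10_000  # replica.lag.time.max.ms value quoted in the text


def shrink_isr(isr, leader, last_caught_up_ms, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Keep the leader plus every follower caught up within max_lag_ms."""
    return {
        replica
        for replica in isr
        if replica == leader or now_ms - last_caught_up_ms[replica] <= max_lag_ms
    }


isr = {1, 2, 3}
last_caught_up = {2: 95_000, 3: 80_000}  # follower 3 last caught up 20 s before now_ms
print(sorted(shrink_isr(isr, leader=1, last_caught_up_ms=last_caught_up, now_ms=100_000)))  # [1, 2]
```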

The acknowledgment setting that determines when a produce request is considered complete is request.required.acks=0 (the old producer's setting; the newer producer calls it acks).

30. Why does Kafka not support read-write separation?

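In Kafka, both producers writing messages and consumers reading messages interact with the leader replica, so Kafka uses a "master write, master read" production and consumption model. Kafka does not support "master write, slave read", because that approach has two obvious drawbacks (since version 2.4, KIP-392 does let consumers fetch from the closest follower replica, mainly to reduce cross-data-center traffic, but writes still go to the leader):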

(1) Data consistency: there is inevitably a delay window while data propagates from the master node to a slave node, which causes data inconsistency between them. Suppose a data item A has the value X on both master and slave, and A is then changed to Y on the master. Until this change reaches the slave, an application reading A from the slave gets X rather than the latest value Y, which is a data inconsistency.

(2) Latency: for components like Redis, getting data from a write on the master to the slave goes through several stages: network → master memory → network → slave memory, which takes time. In Kafka, master-slave synchronization is even more time-consuming than in Redis, because it goes through network → master memory → master disk → network → slave memory → slave disk. For latency-sensitive applications, master-write/slave-read is not suitable.
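The consistency problem in (1) can be reproduced with two in-memory replicas and a replication queue that is drained only when `replicate()` runs. The class and names are hypothetical; the point is only that a read from the follower before replication returns the old value.

```python
from collections import deque


class LeaderFollower:
    def __init__(self):
        self.leader, self.follower = {}, {}
        self._pending = deque()  # changes not yet copied to the follower

    def write(self, key, value):
        self.leader[key] = value
        self._pending.append((key, value))

    def replicate(self):
        while self._pending:
            key, value = self._pending.popleft()
            self.follower[key] = value


pair = LeaderFollower()
pair.write("A", "X")
pair.replicate()
pair.write("A", "Y")  # not yet replicated
print(pair.leader["A"], pair.follower["A"])  # Y X: a follower read is stale
pair.replicate()
print(pair.follower["A"])                    # Y
```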

In practice, combined with an ecosystem of monitoring, alerting, and operations tooling, Kafka achieves good load balancing in most cases. Its leader-write, leader-read design has several advantages:

(1) It can simplify the implementation logic of the code and reduce the possibility of error;

(2) Load is distributed at a finer granularity and evenly. Compared with master-write/slave-read, load balancing is not only more effective but also controllable by users;

(3) Reads are not affected by replication delay;

(4) When the replica is stable, there will be no data inconsistency.

So why would Kafka implement master-write/slave-read when it would gain nothing from it? All of this is thanks to Kafka's well-designed architecture. In a sense, master-write/slave-read is a stopgap that other systems adopt to compensate for design shortcomings.

31. How Kafka reads data at an offset

(1) Connect to the ZK cluster and obtain from ZK the partition information of the topic and the leader of each partition (newer clients fetch this metadata from the brokers instead).

(2) Connect to the broker that hosts the corresponding leader.

(3) The consumer sends the saved offset to the leader

(4) The leader locates the segment (its index file and log file) based on the offset and other information.

(5) Using the contents of the index file, the leader locates the start position corresponding to the offset in the log file, reads data of the corresponding length, and returns it to the consumer.
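Steps (4) and (5) amount to two binary searches followed by a short forward scan. A simplified Python model follows, assuming positions are record indices instead of byte offsets and that an index entry is written every `index_interval` records (both assumptions; Kafka's sparse index is driven by bytes written, via `index.interval.bytes`). `IndexedSegment` and `read_at` are hypothetical names.

```python
from bisect import bisect_right


class IndexedSegment:
    """A log segment plus a sparse index of (relative offset, position)."""

    def __init__(self, base_offset, payloads, index_interval=2):
        self.base_offset = base_offset
        self.records = [(base_offset + i, p) for i, p in enumerate(payloads)]
        self.index = [
            (offset - base_offset, position)
            for position, (offset, _) in enumerate(self.records)
            if position % index_interval == 0
        ]


def read_at(segments, target):
    # Step 4: binary-search segment base offsets (segment files are named by them).
    i = bisect_right([s.base_offset for s in segments], target) - 1
    if i < 0:
        raise KeyError(target)
    segment = segments[i]
    # Step 5a: binary-search the sparse index for the closest entry <= target.
    j = bisect_right([rel for rel, _ in segment.index], target - segment.base_offset) - 1
    position = segment.index[j][1] if j >= 0 else 0
    # Step 5b: scan the log forward from that position.
    for offset, payload in segment.records[position:]:
        if offset == target:
            return payload
        if offset > target:
            break
    raise KeyError(target)


segments = [IndexedSegment(0, ["a", "b", "c"]), IndexedSegment(3, ["d", "e", "f"])]
print(read_at(segments, 4))  # e
```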

32. Kafka messages are piling up because consumption capacity is insufficient. How should this be handled?

(1) If Kafka's consumption capacity is insufficient, consider increasing the number of partitions of the topic and, at the same time, the number of consumers in the group, so that the number of consumers equals the number of partitions (both changes are needed).

(2) If downstream processing cannot keep up, increase the number of messages pulled per batch.
If each batch pulls too little data, that is, batch size / processing time < production rate, less data is processed than produced, which also causes a backlog.
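Both remedies follow from a rate comparison. A back-of-the-envelope sketch, assuming each poll takes a roughly fixed processing time regardless of batch size (in practice that time grows with batch size), is shown below; the function names are hypothetical. With a real client, the number of records returned per poll is bounded by `max.poll.records`.

```python
import math


def consumer_rate(batch_size, seconds_per_batch):
    """Messages per second one consumer can handle."""
    return batch_size / seconds_per_batch


def backlog_grows(production_rate, batch_size, seconds_per_batch, consumers=1):
    return consumers * consumer_rate(batch_size, seconds_per_batch) < production_rate


def consumers_needed(production_rate, batch_size, seconds_per_batch):
    """Consumers (and therefore at least as many partitions) required."""
    return math.ceil(production_rate / consumer_rate(batch_size, seconds_per_batch))


# 2000 msg/s produced; each poll returns 500 records and takes 1 s to process.
print(backlog_grows(2000, 500, 1.0))        # True: one consumer falls behind
print(consumers_needed(2000, 500, 1.0))     # 4 consumers, so at least 4 partitions
print(backlog_grows(2000, 2500, 1.0))       # False: a bigger batch also keeps up
```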

33. Kafka offset maintenance

Before version 0.9, Kafka consumers saved offsets in ZooKeeper by default.

Starting from version 0.9, the consumer by default saves offsets in Kafka itself, in a built-in topic named __consumer_offsets.
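Which partition of `__consumer_offsets` holds a group's offsets is determined by hashing the group ID. The sketch below assumes the broker uses `abs(groupId.hashCode()) % offsets.topic.num.partitions` (50 partitions by default) with Java's `String.hashCode`; it reimplements that hash in Python for ASCII group IDs only (Java hashes UTF-16 code units), and the function names are hypothetical.

```python
OFFSETS_TOPIC_NUM_PARTITIONS = 50  # default of offsets.topic.num.partitions


def java_string_hash(s):
    """Java String.hashCode for ASCII input: s[0]*31^(n-1) + ... in 32-bit ints."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def non_negative_abs(n):
    """abs() that maps the one unrepresentable value, -2**31, to 0."""
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id, num_partitions=OFFSETS_TOPIC_NUM_PARTITIONS):
    return non_negative_abs(java_string_hash(group_id)) % num_partitions


print(java_string_hash("hello"))       # 99162322
print(offsets_partition_for("hello"))  # 22
```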

In real development with Spark or Flink, you can commit Kafka offsets manually, or rely on Flink's two-phase commit to commit offsets automatically.