How does go language manipulate Kafka to ensure no message loss


Many Internet companies now run core business flows through a message queue. Because the business is core, it is sensitive to eventual data consistency: if messages are lost along the way, users complain and someone ends up with the worst performance review of the year. I recently chatted with a few friends whose companies all use Kafka as their message queue, and the questions came up: can messages be lost when using Kafka? If a message is lost, how do you compensate? In this article we analyze these questions together and show how to operate Kafka from Go without losing data.

The Kafka code in this article is based on the sarama client library for Go.

A first look at the Kafka architecture

Wikipedia’s introduction to Kafka:

Kafka is an open source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The goal of the project is to provide a unified, high-throughput, low-latency platform for processing real-time data. Its persistence layer is essentially a "large-scale publish/subscribe message queue based on a distributed transaction log architecture", which makes it very valuable as enterprise infrastructure for processing streaming data. In addition, Kafka can connect to external systems (for data input/output) through Kafka Connect, and provides Kafka Streams, a Java streaming library. The design is heavily influenced by transaction logs.

The overall architecture of Kafka is relatively simple. It is mainly composed of producer, broker and consumer:

[Figure: Kafka architecture diagram]

Let's explain each module in the architecture diagram:

  • Producer: the producer of data; it publishes messages to the topic of its choice.
  • Consumer: the consumer of data, identified by a consumer group. Each record in a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be spread across multiple processes or machines.
  • Broker: the message-middleware processing node (server). One node is one broker, and a Kafka cluster consists of one or more brokers.

We also introduce some concepts:

  • Topic: can be understood as a collection of messages. Topics are stored on brokers. A topic can have multiple partitions, multiple producers pushing messages to it, and multiple consumers pulling messages from it, and it can live on one or more brokers.
  • Partition: a subset of a topic. Different partitions are spread horizontally across different brokers to increase Kafka's parallel processing capacity. Different partitions under the same topic hold different messages, and messages within a single partition are ordered. Each partition has one or more replicas, among which a leader is elected; followers pull data from the leader to update their own logs (each partition logically corresponds to a log folder), and consumers also pull messages from the leader.
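To make the "same key, same partition" ordering guarantee concrete, here is a minimal sketch of how a hash partitioner generally assigns a message key to a partition. The function name `partitionFor` and the FNV hash choice are illustrative, not the exact algorithm any particular client uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor picks a partition for a message key the way hash
// partitioners generally do: hash the key, then take the remainder
// modulo the partition count. All messages carrying the same key land
// on the same partition, which is what preserves per-key ordering.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, key := range []string{"user-1", "user-2", "user-1"} {
		fmt.Printf("key %q -> partition %d\n", key, partitionFor(key, 3))
	}
}
```

Note that the two messages for "user-1" always map to the same partition, so their relative order is preserved for the consumer.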

The three points where Kafka can lose messages

The producer push stage

Let’s take a look at the general writing process of producer:

  • The producer first finds the leader of the target partition from the Kafka cluster
  • The producer sends the message to the leader, which writes it to its local log
  • The followers pull the message from the leader, write it to their local logs, and send the leader an ACK
  • After the leader receives ACKs from all replicas in the ISR, it advances the high watermark and sends an ACK to the producer

[Figure: producer write flow]

From this flow we can see that Kafka eventually returns an ACK to confirm the result of the push. Kafka provides three modes for this:

  1. NoResponse (RequiredAcks = 0): fire and forget; the producer does not care whether the push succeeded
  2. WaitForLocal (RequiredAcks = 1): return as soon as the local node (the leader) has successfully received the message
  3. WaitForAll (RequiredAcks = -1): return only after the leader and all followers in the ISR have successfully received the message

Based on these three modes, we can infer that a producer push has some probability of losing messages. The analysis is as follows:

  • If we choose mode 1, there is a high probability of data loss, and lost messages cannot be retried
  • If we choose mode 2, as long as the leader stays up the data is not lost; but if the leader goes down before a follower has synchronized the data, a certain amount of data can be lost
  • If we choose mode 3, data is not lost, but it may be duplicated: if the synchronization ack between leader and followers is delayed or dropped by a network problem, the producer retries and the same message can be written twice

Therefore, in production we should choose mode 2 or mode 3 to guarantee message reliability; which one depends on the business scenario. If throughput matters more, choose mode 2; if data must absolutely not be lost, choose mode 3, which is the most reliable.
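For reference, the three ack levels correspond to these integer values; a minimal sketch with local constants that mirror the values (and names) used by the sarama client:

```go
package main

import "fmt"

// RequiredAcks levels as integer values; these match the values Kafka
// uses on the wire (and the sarama constants of the same names).
const (
	NoResponse   = 0  // fire and forget: no broker acknowledgement
	WaitForLocal = 1  // ack once the partition leader has the message
	WaitForAll   = -1 // ack once all in-sync replicas have the message
)

func main() {
	fmt.Println("NoResponse:", NoResponse)
	fmt.Println("WaitForLocal:", WaitForLocal)
	fmt.Println("WaitForAll:", WaitForAll)
}
```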

Loss inside the Kafka cluster itself

After the Kafka cluster receives data it persists it, and the data is ultimately written to disk. This write can itself lose data: the operating system first puts the data into the page cache, and the moment at which it actually flushes the cache to disk is not deterministic. If the Kafka machine suddenly goes down in that window, the unflushed data is lost. The probability is very small, though, and production Kafka clusters are replicated, so this extreme case can usually be ignored.

The consumer pull stage

When a message is pushed, it is appended to a partition and assigned an offset. The offset represents the position up to which the current consumer has consumed, and the partition also guarantees message ordering. After pulling messages, the consumer commits the offset either automatically or manually; once a commit succeeds, the committed offset advances:
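To make offsets and commits concrete, here is a toy in-memory model of one partition plus the consumer group's committed offset. The type and method names are illustrative, not Kafka's API:

```go
package main

import "fmt"

// A toy partition: an append-only log plus the consumer group's
// committed offset. After a restart or rebalance, consumption resumes
// from the committed offset, not from wherever the consumer last read.
type partition struct {
	log       []string
	committed int // next offset the group will read after a restart
}

// push appends a message and returns the offset assigned to it.
func (p *partition) push(msg string) int {
	p.log = append(p.log, msg)
	return len(p.log) - 1
}

// poll returns the messages from the committed offset onward, which is
// what a freshly restarted consumer would see.
func (p *partition) poll() []string {
	return p.log[p.committed:]
}

// commit records that everything up to and including offset is done.
func (p *partition) commit(offset int) { p.committed = offset + 1 }

func main() {
	p := &partition{}
	p.push("a")
	p.push("b")
	p.push("c")
	p.commit(0)           // only "a" has been processed and committed
	fmt.Println(p.poll()) // a restarted consumer resumes at "b"
}
```

The gap between "message processed" and "offset committed" is exactly where the loss/duplication trade-off discussed below lives.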

[Figure: consumer commit flow]

Therefore, auto-commit can lose data, while manual commit can duplicate data. The analysis is as follows:

  • With auto-commit, the offset may already be committed when we pull a message; if we then fail while executing the consumption logic, that message is lost
  • With manual commit, if we process the message first and then the commit step fails, the message will be consumed again, i.e. duplicated

Compared with losing data, consuming a message twice is usually acceptable to the business, and we can absorb the duplicates with an idempotent design.
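A minimal sketch of such an idempotent design: remember the IDs of messages already processed and skip redeliveries. The type and field names are illustrative; in production the "seen" set would live in Redis or a database unique key rather than in memory:

```go
package main

import "fmt"

// idempotentHandler absorbs at-least-once redeliveries: the same
// message ID can be handled any number of times, but the side effect
// is applied exactly once.
type idempotentHandler struct {
	seen    map[string]bool
	applied int
}

func (h *idempotentHandler) handle(msgID string) {
	if h.seen[msgID] {
		return // duplicate delivery: safe to ignore
	}
	h.seen[msgID] = true
	h.applied++ // the real side effect (DB write, etc.) goes here
}

func main() {
	h := &idempotentHandler{seen: map[string]bool{}}
	for _, id := range []string{"m1", "m2", "m1"} { // "m1" redelivered
		h.handle(id)
	}
	fmt.Println("applied:", h.applied) // duplicates were absorbed
}
```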

Practice

The complete code has been uploaded to GitHub: Dream/tree/master/code_demo/kafka_demo

Solve the problem of push message loss

It is mainly solved through two points:

  • Set the RequiredAcks mode: choosing WaitForAll guarantees that the push succeeded, at the cost of latency
  • Introduce a retry mechanism, setting the number of retries and the retry interval

So we write the following code (only the part that creates the client is shown):

```go
func NewAsyncProducer() sarama.AsyncProducer {
	cfg := sarama.NewConfig()
	version, err := sarama.ParseKafkaVersion(VERSION)
	if err != nil {
		log.Fatal("NewAsyncProducer Parse kafka version failed", err.Error())
		return nil
	}
	cfg.Version = version
	cfg.Producer.RequiredAcks = sarama.WaitForAll // one of the three ack modes
	cfg.Producer.Partitioner = sarama.NewHashPartitioner
	cfg.Producer.Return.Successes = true
	cfg.Producer.Return.Errors = true
	cfg.Producer.Retry.Max = 3 // retry up to 3 times
	cfg.Producer.Retry.Backoff = 100 * time.Millisecond
	cli, err := sarama.NewAsyncProducer([]string{ADDR}, cfg)
	if err != nil {
		log.Fatal("NewAsyncProducer failed", err.Error())
		return nil
	}
	return cli
}
```

Solve the problem of pull message loss

The solution here is blunt: after each message has genuinely been consumed, mark and commit its offset. This can cause repeated consumption, but that is easy to solve with an idempotent operation.

Code example:

```go
func NewConsumerGroup(group string) sarama.ConsumerGroup {
	cfg := sarama.NewConfig()
	version, err := sarama.ParseKafkaVersion(VERSION)
	if err != nil {
		log.Fatal("NewConsumerGroup Parse kafka version failed", err.Error())
		return nil
	}
	cfg.Version = version
	cfg.Consumer.Group.Rebalance.Strategy = sarama.BalanceStrategyRange
	cfg.Consumer.Offsets.Initial = sarama.OffsetOldest
	cfg.Consumer.Offsets.Retry.Max = 3
	cfg.Consumer.Offsets.AutoCommit.Enable = true              // auto-commit on; MarkMessage must still be called
	cfg.Consumer.Offsets.AutoCommit.Interval = 1 * time.Second // commit interval
	client, err := sarama.NewConsumerGroup([]string{ADDR}, group, cfg)
	if err != nil {
		log.Fatal("NewConsumerGroup failed", err.Error())
		return nil
	}
	return client
}
```

The above mainly creates the consumer group. Careful readers will notice that we still use auto-commit here; where is the manual commit? This comes down to a peculiarity of the sarama library: with auto-commit enabled, you must still call the MarkMessage() method to mark a message before it gets committed (readers in doubt can try it out or read the source code), otherwise the commit never happens. So the consumption logic should be written like this:

```go
func (e EventHandler) ConsumeClaim(session sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		var data common.KafkaMsg
		if err := json.Unmarshal(msg.Value, &data); err != nil {
			return errors.New("failed to unmarshal message err is " + err.Error())
		}
		// operate on the data; here we just print it
		log.Print("consumerClaim data is ", data)
		// after the message is processed successfully, mark it as
		// processed; the auto-commit loop will then commit it
		session.MarkMessage(msg, "")
	}
	return nil
}
```

Alternatively, you can use manual commits directly, which takes only two steps:

Step 1: turn off auto-commit:

```go
cfg.Consumer.Offsets.AutoCommit.Enable = false // disable auto-commit and commit manually instead
```

Step 2: add the following to the consumption logic; in manual commit mode you also need to mark the message first, then commit:

```go
session.MarkMessage(msg, "")
session.Commit()
```

The complete code can be downloaded from GitHub and verified!


In this article, we mainly explained two things:

  • Where Kafka can lose messages
  • How to configure Kafka from Go so that data is not lost

In daily business development, many companies like to decouple systems with a message queue. Note that using Kafka as the message queue does not by itself guarantee that data is never lost: we have to configure the client and build the compensations ourselves. Don't forget, or the next P0 incident is yours.