Push or pull in message queues: how do RocketMQ and Kafka do it?

Time:2020-11-24

In every era, people who can learn will not be treated badly

Hello, I’m yes.

Today, let’s talk about the push and pull modes of message queues, which is a popular interview topic. For example, if you list RocketMQ on your resume, the interviewer will almost certainly ask whether RocketMQ uses push mode or pull mode. Pull mode? But isn’t there a PushConsumer?

So let’s walk through push and pull modes first, and then see how RocketMQ and Kafka handle them.

Push and pull modes

First, let’s be clear about which step of message queuing push and pull modes refer to. Generally speaking, when we talk about push and pull modes, we mean the interaction between the consumer and the broker.

Between the producer and the broker, push is the default: the producer pushes messages to the broker, rather than the broker actively pulling them.

Imagine the broker had to pull messages from producers. Each producer would have to save its messages locally, in the form of logs, and wait for the broker to pull them. With many producers, message reliability would then depend not only on the broker itself but also on hundreds of producers.

The broker can rely on mechanisms such as multiple replicas to store messages reliably. Guaranteeing the reliability of hundreds of producers is much harder, so by default producers push messages to the broker.

So in some cases distribution is better, and in others centralized management is better.

Push mode

Push mode means that the broker pushes messages to the consumer. The consumer receives messages passively, and the broker controls when they are sent.

What are the benefits of push mode?

Messages arrive in near real time. As soon as the broker receives a message, it can push it to the consumer.

It is simpler for consumers to use. The consumer just waits; whenever a message arrives, it is pushed.

What are the disadvantages of push mode?

The push rate is hard to match to the consumption rate. The goal of push mode is to push messages as fast as possible. When producers send messages to the broker faster than consumers can consume them, consumers may eventually be overwhelmed by the backlog. A push rate that is too high feels like a DDoS attack to the consumer, which is left stunned.

In addition, different consumers consume at different rates, and it is hard for the broker to balance the push rate for each of them. To achieve an adaptive push rate, a consumer that is falling behind has to tell the broker “I can’t keep up”, and the broker then has to maintain the state of every consumer and adjust its push rate accordingly.

This increases the complexity of the broker itself.

Therefore, push mode struggles to control the push rate according to the state of consumers. It suits scenarios with a small volume of messages, strong consumption capacity, and high real-time requirements.
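To make the backlog problem concrete, here is a minimal sketch in Python. It is a toy model, not the behavior of any real broker: the function name and parameters are made up for illustration, and it simply assumes the broker pushes a fixed number of messages per tick while the consumer can process a fixed number per tick.

```python
def simulate_push_backlog(push_rate, consume_rate, ticks):
    """Return the consumer-side backlog after each tick (toy model).

    push_rate:    messages the broker pushes per tick (assumed constant)
    consume_rate: messages the consumer can process per tick (assumed constant)
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += push_rate                   # broker pushes without asking
        backlog -= min(backlog, consume_rate)  # consumer drains what it can
        history.append(backlog)
    return history
```

With a push rate of 5 and a consumption rate of 2, the backlog grows by 3 on every tick and never recovers. With the rates swapped, it stays at zero.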

Pull mode

Pull mode means that the consumer actively requests messages from the broker. The broker sends messages to the consumer only passively.

What are the benefits of pull mode?

The initiative lies with the consumer. Consumers can pull messages according to their own situation. If a consumer feels it can’t keep up, it can stop pulling according to some strategy, or pull at intervals.

In pull mode, the broker has it relatively easy. It only stores the messages sent by producers. Consumption is initiated by the consumer: when a consumer comes for messages, it tells the broker how many it wants and the broker hands them over. The broker is just an emotionless tool; if consumers don’t come, it doesn’t care.

Pull mode is better suited to sending messages in batches. In push mode, the broker can push each message as it arrives, or cache some messages and then push them together, but when it pushes it does not know whether the consumer can handle that many messages at once. Pull mode is more reasonable: the broker can use the information in the consumer’s request to decide how many messages to gather and then send them as a batch.
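The three benefits above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: `Broker`, `Consumer`, `pull`, and `poll_once` are hypothetical names, the log lives in memory, and each consumer tracks its own offset and chooses its own batch size.

```python
class Broker:
    """Stores messages and answers pull requests; it keeps no consumer state."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)

    def pull(self, offset, max_messages):
        # The consumer decides where to read from and how much it can handle.
        return self._log[offset:offset + max_messages]


class Consumer:
    """Pulls at its own pace with its own batch size (hypothetical sketch)."""

    def __init__(self, broker, batch_size):
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0

    def poll_once(self):
        batch = self.broker.pull(self.offset, self.batch_size)
        self.offset += len(batch)  # progress is tracked on the consumer side
        return batch
```

A fast consumer can ask for large batches and a slow one for a single message at a time, and the broker code stays the same for both.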

What are the disadvantages of pull mode?

Message delay. It is the consumer that fetches messages, but how does the consumer know a message has arrived? It can only keep pulling, yet it cannot pull too frequently, or it ends up attacking the broker. So the request frequency has to be reduced. For example, if the consumer pulls every 2 seconds, a message may be delayed by up to 2 seconds.

Busy polling. For example, if messages arrive only once every few hours, then every request the consumer makes during those hours is wasted.
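Both drawbacks show up in a small simulation. The sketch below is a toy model with made-up names: it uses logical time instead of a real clock and assumes the consumer polls at a fixed interval and receives every message that has arrived since the previous poll.

```python
def simulate_interval_pull(arrival_times, interval, end_time):
    """Simulate fixed-interval pulling (toy model, logical time).

    Returns (delays, empty_polls): the delay of each message between its
    arrival and the poll that picked it up, and the number of polls that
    returned nothing.
    """
    pending = sorted(arrival_times)
    delays = []
    empty_polls = 0
    next_index = 0
    poll_time = interval
    while poll_time <= end_time:
        fetched = 0
        # Collect every message that arrived up to this poll.
        while next_index < len(pending) and pending[next_index] <= poll_time:
            delays.append(poll_time - pending[next_index])
            next_index += 1
            fetched += 1
        if fetched == 0:
            empty_polls += 1  # a wasted request
        poll_time += interval
    return delays, empty_polls
```

For messages arriving at times 1, 5, and 6 with a poll every 2 time units up to time 10, a message can wait almost a full interval, and three of the five polls come back empty.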

So, push or pull?

As you can see, push mode and pull mode each have their own advantages and disadvantages. How do we choose?

RocketMQ and Kafka both chose pull mode. Of course, there are push-based message queues in the industry, such as ActiveMQ.

Personally, I think pull mode is more appropriate, because message queues are now required to persist messages. In other words, the queue has a storage function of its own. Its mission is to receive messages and store them safely so that consumers can consume them.

Consumers come in all shapes and sizes. The broker should not depend too much on consumers. Its attitude should be: I have stored the messages safely for you; come and take them whenever you want.

Generally speaking, the broker does not become a bottleneck, because business processing on the consumer side is relatively slow. Still, the broker is a central point, so the lighter it is, the better.

So RocketMQ and Kafka both chose pull mode. Are they not afraid of its disadvantages? They are, so they pulled off a trick that reduces them.

Long polling

RocketMQ and Kafka both use “long polling” to implement pull mode. Let’s see how they do it.

To keep things simple, I will describe the case where the requested number or total size of messages has not yet been reached as “there are no messages yet”. Either way, the condition is not satisfied.

Long polling in RocketMQ

PushConsumer in RocketMQ actually uses pull mode under the hood. It just looks like push mode.

That is because RocketMQ quietly requests data from the broker on our behalf.

In the background there is a RebalanceService thread, which performs load balancing based on the number of queues in the topic and the number of consumers in the current consumer group. The PullRequest generated for each queue is put into the blocking queue pullRequestQueue. A PullMessageService thread then continuously takes PullRequest objects from pullRequestQueue and sends requests to the broker over the network, which gives near-real-time message pulling.
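The flow above can be sketched roughly as follows. This is a Python illustration of the idea, not RocketMQ’s code: the class and function names are hypothetical, the queue split is a simplified round-robin rather than RocketMQ’s actual allocation strategy, and the “broker” is just an in-memory dict.

```python
import queue
import threading


class PullRequest:
    """One pull task per assigned queue (hypothetical, simplified)."""

    def __init__(self, queue_id, next_offset=0):
        self.queue_id = queue_id
        self.next_offset = next_offset


def assign_queues(queue_ids, consumer_ids, me):
    """Simplified round-robin split of queues across the consumer group."""
    members = sorted(consumer_ids)
    index = members.index(me)
    return [q for i, q in enumerate(sorted(queue_ids)) if i % len(members) == index]


def pull_loop(pull_request_queue, broker_log, listener, stop):
    """Background thread: take a PullRequest, pull, dispatch, re-enqueue."""
    while not stop.is_set():
        try:
            request = pull_request_queue.get(timeout=0.05)
        except queue.Empty:
            continue
        messages = broker_log.get(request.queue_id, [])[request.next_offset:]
        if messages:
            listener(request.queue_id, messages)  # looks like "push" to the user
            request.next_offset += len(messages)
        else:
            stop.wait(0.01)  # stand-in for the broker holding the request
        pull_request_queue.put(request)  # keep pulling this queue


def run_demo():
    """Consumer c1 of group [c1, c2] pulls from its share of four queues."""
    broker_log = {0: ["a", "b"], 1: ["x"], 2: ["c"], 3: ["y"]}
    received, done, stop = [], threading.Event(), threading.Event()

    def listener(queue_id, messages):
        received.extend((queue_id, m) for m in messages)
        if len(received) == 3:
            done.set()

    requests = queue.Queue()
    for queue_id in assign_queues(broker_log.keys(), ["c1", "c2"], "c1"):
        requests.put(PullRequest(queue_id))
    worker = threading.Thread(target=pull_loop, args=(requests, broker_log, listener, stop))
    worker.start()
    done.wait(timeout=5.0)
    stop.set()
    worker.join()
    return sorted(received)
```

The user only registers a listener and never calls pull directly, which is why the consumer feels like push mode even though every message is fetched by a background pull.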

I won’t paste this part of the code. That is all there is to it, and I’ll show it in a diagram later.

On the broker side, the processRequest method of PullMessageProcessor handles the pull request. If there are messages, they are returned directly. What if there are none? Let’s take a look at the code.

Let’s take a look at what the suspendPullRequest method does.

The PullRequestHoldService thread takes the PullRequest entries from pullRequestTable every 5 seconds and checks whether the offset of each pull request is less than the current maximum offset of the consume queue. If it is, new messages have arrived: notifyMessageArriving is called, which ends up calling executeRequestWhenWakeup() of PullMessageProcessor to process the request again, that is, to run it once more. The default long-polling timeout is 30 seconds.

In short, every 5 seconds it checks whether messages have arrived. If so, processRequest is called again. That doesn’t sound very real-time, does it? Five seconds?

Don’t worry. There is also a ReputMessageService thread, which continuously parses data from the CommitLog and dispatches requests to build two kinds of data, ConsumeQueue and IndexFile. It also wakes up held requests, which makes up for the slow 5-second check.

I won’t paste the code. In short, once a message is written, PullRequestHoldService#notifyMessageArriving is called.
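Here is a minimal sketch of the hold-and-wake-up idea on the broker side. It is not RocketMQ’s implementation: `HoldingBroker` and its snake_case methods are hypothetical stand-ins for the classes named above, time is passed in as a plain number so the behavior is easy to follow, and the periodic thread is replaced by explicit calls to `check_held_requests`.

```python
class HoldingBroker:
    """Holds pull requests that cannot be satisfied yet (hypothetical sketch)."""

    def __init__(self):
        self.log = {}   # queue_id -> list of messages
        self.held = {}  # queue_id -> list of (offset, deadline, callback)

    def process_request(self, queue_id, offset, callback, now,
                        hold_for=30.0, allow_suspend=True):
        messages = self.log.get(queue_id, [])[offset:]
        if messages or not allow_suspend:
            callback(messages)  # answer right away (possibly with nothing)
        else:
            # No messages yet: suspend the request instead of answering.
            self.held.setdefault(queue_id, []).append((offset, now + hold_for, callback))

    def check_held_requests(self, queue_id, now):
        """Run by a periodic check, and also whenever a message arrives."""
        still_waiting = []
        for offset, deadline, callback in self.held.pop(queue_id, []):
            max_offset = len(self.log.get(queue_id, []))
            if offset < max_offset or now >= deadline:
                # New data or timeout: process the request again, without suspending.
                self.process_request(queue_id, offset, callback, now, allow_suspend=False)
            else:
                still_waiting.append((offset, deadline, callback))
        if still_waiting:
            self.held.setdefault(queue_id, []).extend(still_waiting)

    def append(self, queue_id, message, now):
        self.log.setdefault(queue_id, []).append(message)
        self.check_held_requests(queue_id, now)  # wake up held requests immediately
```

The periodic check alone would leave a gap of up to one check interval. The call at the end of `append` is what closes that gap in this sketch: a held request is answered as soon as its queue receives a message.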

Finally, I’ll draw a diagram to describe the whole process.

Long polling in Kafka

Kafka’s pull request carries parameters that let the consumer’s request block and wait, which is exactly “long polling”.

Put simply, the consumer pulls messages from the broker and specifies a timeout. If there are messages, the request returns them immediately. If there are none, the consumer waits until the timeout expires and then issues the pull request again.

The broker has to cooperate as well. When a consumer’s request comes in and there are messages, it must return them immediately. If there are none, it creates a delayed operation and returns once the condition is met.
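The contract between consumer and broker can be sketched with a condition variable. This is a simplified Python model, not Kafka’s implementation: `LongPollBroker`, `fetch`, `min_bytes`, and `max_wait` are names chosen for this example. In Kafka’s consumer configuration, `fetch.min.bytes` and `fetch.max.wait.ms` play a similar role.

```python
import threading


class LongPollBroker:
    """In-memory broker whose fetch blocks until data arrives or time runs out."""

    def __init__(self):
        self._log = []
        self._cond = threading.Condition()

    def append(self, message):
        with self._cond:
            self._log.append(message)
            self._cond.notify_all()  # wake up any fetch that is waiting

    def fetch(self, offset, min_bytes, max_wait):
        """Return messages from offset; wait up to max_wait seconds for min_bytes."""

        def enough():
            return sum(len(m) for m in self._log[offset:]) >= min_bytes

        with self._cond:
            # Returns as soon as the condition holds, or when max_wait elapses.
            self._cond.wait_for(enough, timeout=max_wait)
            return list(self._log[offset:])
```

A fetch against an empty log returns an empty list after `max_wait` seconds, while a fetch that is already waiting returns as soon as another thread calls `append`. The consumer then simply calls `fetch` again in a loop.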

Let’s take a brief look at the source code. To highlight the key points, I have removed some of the code.

Let’s look at the consumer code first.

The poll interface above should be familiar to everyone. The comments already tell us that it waits for data to arrive or for the timeout to elapse. Let’s take a brief look.

Let’s see what client.poll ends up calling.

In the end, Kafka’s wrapped selector is called, which ultimately calls Java NIO’s select(timeout).

Now the consumer side is clear. Let’s see how the broker does it.

As introduced in a previous article, the broker’s entry point for handling all requests is the handle method in KafkaApis.scala. The protagonist this time is handleFetchRequest.

Inside this method, I have excerpted the most important part.

The following picture shows the internal implementation of the fetchMessages method. The comments in the source code are very clear, so zoom in and read them.

The name “purgatory” is quite interesting. In short, it uses the time wheel mentioned in my previous article to run timed tasks. For example, delayedFetchPurgatory here is dedicated to handling delayed pull operations.

Let’s think briefly about which methods such a delayed operation needs. First, it needs a check mechanism to see whether messages have arrived. Then it needs a method to run once messages arrive, a method to run after that execution completes, and of course a method that defines what to do after a timeout.

These methods correspond to DelayedFetch in the code. This class extends DelayedOperation, and it contains the following:

  • isCompleted: checks whether the operation has already completed
  • tryComplete: checks whether the condition is satisfied and, if so, completes the operation
  • onComplete: the logic executed when the operation completes
  • onExpiration: the method executed after the operation expires
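The lifecycle of such a delayed operation can be sketched as below. This is a Python illustration loosely modeled on the four methods in the list, not Kafka’s Scala code: `force_complete` and `expire` are helpers I added for the sketch, the timer that would call `expire` is left out, and the “log” is a plain list.

```python
import threading


class DelayedOperation:
    """Base class: completes at most once, by condition or by expiration."""

    def __init__(self):
        self._completed = False
        self._lock = threading.Lock()

    def is_completed(self):
        return self._completed

    def force_complete(self):
        """Complete now; return True only for the caller that wins the race."""
        with self._lock:
            if self._completed:
                return False
            self._completed = True
        self.on_complete()
        return True

    def expire(self):
        """What a timer would call when the timeout is reached."""
        if self.force_complete():
            self.on_expiration()

    def try_complete(self):
        raise NotImplementedError

    def on_complete(self):
        raise NotImplementedError

    def on_expiration(self):
        pass


class DelayedFetch(DelayedOperation):
    """A held fetch waiting for at least min_messages from offset."""

    def __init__(self, log, offset, min_messages, respond):
        super().__init__()
        self.log = log
        self.offset = offset
        self.min_messages = min_messages
        self.respond = respond
        self.expired = False

    def try_complete(self):
        if len(self.log) - self.offset >= self.min_messages:
            return self.force_complete()
        return False

    def on_complete(self):
        # Answer with whatever is available, even if that is nothing.
        self.respond(list(self.log[self.offset:]))

    def on_expiration(self):
        self.expired = True
```

The important property is that the response is sent exactly once: whichever of `try_complete` and `expire` gets there first answers the consumer, and the other becomes a no-op.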

Whether an operation has expired is driven by the time wheel, but you can’t wait until the expiration time to find out whether messages have arrived, right?

Kafka uses the same mechanism as RocketMQ here. When messages are written, Kafka wakes up these delayed requests. I won’t paste the specific code; you can find it by following the ReplicaManager#appendRecords method two calls deeper.
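Since the code is not shown, here is a rough sketch of the wake-up-on-write idea. It is a simplified Python model and not Kafka’s implementation: `WaitingFetch`, `Purgatory`, and `append_records` are hypothetical names, expiration is omitted, and waiting operations are tracked per partition key in a plain dict.

```python
from collections import defaultdict


class WaitingFetch:
    """A held fetch that completes once the log grows past its offset."""

    def __init__(self, log, offset, respond):
        self.log = log
        self.offset = offset
        self.respond = respond
        self.completed = False

    def try_complete(self):
        if not self.completed and len(self.log) > self.offset:
            self.completed = True
            self.respond(list(self.log[self.offset:]))
        return self.completed


class Purgatory:
    """Keeps operations that could not complete, grouped by key."""

    def __init__(self):
        self._watchers = defaultdict(list)

    def try_complete_else_watch(self, key, operation):
        if operation.try_complete():
            return True
        self._watchers[key].append(operation)  # park it until data arrives
        return False

    def check_and_complete(self, key):
        """Re-check every operation parked under key; return how many completed."""
        waiting = self._watchers.pop(key, [])
        remaining = [op for op in waiting if not op.try_complete()]
        if remaining:
            self._watchers[key].extend(remaining)
        return len(waiting) - len(remaining)


def append_records(partition_log, purgatory, key, records):
    """Write path: store the records, then wake up fetches waiting on this key."""
    partition_log.extend(records)
    return purgatory.check_and_complete(key)
```

Because the write path re-checks the parked fetches itself, a waiting consumer gets its data right after the append instead of waiting for the timer to fire.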

To sum up

As we have seen, RocketMQ and Kafka both adopt the “long polling” mechanism. The consumer waits for messages. When the broker has messages, it returns them directly; when it has none, it delays processing the request. To keep delivery timely, when new messages arrive in the corresponding queue or partition, the broker wakes up the waiting request and returns the messages promptly.

In a word, the consumer and the broker cooperate: when a pull request cannot be satisfied, the broker holds it, which avoids many frequent pull requests, and as soon as messages arrive the held request is notified and returned.

Finally

Generally speaking, push and pull modes each have their own advantages and disadvantages. Personally, I think pull mode is better suited to message queues.

After reading this article, I believe that the next time an interviewer asks you “push or pull?”, you can give him a crooked smile.

I’m yes, from a little bit to a billion. See you next time