Kafka — metadata pull process


The previous article covered when metadata is pulled. This one walks through the whole pull process.


Metadata is pulled only for the topics the producer actually sends to, not for every topic in the cluster. For this reason, whenever a message is sent, its topic is stored in topics, a map that records the topics currently in use. The key is the topic name, and the value is initially -1.
For example, when a message is sent to topic1, the key-value pair topic1 -> -1 is added to topics.
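
The bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the Kafka client's code: the class name TopicRegistry, the constant EXPIRY_NOT_SET and the method names are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the producer-side topic registry described above.
class TopicRegistry {
    // Sentinel meaning "no expiry time has been assigned yet".
    static final long EXPIRY_NOT_SET = -1L;

    // topic name -> expiry timestamp in ms, or -1 until the next metadata update
    private final Map<String, Long> topics = new HashMap<>();

    // Called when a message is sent. Returns true if the topic was not tracked before.
    synchronized boolean add(String topic) {
        return topics.put(topic, EXPIRY_NOT_SET) == null;
    }

    // Returns the stored value for the topic, or null if the topic is not tracked.
    synchronized Long expiryOf(String topic) {
        return topics.get(topic);
    }
}
```

Calling add("topic1") on a fresh registry returns true and stores topic1 -> -1. A second call returns false because the topic is already tracked.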


Version number and update flag

Because metadata is pulled by the sender thread (described below), an update flag, needupdate, is needed to tell the sender thread that metadata must be pulled.

Likewise, once the sender thread has pulled metadata successfully, it has to inform the other threads, which is what the version number, version, is for. It is incremented after each successful pull. Each waiting thread compares the version number it saved with the one in memory; if the one in memory is larger, the metadata was pulled successfully.

When a pull is requested for the first time, version = 0, the version saved by the current thread is also 0, and needupdate is set to true, indicating that metadata needs to be pulled.
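
As a rough illustration of this handshake, the flag and the version counter could be kept in one small shared object. The class MetadataState and its method names are assumptions made for this sketch; only the roles of version and the update flag come from the description above.

```java
// Hypothetical sketch: a version counter plus an update flag shared between threads.
class MetadataState {
    private int version = 0;            // incremented on every successful pull
    private boolean needUpdate = false; // true while a pull has been requested

    // Called by a thread that needs fresh metadata.
    // Returns the version it saw, so it can later detect a newer one.
    synchronized int requestUpdate() {
        needUpdate = true;
        return version;
    }

    // Called by the sender thread after a successful pull.
    synchronized void update() {
        needUpdate = false;
        version++;
    }

    synchronized boolean updateRequested() {
        return needUpdate;
    }

    synchronized int version() {
        return version;
    }
}
```

A caller keeps the value returned by requestUpdate() and treats any larger value of version() as proof that a pull has completed since the request.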


Sender thread

The sender thread runs an endless loop for as long as the producer is alive. Both metadata requests and messages are sent through the sender thread.

Once the version number and update flag are set, the sender thread is woken up. Waking the sender thread ultimately means waking up the NIO selector it is blocked on.

This touches on network transmission, which a later article covers in detail, so only a brief summary here: the sender thread hands the request to the NIO selector and then processes the response the selector receives.
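
The loop-and-wake-up pattern can be sketched with the JDK's own java.nio.channels.Selector. This is a bare skeleton under assumptions: the class SenderLoop and its fields are invented for the example, no channels are registered, and the real producer's request handling is omitted entirely.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a sender-style loop parked on an NIO selector.
class SenderLoop implements Runnable {
    private final Selector selector;
    private volatile boolean running = true;
    final AtomicInteger iterations = new AtomicInteger(); // for observation only

    SenderLoop() throws IOException {
        this.selector = Selector.open();
    }

    @Override
    public void run() {
        try {
            while (running) {
                // Blocks until a channel is ready, wakeup() is called, or 1 s passes.
                selector.select(1000);
                iterations.incrementAndGet();
                // A real client would send pending requests and handle responses here.
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            try {
                selector.close();
            } catch (IOException ignored) {
                // nothing useful to do while shutting down
            }
        }
    }

    // Called by other threads once there is work, e.g. a metadata update was requested.
    void wakeup() {
        selector.wakeup();
    }

    // Stops the loop and wakes the selector so the thread notices immediately.
    void shutdown() {
        running = false;
        selector.wakeup();
    }
}
```

Selector.wakeup() makes a blocked select call return at once, so the loop can react to a request without waiting for the timeout.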


After the sender thread processes the response, it updates the metadata held in memory, increments the version number, and sets needupdate to false.

In topics, the value for topic1 is still -1 at this point. Now that the metadata has been pulled, the value is changed to the current time + 5 * 60 * 1000 ms, i.e. the current time plus 5 minutes. If no message is sent to a topic for more than 5 minutes, the topic is removed from topics, and its metadata is no longer pulled on subsequent updates.
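
The expiry rule can be written out as a small helper that runs after each successful metadata update. This is a sketch under assumptions: the class TopicExpiry, its constants and the refresh method are illustrative names, and the map is passed in directly to keep the example short.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of the 5-minute topic expiry rule.
class TopicExpiry {
    static final long EXPIRY_NOT_SET = -1L;
    static final long TOPIC_EXPIRY_MS = 5 * 60 * 1000L; // 5 minutes

    // Applies the rule to every tracked topic; nowMs is the current time in ms.
    static void refresh(Map<String, Long> topics, long nowMs) {
        Iterator<Map.Entry<String, Long>> it = topics.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            long expireMs = entry.getValue();
            if (expireMs == EXPIRY_NOT_SET) {
                // Sent to since the last update: start a fresh 5-minute window.
                entry.setValue(nowMs + TOPIC_EXPIRY_MS);
            } else if (expireMs <= nowMs) {
                // No message within the window: stop tracking the topic.
                it.remove();
            }
        }
    }
}
```

With topic1 -> -1 and a refresh at time t, the entry becomes topic1 -> t + 300000. An entry whose stored time is already in the past is removed instead.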



What is the thread that is sending the message doing while the sender thread is busy pulling metadata?

It goes to sleep, that is, it calls wait. It can be woken in two ways: the sender thread wakes it after updating the metadata, or its wait times out. The maximum waiting time is 60 s by default.

After waking up, it checks whether the total time spent waiting has exceeded the limit. If the metadata still has not been pulled after 60 s, an exception is thrown.

It also checks whether the version number it saved is smaller than the version number in memory. If it is, the update succeeded. Otherwise the update has not completed yet, and the thread goes back to sleep until the next wake-up.
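
The wait-and-recheck logic above maps naturally onto Java's wait/notifyAll. The following is a minimal sketch, not the Kafka client's code: MetadataWaiter and awaitUpdate are names chosen for this example, and java.util.concurrent.TimeoutException stands in for whatever exception the client actually throws.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a caller blocking until the metadata version advances.
class MetadataWaiter {
    private int version = 0;

    synchronized int version() {
        return version;
    }

    // Sender-thread side: record a successful pull and wake all waiting threads.
    synchronized void update() {
        version++;
        notifyAll();
    }

    // Caller side: block until version exceeds lastVersion or maxWaitMs elapses.
    synchronized void awaitUpdate(int lastVersion, long maxWaitMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (version <= lastVersion) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new TimeoutException("Metadata not updated after " + maxWaitMs + " ms");
            }
            // Woken either by update() or when the remaining time runs out.
            wait(remaining);
        }
    }
}
```

The loop rechecks the version after every wake-up, so a wake-up that arrives without a newer version sends the thread back to waiting, and the deadline bounds the total time rather than each individual wait.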



After the sender thread pulls the metadata, it is cached in memory, so the next send reads it directly from memory instead of repeating the whole process above. Pulling metadata from Kafka on every send would not only slow down the producer but also put extra load on Kafka.

If at least one message is sent to a topic every 5 minutes, the topic stays in topics. The producer client also refreshes metadata every 5 minutes by default, so the cached metadata stays up to date for topics that receive messages continuously.