Creating topic


Kafka is a distributed message queue known for its high throughput and strong performance.

1. Explanation of related terms

Broker: a server that stores messages. A cluster can be scaled out by deploying multiple brokers
Topic: a named category of messages
Partition: a subdivision of a topic's messages. A topic has one or more partitions, which can be spread across multiple brokers
Replica: a copy of a partition. Replication works at the partition level; a single partition can have multiple replicas to ensure high availability
Controller: the broker elected to coordinate the cluster, responsible for the leader election of partitions
ZooKeeper: stores information about brokers, the controller, consumer groups, etc., and provides coordination primitives such as distributed locks
ISR: in-sync replicas, the list of replicas that keep up with the leader by fetching its messages. The list includes the leader and the in-sync followers
HW: high watermark, the offset up to which messages are visible to consumers; consumers can only read messages below it
LEO: log end offset, the offset at which the next message will be appended to a replica's log, that is, one past the last message.
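
To make the relationship between ISR, LEO, and HW concrete, here is a minimal Python sketch. It assumes the usual rule that the high watermark is the smallest LEO among the in-sync replicas. The function name and the sample offsets are made up for illustration and are not Kafka APIs.

```python
def high_watermark(leo_by_replica, isr):
    """Return the HW as the minimum LEO over the in-sync replicas.

    leo_by_replica: dict mapping broker id -> log end offset of its replica
    isr: iterable of broker ids currently in the in-sync replica list
    """
    isr = list(isr)
    if not isr:
        raise ValueError("ISR must contain at least the leader")
    return min(leo_by_replica[broker] for broker in isr)


# Hypothetical partition: broker 1 is the leader, broker 3 has fallen out of the ISR.
leo = {1: 120, 2: 118, 3: 90}
hw = high_watermark(leo, isr=[1, 2])
print(hw)  # 118
```

In this example the lagging replica on broker 3 does not hold the high watermark back, because it is no longer in the ISR.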

2. Topic creation process

Kafka's topic creation logic is divided into two parts: the command-line part and the controller's background logic. The command-line part is the main function called directly by the script. It validates the command, assigns the replicas, and writes the assignment to a ZooKeeper node. The controller watches that ZooKeeper node for changes and completes the background part of topic creation. It is generally better to disable automatic topic creation (the auto.create.topics.enable broker setting) and create topics explicitly in a uniform way.

Process: the script actually runs the main function of TopicCommand. After TopicCommand parses the command and determines that it is a create command, it calls the createTopic method. It then checks whether the replicas were specified manually, that is, through the --replica-assignment parameter. If not, the system assigns them automatically. The algorithm is as follows:

a. All brokers (n in total) and the partitions to be assigned are sorted, and one broker is chosen at random as the starting point
b. Suppose the starting broker is B1: the first replica of partition 1 (p11) is assigned to B1, p12 to B2, and so on
c. After the first replicas (p1j) are assigned, the second replicas (p2j) are assigned. A shift (step size) is chosen at random as the offset between p2j and p1j. If the random shift is s, p21 is assigned to B(1 + s), p22 to B(2 + s), and so on, wrapping around the broker list

The following comment from the Kafka source code describes the assignment algorithm:
There are 3 goals of replica assignment:

  1. Spread the replicas evenly among brokers.
  2. For partitions assigned to a particular broker, their other replicas are spread over the other brokers.
  3. If all brokers have rack information, assign the replicas for each partition to different racks if possible

To achieve this goal for replica assignment without considering racks, we:

  1. Assign the first replica of each partition by round-robin, starting from a random position in the broker list.
  2. Assign the remaining replicas of each partition with an increasing shift.

Here is an example of assigning:

broker-0  broker-1  broker-2  broker-3  broker-4
p0        p1        p2        p3        p4        (1st replica)
p5        p6        p7        p8        p9        (1st replica)
p4        p0        p1        p2        p3        (2nd replica)
p8        p9        p5        p6        p7        (2nd replica)
p3        p4        p0        p1        p2        (3rd replica)
p7        p8        p9        p5        p6        (3rd replica)

In short, the goal of the assignment is to ensure that multiple replicas of the same partition are never placed on the same broker, and that replicas are spread across the brokers as evenly as possible.
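
The rack-unaware assignment described above can be sketched in Python as follows. This is an illustrative port written from the description and the example table, not Kafka's actual code. The function name assign_replicas and the fixed_start_index parameter are assumptions, introduced so that the result is reproducible.

```python
import random


def assign_replicas(n_partitions, replication_factor, brokers, fixed_start_index=-1):
    """Rack-unaware replica assignment sketch.

    Returns a dict mapping partition id -> list of broker ids, where the
    first entry is the first replica of that partition.
    """
    n = len(brokers)
    if n_partitions <= 0 or replication_factor <= 0:
        raise ValueError("partitions and replication factor must be positive")
    if replication_factor > n:
        raise ValueError("replication factor exceeds the number of brokers")

    # A fixed start index makes the result deterministic (handy for tests);
    # otherwise the start position and the initial shift are both random.
    start = fixed_start_index if fixed_start_index >= 0 else random.randrange(n)
    shift = fixed_start_index if fixed_start_index >= 0 else random.randrange(n)

    assignment = {}
    for p in range(n_partitions):
        # Each time the round-robin wraps around the broker list, increase the shift.
        if p > 0 and p % n == 0:
            shift += 1
        first = (p + start) % n
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            # The offset is in 1..n-1, so a later replica never lands on the
            # broker that already holds the first replica.
            offset = 1 + (shift + j) % (n - 1)
            replicas.append(brokers[(first + offset) % n])
        assignment[p] = replicas
    return assignment


# Reproduce the 5-broker, 10-partition, 3-replica example with a fixed start.
result = assign_replicas(10, 3, [0, 1, 2, 3, 4], fixed_start_index=0)
print(result[0], result[5])  # [0, 1, 2] [0, 2, 3]
```

With a start index of 0 the output matches the table: p0 is on broker-0, broker-1, and broker-2, while p5 (after the round-robin wraps once) is on broker-0, broker-2, and broker-3.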

If the --replica-assignment parameter is given, its value has the following format: id0:id1:id2,id3:id4:id5,id6:id7:id8, where each id is a broker ID. This means that the topic has three partitions (separated by ","), and each partition has three replicas (separated by ":").
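
As a rough illustration of that format, the following Python sketch parses such a value into a partition-to-brokers mapping. The function name and the validation rules (no repeated broker within a partition, the same replica count for every partition) are assumptions made for this example rather than a description of TopicCommand's exact checks.

```python
def parse_replica_assignment(value):
    """Parse 'id0:id1,id2:id3' into {partition id: [broker ids]}."""
    assignment = {}
    for partition_id, replica_list in enumerate(value.split(",")):
        brokers = [int(b.strip()) for b in replica_list.split(":")]
        # Two replicas of one partition on the same broker would defeat replication.
        if len(set(brokers)) != len(brokers):
            raise ValueError(
                f"partition {partition_id} lists a broker more than once: {replica_list}"
            )
        assignment[partition_id] = brokers
    # Every partition is expected to have the same number of replicas.
    if len({len(b) for b in assignment.values()}) > 1:
        raise ValueError("all partitions must have the same number of replicas")
    return assignment


parsed = parse_replica_assignment("0:1:2,1:2:0,2:0:1")
print(parsed)  # {0: [0, 1, 2], 1: [1, 2, 0], 2: [2, 0, 1]}
```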

The process of creating a topic without a manually specified assignment is as follows:
In the createTopic method of TopicCommand, the current broker list is first read from ZooKeeper, then the assignReplicasToBrokers method is called to compute the assignment of partitions to brokers, and finally the assignment is written to a ZooKeeper node.
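
The last step, writing the assignment to ZooKeeper, might look roughly like the sketch below. A plain dict stands in for ZooKeeper, and the JSON layout (a version field plus a partitions map) is an assumption based on the ZooKeeper-based Kafka versions this article describes. The helper name is hypothetical.

```python
import json


def topic_assignment_payload(assignment):
    """Serialize {partition id: [broker ids]} as the topic node's JSON value."""
    data = {
        "version": 1,
        "partitions": {str(p): replicas for p, replicas in sorted(assignment.items())},
    }
    return json.dumps(data, separators=(",", ":"))


# In-memory stand-in for ZooKeeper: path -> node data.
fake_zk = {}
path = "/brokers/topics/test"
fake_zk[path] = topic_assignment_payload({0: [0, 1, 2], 1: [1, 2, 0]})
print(fake_zk[path])  # {"version":1,"partitions":{"0":[0,1,2],"1":[1,2,0]}}
```

Creating this node is what the controller later notices as a new child under /brokers/topics.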

The TopicChangeListener in Kafka's controller listens for changes to the child nodes of the /brokers/topics path. Once the number of child nodes under that path changes, the listener's handler method is called. It first updates the controller's cached information, then creates the corresponding partition and replica objects, and determines the leader replica and ISR for each partition.

The controller has two state machines: the partition state machine and the replica state machine. Their state transitions are shown in Figure 1 and Figure 2.

Figure 1 partition state flow

Figure 2 replica state flow

After the partition state machine detects the change to the ZooKeeper node, it carries out the following steps.

1. Get the current topic list from ZooKeeper, compare it with the cached list to determine the new topics, and update the controller's cached topic list
2. From the /brokers/topics/${topic} node, read the replica assignment of all partitions of the topic, and update the corresponding information in the controller's cache
3. Call onNewTopicCreation to create the topic and register the partition modification listener, then call onNewPartitionCreation to create the partitions
4. Create the partition objects and set them to the NewPartition state
5. Create the corresponding replica objects for each partition, look up the partition's assignment in the controller cache, and set all replicas of the partition to the NewReplica state
6. Change the partition state to OnlinePartition: select the first replica in the replica set as the leader replica, and take the whole replica set as the ISR. Write this information to the partition's state node in ZooKeeper (for example /brokers/topics/test/partitions/0/state for partition 0 of a topic named test), and then update the controller's leader cache. Finally, this information is sent to the other brokers in the cluster in an UpdateMetadataRequest
7. Finally, set the replica objects to the OnlineReplica state
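
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the state changes: plain dicts stand in for ZooKeeper and for the controller cache, the function and key names are hypothetical, and listener registration (step 3), the ZooKeeper write, and the metadata request to other brokers are left out.

```python
def on_topic_change(zk_topics, controller):
    """Handle a change under /brokers/topics (in-memory stand-in).

    zk_topics: dict topic -> {partition id: [broker ids]}, standing in for ZooKeeper
    controller: dict holding the controller's cached state
    Returns the set of newly created topics.
    """
    # Step 1: diff the ZooKeeper topic list against the cache.
    new_topics = set(zk_topics) - controller["topics"]
    controller["topics"] |= new_topics

    for topic in sorted(new_topics):
        for partition, replicas in zk_topics[topic].items():
            key = (topic, partition)
            # Step 2: cache the replica assignment of the partition.
            controller["assignment"][key] = list(replicas)

            # Steps 4-5: new partition and replica objects start in the "New" states.
            controller["partition_state"][key] = "NewPartition"
            for broker in replicas:
                controller["replica_state"][(topic, partition, broker)] = "NewReplica"

            # Step 6: the first replica becomes leader, the full replica set is the ISR.
            controller["leader_isr"][key] = {"leader": replicas[0], "isr": list(replicas)}
            controller["partition_state"][key] = "OnlinePartition"

            # Step 7: the replicas go online.
            for broker in replicas:
                controller["replica_state"][(topic, partition, broker)] = "OnlineReplica"
    return new_topics


controller = {
    "topics": set(),
    "assignment": {},
    "partition_state": {},
    "replica_state": {},
    "leader_isr": {},
}
created = on_topic_change({"test": {0: [2, 0, 1]}}, controller)
print(created, controller["leader_isr"][("test", 0)])
# {'test'} {'leader': 2, 'isr': [2, 0, 1]}
```

Calling on_topic_change a second time with the same input returns an empty set, since the topic is already in the controller's cache.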

The following is a detailed flow chart:

[Figure: topic creation flow chart]

High availability and high scalability of broker…

Message production and message storage…