Kafka two-level scheduling for distributed coordination of microservice task assignment in golang

Time: 2020-02-13

Background

Two-level coordinated scheduling architecture based on the Kafka message queue

In order to coordinate the work of its internal consumers and Kafka Connect workers, Kafka implements a rebalance protocol. The main work is divided into two steps:

  1. Each worker (consumer or connect node) reports its own metadata, such as its topic offsets, to the Kafka broker, which completes the leader/follower election
  2. The worker elected as leader fetches the member and partition information stored in Kafka and performs the second-level assignment, combining it with the specific business to achieve load-balanced distribution

Functionally, this yields two levels of scheduling: the first level is responsible for electing the leader, and the second level is the leader worker node completing the task assignment for each member

The main purpose here is to learn from this architectural design idea, even though the scenario it applies to is fairly limited

Distributed coordination design based on message queue

First-level coordinator design: the first level refers to the coordinator component, which records member metadata and elects the leader, for example by deciding who becomes leader based on offset size
Second-level coordinator design: the second level refers to the leader's task-assignment step; once the leader worker node has all the task and node information, it assigns tasks with a suitable algorithm and finally broadcasts the result back to the message queue
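
To make this division of responsibilities concrete, the two levels can be sketched as two small interfaces. This is purely illustrative: the Elector and Assigner names are mine and do not appear in the implementation below.

//Elector is the first level: the queue-side coordinator elects a leader
//from member metadata (here, for example, the member with the largest offset)
type Elector interface {
    ElectLeader(memberOffsets map[string]int) (leaderID string)
}

//Assigner is the second level: the elected leader worker splits the tasks
//among the members using whatever business-specific algorithm fits
type Assigner interface {
    Assign(tasks []string, members []string) map[string][]string
}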

The lesson to take away is that, in Kafka's scenario, achieving unified scheduling for many different kinds of business inside the broker would be quite troublesome. The assignment of specific tasks is therefore moved out of the core architecture: the broker end is only responsible for the generic leader election, while business-specific assignment is separated from the main architecture and implemented by each business itself

Code implementation

Core design

Following this design, we abstract out the core components: MemoryQueue, Worker, Coordinator, GroupRequest, GroupResponse, Task, and Assignment

MemoryQueue: simulated message queue that distributes messages, acting as the Kafka broker
Worker: executes tasks and runs the business-specific second-level coordination algorithm
Coordinator: the coordinator inside the message queue, used for leader/follower election
Task: a task to be executed
Assignment: the task-assignment result, based on task and node information
GroupRequest: request to join the cluster
GroupResponse: response information
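
The definitions of these message types are not listed in the article; a minimal sketch, reconstructed purely from how the fields are used in the code below (only the offset of Metadata is actually exercised), might look like this:

//Metadata information a worker reports when joining a group; getLeaderID only compares offsets
type Metadata struct {
    ID     string
    Offset int
}

func (m *Metadata) offset() int { return m.Offset }

//Task a named task belonging to a group
type Task struct {
    Name  string
    Group string
}

//GroupRequest sent by a worker that wants to join a group
type GroupRequest struct {
    ID       string
    Group    string
    Metadata Metadata
}

//GroupResponse broadcast by the queue after membership changes
type GroupResponse struct {
    Group       string
    Generation  int
    LeaderID    string
    Members     []string
    Tasks       []string
    Coordinator *Coordinator
}

//Assignment broadcast by the leader with the per-member task split
type Assignment struct {
    LeaderID   string
    Generation int
    result     map[string][]string
}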

MemoryQueue

Core data structure

//MemoryQueue in-memory message queue that simulates the Kafka broker
type MemoryQueue struct {
    done             chan struct{}
    queue            chan interface{}
    wg               sync.WaitGroup
    coordinator      map[string]*Coordinator
    worker           map[string]*Worker
}

Here the coordinator map holds the coordinator of each group, so that an allocator is established per group
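
The constructor and the event loop of MemoryQueue are not shown in the article. A minimal sketch of how NewMemoryQueue, send, addWorker, Notify, Start and Stop might fit together, inferred from how they are called elsewhere, could be:

//NewMemoryQueue builds a queue with the given channel buffer size
func NewMemoryQueue(size int) *MemoryQueue {
    return &MemoryQueue{
        done:        make(chan struct{}),
        queue:       make(chan interface{}, size),
        coordinator: map[string]*Coordinator{},
        worker:      map[string]*Worker{},
    }
}

//send enqueues an event; the wait group is decremented in handleEvent
func (mq *MemoryQueue) send(event interface{}) {
    mq.wg.Add(1)
    mq.queue <- event
}

//addWorker registers a worker so Notify can push events to it
func (mq *MemoryQueue) addWorker(id string, w *Worker) {
    mq.worker[id] = w
}

//Notify fans an event out to every registered worker
func (mq *MemoryQueue) Notify(event interface{}) {
    for _, w := range mq.worker {
        w.Execute(event)
    }
}

//Start consumes events until the queue is stopped
func (mq *MemoryQueue) Start() {
    go func() {
        for {
            select {
            case event := <-mq.queue:
                mq.handleEvent(event)
            case <-mq.done:
                return
            }
        }
    }()
}

//Stop waits for in-flight events and shuts the loop down
func (mq *MemoryQueue) Stop() {
    mq.wg.Wait()
    close(mq.done)
}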

Processing a node's request to join the cluster

MemoryQueue receives events and dispatches them by type: a GroupRequest event is routed to handleGroupRequest for processing
In handleGroupRequest, the coordinator for the corresponding group is obtained first, then a GroupResponse is built from the current information and sent back onto the message queue

Event distribution processing

func (mq *MemoryQueue) handleEvent(event interface{}) {
    switch event.(type) {
    case GroupRequest:
        request := event.(GroupRequest)
        mq.handleGroupRequest(&request)
    case Task:
        task := event.(Task)
        mq.handleTask(&task)
    default:
        mq.Notify(event)
    }
    mq.wg.Done()
}

Join group request processing

The Coordinator calls its own getLeaderID method to select a leader node based on the information of each member in the current group

//getGroupCoordinator gets (or creates) the coordinator of the specified group
func (mq *MemoryQueue) getGroupCoordinator(group string) *Coordinator {
    coordinator, ok := mq.coordinator[group]
    if ok {
        return coordinator
    }
    coordinator = NewCoordinator(group)
    mq.coordinator[group] = coordinator
    return coordinator
}

func (mq *MemoryQueue) handleGroupRequest(request *GroupRequest) {
    coordinator := mq.getGroupCoordinator(request.Group)
    exist := coordinator.addMember(request.ID, &request.Metadata)
    //If the worker has joined the group before, nothing will be done
    if exist {
        return
    }
    //Build the response for the group
    groupResponse := mq.buildGroupResponse(coordinator)
    mq.send(groupResponse)
}

func (mq *MemoryQueue) buildGroupResponse(coordinator *Coordinator) GroupResponse {
    return GroupResponse{
        Tasks:       coordinator.Tasks,
        Group:       coordinator.Group,
        Members:     coordinator.AllMembers(),
        LeaderID:    coordinator.getLeaderID(),
        Generation:  coordinator.Generation,
        Coordinator: coordinator,
    }
}
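
handleEvent also routes Task events to handleTask, which the article does not list. A plausible sketch, under the assumption that a new task is simply appended to the group's coordinator and the group is re-notified when it already has members, is:

//handleTask registers a new task with the group's coordinator; if the group
//already has members, a fresh GroupResponse is sent so work can be redistributed
func (mq *MemoryQueue) handleTask(task *Task) {
    coordinator := mq.getGroupCoordinator(task.Group)
    coordinator.Tasks = append(coordinator.Tasks, task.Name)
    if len(coordinator.Members) == 0 {
        return
    }
    mq.send(mq.buildGroupResponse(coordinator))
}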

Coordinator

Core data structure

//Coordinator the coordinator inside the message queue; elects the leader for a group
type Coordinator struct {
    Group      string
    Generation int
    Members    map[string]*Metadata
    Tasks      []string
    Heartbeats map[string]int64
}

In the coordinator, Members stores the metadata of each worker node, Tasks stores all tasks of the current group, Heartbeats stores the workers' heartbeat information, and Generation is a generation counter that is incremented on every membership change
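
NewCoordinator, addMember, AllMembers and the heartbeat recording used elsewhere are not listed in the article. A minimal sketch consistent with that usage might be (the demo would also need a member-removal path for reassignment when a worker stops, which is omitted here):

//NewCoordinator creates the coordinator for a group
func NewCoordinator(group string) *Coordinator {
    return &Coordinator{
        Group:      group,
        Members:    map[string]*Metadata{},
        Heartbeats: map[string]int64{},
    }
}

//addMember stores a member's metadata; it returns true if the member already
//existed and bumps the generation when membership actually changes
func (c *Coordinator) addMember(id string, metadata *Metadata) bool {
    if _, ok := c.Members[id]; ok {
        return true
    }
    c.Members[id] = metadata
    c.Generation++
    return false
}

//AllMembers returns the IDs of all current members
func (c *Coordinator) AllMembers() []string {
    members := make([]string, 0, len(c.Members))
    for id := range c.Members {
        members = append(members, id)
    }
    return members
}

//heartbeat records a worker's latest heartbeat timestamp
func (c *Coordinator) heartbeat(id string, ts int64) {
    c.Heartbeats[id] = ts
}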

Select leader by offset

The leader node is elected from the stored worker metadata

//getLeaderID selects the leader node based on the current member information
func (c *Coordinator) getLeaderID() string {
    leaderID, maxOffset := "", 0
    //The leader is simply the member with the largest offset; a real system would likely use a more elaborate rule
    for wid, metadata := range c.Members {
        if leaderID == "" || metadata.offset() > maxOffset {
            leaderID = wid
            maxOffset = metadata.offset()
        }
    }
    return leaderID
}

Worker

Core data structure

//Worker a node that executes tasks and, when elected leader, performs the second-level assignment
type Worker struct {
    ID          string
    Group       string
    Tasks       string
    done        chan struct{}
    queue       *MemoryQueue
    Coordinator *Coordinator
}

The worker node holds a reference to the Coordinator, which it later uses to send heartbeat information to that node
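
NewWorker, start and stop, which the test code calls later, are not shown either. A rough sketch, assuming the offset passed to start becomes the worker's reported Metadata, is:

//NewWorker creates a worker bound to a group and a queue
func NewWorker(id, group string, queue *MemoryQueue) *Worker {
    return &Worker{
        ID:    id,
        Group: group,
        done:  make(chan struct{}),
        queue: queue,
    }
}

//start joins the group, reporting the given offset as this worker's metadata;
//the member with the largest offset will be elected leader
func (w *Worker) start(offset int) {
    w.queue.send(GroupRequest{
        ID:       w.ID,
        Group:    w.Group,
        Metadata: Metadata{ID: w.ID, Offset: offset},
    })
}

//stop signals the worker's goroutines (such as the heartbeat loop) to exit;
//in the full demo it would also trigger the queue to reassign this worker's tasks
func (w *Worker) stop() {
    close(w.done)
}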

Dispatching received messages

The worker receives different event types and processes them according to type. handleGroupResponse receives the coordinator's response from the server side, which contains the leader and task information; the worker then performs the second-level assignment. handleAssign processes the task information after assignment (a sketch of it follows the Execute code below)

//Execute receives an event from the queue and dispatches it by type
func (w *Worker) Execute(event interface{}) {
    switch event.(type) {
    case GroupResponse:
        response := event.(GroupResponse)
        w.handleGroupResponse(&response)
    case Assignment:
        assign := event.(Assignment)
        w.handleAssign(&assign)
    }
}
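
isLeader and handleAssign are referenced above but not listed. A minimal sketch, assuming fmt and strings are imported and that the worker stores its share of the assignment as a "||"-joined string (which matches both the Tasks string field and the test1||test2 form in the demo output), is:

//isLeader reports whether this worker was elected leader
func (w *Worker) isLeader(leaderID string) bool {
    return w.ID == leaderID
}

//handleAssign picks this worker's share out of the broadcast assignment,
//records it, and prints it in the format seen in the demo output
func (w *Worker) handleAssign(assign *Assignment) {
    w.Tasks = strings.Join(assign.result[w.ID], "||")
    fmt.Printf("Generation [%d] worker [%s]  run tasks: [%s]\n", assign.Generation, w.ID, w.Tasks)
}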

Handling GroupResponse according to role type

A GroupResponse divides nodes into two roles: leader and follower. After receiving the GroupResponse, the leader node must go on to assign tasks, while a follower only needs to listen for events and send heartbeats

func (w *Worker) handleGroupResponse(response *GroupResponse) {
    if w.isLeader(response.LeaderID) {
        w.onLeaderJoin(response)
    } else {
        w.onFollowerJoin(response)
    }
}

Follower node

The follower node sends heartbeats

//onFollowerJoin the current role is follower; record the coordinator and start heartbeating
func (w *Worker) onFollowerJoin(response *GroupResponse) {
    w.Coordinator = response.Coordinator
    go w.heartbeat()
}
//heartbeat periodically sends heartbeats to the coordinator (the loop is left commented out in this demo)
func (w *Worker) heartbeat() {
    // timer := time.NewTimer(time.Second)
    // for {
    //     select {
    //     case <-timer.C:
    //         w.Coordinator.heartbeat(w.ID, time.Now().Unix())
    //         timer.Reset(time.Second)
    //     case <-w.done:
    //         return
    //     }
    // }
}

Leader node

On the leader node, I split the scheduling and assignment into two steps:
1) Shard the tasks according to the number of nodes and tasks
2) Assign the sharded tasks to each node and finally send the result back to the queue

//onLeaderJoin the current role is leader; perform task assignment and send it to the MQ
func (w *Worker) onLeaderJoin(response *GroupResponse) {
    fmt.Printf("Generation [%d] leaderID [%s]\n", response.Generation, w.ID)
    w.Coordinator = response.Coordinator
    go w.heartbeat()
    //Split tasks
    taskSlice := w.performAssign(response)

    //Assign tasks to workers
    memberTasks, index := make(map[string][]string), 0
    for _, name := range response.Members {
        memberTasks[name] = taskSlice[index]
        index++
    }

    //Distribute the assignment request
    assign := Assignment{LeaderID: w.ID, Generation: response.Generation, result: memberTasks}
    w.queue.send(assign)
}

//performAssign splits the tasks based on the current number of members and tasks
func (w *Worker) performAssign(response *GroupResponse) [][]string {

    perWorker := len(response.Tasks) / len(response.Members)
    leftOver := len(response.Tasks) - len(response.Members)*perWorker

    result := make([][]string, len(response.Members))

    taskIndex, memberTaskCount := 0, 0
    for index := range result {
        if index < leftOver {
            memberTaskCount = perWorker + 1
        } else {
            memberTaskCount = perWorker
        }
        for i := 0; i < memberTaskCount; i++ {
            result[index] = append(result[index], response.Tasks[taskIndex])
            taskIndex++
        }
    }

    return result
}

Test data

Start a queue, add tasks and workers, and observe the allocation results

    //Build the queue
    queue := NewMemoryQueue(10)
    queue.Start()

    //Send task
    queue.send(Task{Name: "test1", Group: "test"})
    queue.send(Task{Name: "test2", Group: "test"})
    queue.send(Task{Name: "test3", Group: "test"})
    queue.send(Task{Name: "test4", Group: "test"})
    queue.send(Task{Name: "test5", Group: "test"})

    //Start the workers with a different offset for each, and observe whether the leader is elected as expected
    workerOne := NewWorker("test-1", "test", queue)
    workerOne.start(1)
    queue.addWorker(workerOne.ID, workerOne)

    workerTwo := NewWorker("test-2", "test", queue)
    workerTwo.start(2)
    queue.addWorker(workerTwo.ID, workerTwo)

    workerThree := NewWorker("test-3", "test", queue)
    workerThree.start(3)
    queue.addWorker(workerThree.ID, workerThree)

    time.Sleep(time.Second)
    workerThree.stop()
    time.Sleep(time.Second)
    workerTwo.stop()
    time.Sleep(time.Second)
    workerOne.stop()

    queue.Stop()

Run result: based on the offsets, test-3 eventually becomes the leader. Looking at the task-assignment results with three nodes and five tasks, two nodes get two tasks each and one node gets one task. Then, as workers exit, the tasks are reassigned

Generation [1] leaderID [test-1]
Generation [2] leaderID [test-2]
Generation [3] leaderID [test-3]
Generation [1] worker [test-1]  run tasks: [test1||test2||test3||test4||test5]
Generation [1] worker [test-2]  run tasks: []
Generation [1] worker [test-3]  run tasks: []
Generation [2] worker [test-1]  run tasks: [test1||test2||test3]
Generation [2] worker [test-2]  run tasks: [test4||test5]
Generation [2] worker [test-3]  run tasks: []
Generation [3] worker [test-1]  run tasks: [test1||test2]
Generation [3] worker [test-2]  run tasks: [test3||test4]
Generation [3] worker [test-3]  run tasks: [test5]
Generation [4] leaderID [test-2]
Generation [4] worker [test-1]  run tasks: [test1||test2||test3]
Generation [4] worker [test-2]  run tasks: [test4||test5]
Generation [5] leaderID [test-1]
Generation [5] worker [test-1]  run tasks: [test1||test2||test3||test4||test5]

Summary

In fact, in distributed scenarios this kind of leader/follower election is more often delegated to coordination components such as etcd or ZooKeeper. The design in this article is closely tied to Kafka's own business scenario. If there is time in the future, I will continue to look at other designs, starting from Kafka Connect's reference design. That's all

To be continued
Follow the WeChat official account: Buyi minong

More highlights can be found at www.sreguide.com