Kafka Two-Level Scheduling for Distributed Coordination of Microservice Task Assignment (Golang Version)

Time: 2019-07-31

Background

Two-level Coordination Scheduling Architecture Based on Kafka Message Queue


To coordinate the work of its internal consumers and Kafka Connect workers, Kafka implements a group coordination (rebalance) protocol. The main work is divided into two steps:

  1. Each worker (consumer or Connect worker) reports metadata such as its topic offsets to Kafka's broker, which completes the Leader/Follower election.
  2. The worker elected as Leader obtains the partition and member information stored in Kafka and performs a second-level assignment, combining it with the specific business to achieve load-balanced allocation.

Functionally, this realizes the two-level scheduling described above: the first level is responsible for electing the Leader, and the second level assigns tasks to each member worker node.

The main point is to learn this architectural design idea, even though the applicable scenario is fairly narrow.

Distributed Coordination Design Based on Message Queue


First-level coordinator design: this mainly refers to the Coordinator, which records the members' metadata and performs the Leader election, for example deciding who becomes Leader by comparing offsets.
Second-level coordinator design: this mainly refers to the task assignment performed by the Leader. The Leader worker node obtains all task and node information, assigns tasks with an appropriate algorithm, and finally broadcasts the result to the message queue.


What is worth learning here is that, in Kafka's scenario, it is awkward for the broker to implement unified scheduling for all the different kinds of services. The architecture therefore moves the assignment of concrete tasks out of the broker: the broker is only responsible for the generic part, the Leader election, while the business-specific assignment is separated from the broker and implemented by the business itself on the elected Leader.

Code Implementation

Core Design


According to the design, we abstract the following core components: MemoryQueue, Worker, Coordinator, GroupRequest, GroupResponse, Task, and Assignment (a minimal sketch of the supporting types follows the list below).

MemoryQueue: simulates the message queue that distributes messages, playing the role of the Kafka broker
Worker: executes tasks and implements the second-level, business-specific assignment algorithm
Coordinator: the coordinator inside the message queue, responsible for Leader/Follower elections
Task: a task to be executed
Assignment: the task assignment result produced from the task and node information
GroupRequest: the request to join the group (cluster)
GroupResponse: the response to a group request
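
These supporting types are used throughout the code below but never listed in the article. The following is a minimal sketch reconstructed from how they are used later; the field names match that usage, but the internal layout of Metadata (an ID plus an offset) is an assumption.

// Metadata holds the worker information used for leader election.
// Only the offset() accessor is actually required by the code below.
type Metadata struct {
    ID     string
    Offset int
}

func (m *Metadata) offset() int { return m.Offset }

// Task is a unit of work pushed onto the queue.
type Task struct {
    Name  string
    Group string
}

// GroupRequest is sent by a worker that wants to join a group.
type GroupRequest struct {
    ID       string
    Group    string
    Metadata Metadata
}

// GroupResponse is broadcast by the queue after a membership change.
type GroupResponse struct {
    Group       string
    LeaderID    string
    Generation  int
    Members     []string
    Tasks       []string
    Coordinator *Coordinator
}

// Assignment carries the leader's second-level allocation result.
type Assignment struct {
    LeaderID   string
    Generation int
    result     map[string][]string
}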

MemoryQueue

Core data structure

// MemoryQueue is an in-memory message queue that plays the role of the Kafka broker
type MemoryQueue struct {
    done             chan struct{}
    queue            chan interface{}
    wg               sync.WaitGroup
    coordinator      map[string]*Coordinator
    worker           map[string]*Worker
}

The coordinator map records the Coordinator of each group; one is created on demand the first time a group is seen, and it acts as the allocator's source of truth for that group.
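
The constructor and event-loop plumbing (NewMemoryQueue, Start, send, addWorker, Notify, Stop) are used later in the article but not shown. The sketch below is consistent with that usage; the exact buffering and shutdown details are assumptions.

// NewMemoryQueue creates a queue with the given channel buffer size.
func NewMemoryQueue(size int) *MemoryQueue {
    return &MemoryQueue{
        done:        make(chan struct{}),
        queue:       make(chan interface{}, size),
        coordinator: make(map[string]*Coordinator),
        worker:      make(map[string]*Worker),
    }
}

// send enqueues an event; the WaitGroup is released in handleEvent.
func (mq *MemoryQueue) send(event interface{}) {
    mq.wg.Add(1)
    mq.queue <- event
}

// addWorker registers a worker so it can receive broadcast notifications.
func (mq *MemoryQueue) addWorker(id string, w *Worker) {
    mq.worker[id] = w
}

// Notify broadcasts an event to every registered worker.
func (mq *MemoryQueue) Notify(event interface{}) {
    for _, w := range mq.worker {
        w.Execute(event)
    }
}

// Start runs the event loop in a background goroutine.
func (mq *MemoryQueue) Start() {
    go func() {
        for {
            select {
            case event := <-mq.queue:
                mq.handleEvent(event)
            case <-mq.done:
                return
            }
        }
    }()
}

// Stop waits for all pending events and then shuts the loop down.
func (mq *MemoryQueue) Stop() {
    mq.wg.Wait()
    close(mq.done)
}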

Node Join Cluster Request Processing


MemoryQueue receives events and dispatches them by type. If the event is a GroupRequest, it is dispatched to handleGroupRequest for processing.
Inside handleGroupRequest, the Coordinator of the corresponding group is obtained first; then a response is built from the current group information with buildGroupResponse and sent back onto the message queue.

Event Distribution Processing

func (mq *MemoryQueue) handleEvent(event interface{}) {
    switch event.(type) {
    case GroupRequest:
        request := event.(GroupRequest)
        mq.handleGroupRequest(&request)
    case Task:
        task := event.(Task)
        mq.handleTask(&task)
    default:
        mq.Notify(event)
    }
    mq.wg.Done()
}

Join-Group Request Processing


The Coordinator calls its getLeaderID method to elect a Leader node based on the current group's member information.

// getGroupCoordinator gets (or creates) the Coordinator for the specified group
func (mq *MemoryQueue) getGroupCoordinator(group string) *Coordinator {
    coordinator, ok := mq.coordinator[group]
    if ok {
        return coordinator
    }
    coordinator = NewCoordinator(group)
    mq.coordinator[group] = coordinator
    return coordinator
}

func (mq *MemoryQueue) handleGroupRequest(request *GroupRequest) {
    coordinator := mq.getGroupCoordinator(request.Group)
    exist := coordinator.addMember(request.ID, &request.Metadata)
    // If the worker has joined the group before, no action will be taken.
    if exist {
        return
    }
    // Build the group response and send it back onto the queue
    groupResponse := mq.buildGroupResponse(coordinator)
    mq.send(groupResponse)
}

func (mq *MemoryQueue) buildGroupResponse(coordinator *Coordinator) GroupResponse {
    return GroupResponse{
        Tasks:       coordinator.Tasks,
        Group:       coordinator.Group,
        Members:     coordinator.AllMembers(),
        LeaderID:    coordinator.getLeaderID(),
        Generation:  coordinator.Generation,
        Coordinator: coordinator,
    }
}
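
handleTask, referenced in handleEvent above, is not shown in the article. Presumably it registers the task with the group's Coordinator; whether a new task immediately triggers a re-balance is an assumption in the sketch below.

// handleTask appends the task to the group's task list.
func (mq *MemoryQueue) handleTask(task *Task) {
    coordinator := mq.getGroupCoordinator(task.Group)
    coordinator.Tasks = append(coordinator.Tasks, task.Name)
    // If members have already joined, broadcast a fresh GroupResponse so the
    // leader can re-assign with the updated task list (assumed behavior).
    if len(coordinator.Members) > 0 {
        mq.send(mq.buildGroupResponse(coordinator))
    }
}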

Coordinator

Core data structure

// Coordinator coordinates a single group: it tracks members, tasks, and heartbeats
type Coordinator struct {
    Group      string
    Generation int
    Members    map[string]*Metadata
    Tasks      []string
    Heartbeats map[string]int64
}

The Coordinator stores each worker node's metadata in Members, Tasks holds all tasks of the current group, Heartbeats stores each worker's latest heartbeat information, and Generation is a generation counter that is incremented on every membership change.
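
NewCoordinator, addMember, and AllMembers are called by the queue code above but are not listed in the article. A minimal sketch, assuming Generation is bumped on every new join (the member-removal path is not covered by the article and is omitted here):

// NewCoordinator creates the coordinator for a group.
func NewCoordinator(group string) *Coordinator {
    return &Coordinator{
        Group:      group,
        Members:    make(map[string]*Metadata),
        Heartbeats: make(map[string]int64),
    }
}

// addMember registers a worker's metadata; it returns true if the worker
// had already joined, so the caller can skip the rebalance.
func (c *Coordinator) addMember(id string, metadata *Metadata) bool {
    if _, ok := c.Members[id]; ok {
        return true
    }
    c.Members[id] = metadata
    c.Generation++ // membership changed, start a new generation
    return false
}

// AllMembers returns the IDs of every member in the group.
func (c *Coordinator) AllMembers() []string {
    members := make([]string, 0, len(c.Members))
    for id := range c.Members {
        members = append(members, id)
    }
    return members
}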

Election of Leader through offset

The Leader node is elected from the workers' stored metadata.

// getLeaderID elects the leader node based on the current member information
func (c *Coordinator) getLeaderID() string {
    leaderID, maxOffset := "", 0
    // Here the member with the largest offset becomes leader; a real election may be more complex.
    for wid, metadata := range c.Members {
        if leaderID == "" || metadata.offset() > maxOffset {
            leaderID = wid
            maxOffset = metadata.offset()
        }
    }
    return leaderID
}

Worker

Core data structure

// Worker is the task execution unit; the elected leader also runs the second-level assignment
type Worker struct {
    ID          string
    Group       string
    Tasks       string
    done        chan struct{}
    queue       *MemoryQueue
    Coordinator *Coordinator
}

The worker node holds a reference to its Coordinator so that it can later send heartbeats to it.
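
NewWorker and start are used in the test code at the end of the article but not shown. A minimal sketch, assuming start(offset) simply joins the group with a GroupRequest carrying the given offset (the stop/leave path is not covered in the article and is omitted here):

// NewWorker creates a worker bound to the given queue.
func NewWorker(id, group string, queue *MemoryQueue) *Worker {
    return &Worker{
        ID:    id,
        Group: group,
        done:  make(chan struct{}),
        queue: queue,
    }
}

// start joins the group, carrying the offset used for leader election.
func (w *Worker) start(offset int) {
    w.queue.send(GroupRequest{
        ID:       w.ID,
        Group:    w.Group,
        Metadata: Metadata{ID: w.ID, Offset: offset},
    })
}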

Distribution of request messages

The worker receives different event types and handles them according to type. handleGroupResponse processes the Coordinator's response from the server side, which contains the Leader and task information; the Leader then performs the second-level assignment. handleAssign processes the task information received after the assignment.

// Execute receives assigned tasks for request execution
func (w *Worker) Execute(event interface{}) {
    switch event.(type) {
    case GroupResponse:
        response := event.(GroupResponse)
        w.handleGroupResponse(&response)
    case Assignment:
        assign := event.(Assignment)
        w.handleAssign(&assign)
    }
}

handleGroupResponse performs the follow-up business logic according to the node's role

The GroupResponse splits nodes into two roles: a Leader must continue with task assignment after receiving the GroupResponse, while a Follower only needs to listen for events and send heartbeats.

func (w *Worker) handleGroupResponse(response *GroupResponse) {
    if w.isLeader(response.LeaderID) {
        w.onLeaderJoin(response)
    } else {
        w.onFollowerJoin(response)
    }
}
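
isLeader is not shown in the article; presumably it is just an ID comparison:

// isLeader reports whether this worker was elected leader.
func (w *Worker) isLeader(leaderID string) bool {
    return w.ID == leaderID
}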

Follower node

Follower node sends heartbeat

// onFollowerJoin: the current role is follower
func (w *Worker) onFollowerJoin(response *GroupResponse) {
    w.Coordinator = response.Coordinator
    go w.heartbeat()
}
// heartbeat periodically reports liveness to the coordinator until the worker stops
func (w *Worker) heartbeat() {
    timer := time.NewTimer(time.Second)
    for {
        select {
        case <-timer.C:
            w.Coordinator.heartbeat(w.ID, time.Now().Unix())
            timer.Reset(time.Second)
        case <-w.done:
            return
        }
    }
}
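
The Coordinator.heartbeat method called by the loop above is not listed in the article; given the Heartbeats map on the Coordinator, it presumably just records the latest timestamp per worker:

// heartbeat records the latest heartbeat timestamp for a worker.
func (c *Coordinator) heartbeat(id string, ts int64) {
    c.Heartbeats[id] = ts
}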

Leader node


On the Leader node, I divide the scheduling/assignment into two steps:
1) partition the tasks according to the number of nodes and tasks;
2) assign the partitioned slices to the individual nodes and send the result back onto the queue.

// onLeaderJoin: the current role is leader; perform the task assignment and send it to the MQ
func (w *Worker) onLeaderJoin(response *GroupResponse) {
    fmt.Printf("Generation [%d] leaderID [%s]\n", response.Generation, w.ID)
    w.Coordinator = response.Coordinator
    go w.heartbeat()
    // Task slicing
    taskSlice := w.performAssign(response)

    // Assigning tasks to individual workers
    memberTasks, index := make(map[string][]string), 0
    for _, name := range response.Members {
        memberTasks[name] = taskSlice[index]
        index++
    }

    // Broadcast the assignment result to the queue
    assign := Assignment{LeaderID: w.ID, Generation: response.Generation, result: memberTasks}
    w.queue.send(assign)
}

// performAssign partitions the tasks based on the current number of members and tasks
func (w *Worker) performAssign(response *GroupResponse) [][]string {

    perWorker := len(response.Tasks) / len(response.Members)
    leftOver := len(response.Tasks) - len(response.Members)*perWorker

    result := make([][]string, len(response.Members))

    taskIndex, memberTaskCount := 0, 0
    for index := range result {
        if index < leftOver {
            memberTaskCount = perWorker + 1
        } else {
            memberTaskCount = perWorker
        }
        for i := 0; i < memberTaskCount; i++ {
            result[index] = append(result[index], response.Tasks[taskIndex])
            taskIndex++
        }
    }

    return result
}
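
handleAssign, which every worker uses to pick its own slice out of the broadcast Assignment, is also not shown in the article. A sketch that reproduces the log format of the test output below (the "||" separator is taken from that output, and the strings import is an assumption):

// handleAssign extracts this worker's share of the assignment and "runs" it.
func (w *Worker) handleAssign(assign *Assignment) {
    w.Tasks = strings.Join(assign.result[w.ID], "||")
    fmt.Printf("Generation [%d] worker [%s]  run tasks: [%s]\n",
        assign.Generation, w.ID, w.Tasks)
}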

Test Data

Start a queue, then send tasks and add workers, and observe the assignment results.

    // Build the queue and start its event loop
    queue := NewMemoryQueue(10)
    queue.Start()

    // Sending Tasks
    queue.send(Task{Name: "test1", Group: "test"})
    queue.send(Task{Name: "test2", Group: "test"})
    queue.send(Task{Name: "test3", Group: "test"})
    queue.send(Task{Name: "test4", Group: "test"})
    queue.send(Task{Name: "test5", Group: "test"})

    // Start the worker and assign different offsets to each worker to see if the leader can be allocated properly
    workerOne := NewWorker("test-1", "test", queue)
    workerOne.start(1)
    queue.addWorker(workerOne.ID, workerOne)

    workerTwo := NewWorker("test-2", "test", queue)
    workerTwo.start(2)
    queue.addWorker(workerTwo.ID, workerTwo)

    workerThree := NewWorker("test-3", "test", queue)
    workerThree.start(3)
    queue.addWorker(workerThree.ID, workerThree)

    time.Sleep(time.Second)
    workerThree.stop()
    time.Sleep(time.Second)
    workerTwo.stop()
    time.Sleep(time.Second)
    workerOne.stop()

    queue.Stop()

Running results: first, based on the offsets, test-3 finally becomes the Leader. Looking at the task assignment results, two nodes receive two tasks each and one node receives one; then, as workers exit one by one, the tasks are reassigned.

Generation [1] leaderID [test-1]
Generation [2] leaderID [test-2]
Generation [3] leaderID [test-3]
Generation [1] worker [test-1]  run tasks: [test1||test2||test3||test4||test5]
Generation [1] worker [test-2]  run tasks: []
Generation [1] worker [test-3]  run tasks: []
Generation [2] worker [test-1]  run tasks: [test1||test2||test3]
Generation [2] worker [test-2]  run tasks: [test4||test5]
Generation [2] worker [test-3]  run tasks: []
Generation [3] worker [test-1]  run tasks: [test1||test2]
Generation [3] worker [test-2]  run tasks: [test3||test4]
Generation [3] worker [test-3]  run tasks: [test5]
Generation [4] leaderID [test-2]
Generation [4] worker [test-1]  run tasks: [test1||test2||test3]
Generation [4] worker [test-2]  run tasks: [test4||test5]
Generation [5] leaderID [test-1]
Generation [5] worker [test-1]  run tasks: [test1||test2||test3||test4||test5]

Summary

In fact, in distributed scenarios this kind of Leader/Follower election is more commonly built on CP systems such as Consul, etcd, or ZooKeeper. The design described here is closely tied to Kafka's own business scenario; Kafka Connect can be consulted as a reference implementation. I will continue to look at other designs in the future.

To be continued.
Follow the WeChat official account: Buyi Manong


More exciting content can be found at www.sreguide.com