Master election in a distributed system: analysis and implementation

Time: 2021-09-10

1: Scenarios that need master election
1: The service runs on multiple machines, but only one of them should execute a given task; if several machines execute it at the same time, there will be problems. For example, fetch the failed records from the database and execute them again: if multiple machines do this concurrently, the same failed task will be executed several times.
2: The service runs on multiple machines and one of them is chosen as the master. The master is responsible for distributing tasks, and all machines consume and process them together. Take the same example of re-executing failed records from the database: one machine may not be able to process them all, so several machines must share the work. The master fetches the failed records from the database and writes them to a message queue; the other machines consume the tasks from the queue and process the failed records together.
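Scenario 2 can be sketched in a few lines of Go. This is only an illustration: a buffered channel stands in for the real message queue, and the names `processFailed` and `recordIDs` are made up for the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processFailed sketches scenario 2: the master enqueues failed records and
// several workers consume them together. The buffered channel stands in for
// a real message queue; all names here are illustrative.
func processFailed(recordIDs []int, workers int) int64 {
	queue := make(chan int, len(recordIDs))
	// Master: fetch the failed records "from the database" and enqueue them.
	for _, id := range recordIDs {
		queue <- id
	}
	close(queue)

	var processed int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Worker: consume tasks from the queue and re-execute them.
			for range queue {
				atomic.AddInt64(&processed, 1)
			}
		}()
	}
	wg.Wait()
	return processed
}

func main() {
	fmt.Println(processFailed([]int{1, 2, 3, 4, 5}, 3)) // prints 5: each record handled exactly once
}
```

Because each record is delivered to exactly one worker by the channel, no failed task is executed twice, which is the whole point of having a single master do the enqueueing.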
 
2: Selecting the master
Given the scenarios above, we can in fact pick one machine from many almost arbitrarily, which is much simpler than a leader-election algorithm like Raft's. We could even designate one machine in a configuration file: only that machine performs the relevant work, while the others do not. If the set of machines is fixed and a single machine can handle the load, that is fine. But if the machines are not fixed, or a single machine cannot handle the load, the configuration-file approach does not work.
Instead, the master can be chosen by competition: whoever grabs it first is the master.
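The competition idea can be shown in miniature with an atomic compare-and-swap: several candidates race for a flag and exactly one wins. Across machines, the same race is decided by a shared store instead of a local flag; `elect` is a made-up name for the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// elect races n candidates for a single flag; the CompareAndSwap winner is
// the master. Locally the flag is an int32 in memory; across machines the
// same race would be decided by a shared store such as Redis or etcd.
func elect(n int) int {
	var taken int32
	winner := int32(-1)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int32) {
			defer wg.Done()
			// Whoever flips the flag from 0 to 1 first is the master.
			if atomic.CompareAndSwapInt32(&taken, 0, 1) {
				atomic.StoreInt32(&winner, id)
			}
		}(int32(i))
	}
	wg.Wait()
	return int(atomic.LoadInt32(&winner))
}

func main() {
	fmt.Println("master is candidate", elect(5))
}
```

Which candidate wins is nondeterministic, but exactly one always does; that single-winner guarantee is what the Redis and etcd schemes below provide across machines.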
 
1: Scheme I
Use Redis. If the specified key does not exist, write this machine's information to it; the machine whose write succeeds is the master. Set an expiration time on the key so that the lock is released if the master hangs up abnormally. All machines must therefore try to grab the Redis lock periodically. The SETNX command meets this need: the machine whose write to Redis succeeds is the master, and the machines whose writes fail are slaves.
Advantages:
  • 1: Simple to implement, a little better than the configuration file, and supports a dynamic set of machines
Disadvantages:
  • 1: Every machine must grab the lock periodically
  • 2: The master may change frequently, and the business logic must remain correct while the master is switching
  • 3: Some time slices may have no master at all: the master has hung up, but the other machines have not yet reached their next lock-grab attempt, so that time slice has no master
 
2: Scheme II
Use etcd. Etcd supports transactions that write a key only if it does not yet exist, which achieves the same effect as Redis SETNX, and etcd's lease mechanism ensures that every machine is notified when the master hangs up. At that point everyone automatically starts a new round of election; as before, the first one to grab the key is the master.
Advantages:
  • Meets our needs without the design defects above
  • A new master is elected only when the current one hangs up, so there is no need to worry about frequent switching affecting the business logic
Disadvantages:
  • The implementation is relatively complex, so let's try it

A Go implementation is as follows:

package etcdDemo

import (
    "context"
    "fmt"
    "time"

    "github.com/coreos/etcd/clientv3"
    "github.com/google/uuid"
)

// Callback is invoked whenever this machine's master status changes.
type Callback func(isMaster bool)

type SelectMaster struct {
    endPoints []string
    key       string
    cli       *clientv3.Client
    lease     *clientv3.LeaseGrantResponse
    chClose   chan int
    callback  Callback
    token     string // unique identity of this machine
    isMaster  bool
}

func NewSelectMaster(endPoints []string, key string) (*SelectMaster, error) {
    sm := &SelectMaster{
        endPoints: endPoints,
        key:       key,
        chClose:   make(chan int),
        token:     uuid.New().String(),
    }

    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   endPoints,
        DialTimeout: 3 * time.Second,
    })
    if err != nil {
        return sm, err
    }
    sm.cli = cli
    go sm.ioLoop()
    return sm, nil
}

func (sm *SelectMaster) ioLoop() {
    fmt.Println("SelectMaster.ioLoop start")
    ticker := time.NewTicker(time.Second * 3)
    defer ticker.Stop()
    chWatch := sm.cli.Watch(context.TODO(), sm.key)
    for {
        select {
        // The original listing breaks off here; the remainder of the loop
        // is reconstructed from the fields and logic above.
        case <-ticker.C:
            if !sm.isMaster {
                sm.tryAcquire()
            } else if _, err := sm.cli.KeepAliveOnce(context.TODO(), sm.lease.ID); err != nil {
                sm.setMaster(false) // lease renewal failed: we lost mastership
            }
        case resp := <-chWatch:
            for _, ev := range resp.Events {
                // The key was deleted: the old master's lease expired,
                // so start a new round of election immediately.
                if ev.Type == clientv3.EventTypeDelete {
                    sm.tryAcquire()
                }
            }
        case <-sm.chClose:
            fmt.Println("SelectMaster.ioLoop exit")
            return
        }
    }
}

// tryAcquire creates the key in a transaction that succeeds only if the key
// does not exist yet; the machine whose transaction succeeds is the master.
func (sm *SelectMaster) tryAcquire() {
    lease, err := sm.cli.Grant(context.TODO(), 10)
    if err != nil {
        return
    }
    resp, err := sm.cli.Txn(context.TODO()).
        If(clientv3.Compare(clientv3.CreateRevision(sm.key), "=", 0)).
        Then(clientv3.OpPut(sm.key, sm.token, clientv3.WithLease(lease.ID))).
        Commit()
    if err != nil {
        return
    }
    if resp.Succeeded {
        sm.lease = lease
        sm.setMaster(true)
    }
}

func (sm *SelectMaster) setMaster(isMaster bool) {
    if sm.isMaster == isMaster {
        return
    }
    sm.isMaster = isMaster
    if sm.callback != nil {
        sm.callback(isMaster)
    }
}
 51         case