Understanding Concurrency Bugs in the Real World

Time:2019-4-2

Go introduces new concurrency primitives and concurrency patterns (though the ideas behind them are not really new). Without a thorough understanding of these features, it is still easy to write concurrency bugs.

In the paper Understanding Real-World Concurrency Bugs in Go, the authors systematically analyzed 171 concurrency bugs in six popular Go projects (Docker, Kubernetes, gRPC-go, etcd, CockroachDB, BoltDB). Studying these analyses can deepen our understanding of Go's concurrency model and help us write better, more reliable code.

The paper's headline finding: "Our study shows that it is as easy to make concurrency bugs with message passing as with shared memory, sometimes even more."

For example, here is a bug from Kubernetes. finishReq creates a child goroutine to execute fn, then uses select to wait for the child goroutine to finish or for a timeout:

func finishReq(timeout time.Duration) ob {
    ch := make(chan ob)
    // ch := make(chan ob, 1) // fix: with a buffer of 1 the send never blocks
    go func() {
        result := fn()
        ch <- result // blocks forever if nobody receives
    }()
    select {
    case result := <-ch:
        return result
    case <-time.After(timeout):
        return nil
    }
}

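If the timeout fires first, or if the result and the timeout become ready at the same time and the Go runtime picks the timeout branch (select chooses nondeterministically), the child goroutine blocks forever on its send, because nothing will ever receive from ch.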

Usage of Go concurrency

This section analyzes how goroutines and concurrency primitives are used in the six projects.

Goroutines are created with anonymous functions more often than with named functions, and there is roughly one goroutine creation site for every 1,000 to 5,000 lines of code.

Although Go encourages message passing, these large projects use shared memory more than message passing, and Mutex is used almost twice as often as channels.

Bug classification

The paper classifies bugs along two dimensions:

  1. Behavior: blocking or non-blocking. A blocking bug is one where a goroutine is unintentionally blocked and cannot make progress (e.g. a deadlock). Non-blocking bugs are usually data races.
  2. Cause: shared memory or message passing, depending on which of the two mechanisms was misused.

Overall, shared memory actually causes more bugs than message passing.

Blocking bugs

Message passing and shared memory cause nearly the same number of blocking bugs. The message-passing ones stem from Go's message-passing semantics, such as those of channels. Bugs are especially hard to find when message passing and shared memory are used together.

For example, Docker misused WaitGroup, causing a goroutine to block:

var group sync.WaitGroup
group.Add(len(pm.plugins))
for _, p := range pm.plugins {
    go func(p *plugin) {
        defer group.Done()
    }(p)
    group.Wait() // bug: waits inside the loop, so the remaining goroutines never start and Wait blocks
}
// fix: group.Wait() belongs here, after the loop
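A runnable sketch of the corrected pattern, with Wait moved after the loop. The plugin type and the shutdownAll function are hypothetical stand-ins for illustration, not Docker's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in for Docker's plugin type.
type plugin struct{ name string }

// shutdownAll starts one goroutine per plugin, waits for all of them,
// and returns how many plugins were processed.
func shutdownAll(plugins []*plugin) int {
	var (
		group sync.WaitGroup
		mu    sync.Mutex // protects count
		count int
	)
	group.Add(len(plugins))
	for _, p := range plugins {
		go func(p *plugin) {
			defer group.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}(p)
	}
	// Wait only after every goroutine has been started.
	group.Wait()
	return count
}

func main() {
	plugins := []*plugin{{name: "a"}, {name: "b"}, {name: "c"}}
	fmt.Println("plugins processed:", shutdownAll(plugins))
}
```

If Wait were called inside the loop instead, the first iteration would wait for a counter that can only reach zero after all goroutines have run, yet the later goroutines would never be started.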

Misuse of channel and mutex results in blocking:

func goroutine1() {
    m.Lock()
    ch <- request // blocks: holds m while waiting for a receiver
    m.Unlock()
}

func goroutine2() {
    for {
        m.Lock() // blocks: m is held by goroutine1, so ch is never read
        m.Unlock()
        request = <-ch
    }
}
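One possible way to avoid this deadlock is to never hold the mutex across a channel operation. The sketch below is an assumption for illustration only (the worker type, its fields, and the run helper are hypothetical, and the source does not say how the real bug was fixed): shared state is updated under the lock, and the send happens after the lock is released.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical type combining a mutex-protected counter
// with an unbuffered request channel.
type worker struct {
	mu   sync.Mutex
	sent int // protected by mu
	ch   chan int
}

// submit updates shared state under the lock, then sends without it.
func (w *worker) submit(req int) {
	w.mu.Lock()
	w.sent++
	w.mu.Unlock()
	w.ch <- req // may block, but no lock is held while it does
}

// consume receives n requests, taking the lock briefly before each one,
// and returns their sum.
func (w *worker) consume(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		w.mu.Lock()
		_ = w.sent // read shared state under the lock
		w.mu.Unlock()
		sum += <-w.ch
	}
	return sum
}

// run sends 1..n from one goroutine and sums them in the caller.
func run(n int) int {
	w := &worker{ch: make(chan int)}
	go func() {
		for i := 1; i <= n; i++ {
			w.submit(i)
		}
	}()
	return w.consume(n)
}

func main() {
	fmt.Println("sum of requests:", run(100))
}
```

Because the sender never waits on the channel while holding the mutex, the receiver can always acquire the lock and reach its receive.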

Non-blocking bugs

Shared memory causes far more non-blocking bugs, almost eight times as many as message passing.

For example, in the following code, f() runs each time the ticker fires, and stopCh is used to exit the loop:

ticker := time.NewTicker(interval) // interval is a placeholder; the original omits the duration
for {
    f()
    select {
    case <-stopCh:
        return
    case <-ticker.C:
    }
}

But select is nondeterministic. When stopCh and the ticker become ready at the same time, the stopCh branch is not necessarily the one chosen, so f() may run one more time after a stop was requested. The correct approach is to check stopCh first:

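ticker := time.NewTicker(interval) // interval is a placeholder; the original omits the duration
for {
    // Check stopCh first so that a pending stop always wins.
    select {
    case <-stopCh:
        return
    default:
    }
    f()
    select {
    case <-stopCh:
        return
    case <-ticker.C:
    }
}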
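The nondeterminism is easy to observe in isolation. The following sketch (function names and the buffered struct{} channels are assumptions for illustration, standing in for stopCh and the ticker) makes both channels ready before every select and counts which branch wins, first with a plain select and then with the stop-first check.

```go
package main

import "fmt"

// countChoices readies both channels before each select and counts
// how often each branch is chosen.
func countChoices(rounds int) (stop, tick int) {
	stopCh := make(chan struct{}, 1)
	tickCh := make(chan struct{}, 1)
	for i := 0; i < rounds; i++ {
		stopCh <- struct{}{}
		tickCh <- struct{}{}
		select {
		case <-stopCh:
			stop++
			<-tickCh // drain the other channel for the next round
		case <-tickCh:
			tick++
			<-stopCh
		}
	}
	return stop, tick
}

// countPrioritized does the same, but polls stopCh first.
func countPrioritized(rounds int) (stop, tick int) {
	stopCh := make(chan struct{}, 1)
	tickCh := make(chan struct{}, 1)
	for i := 0; i < rounds; i++ {
		stopCh <- struct{}{}
		tickCh <- struct{}{}
		select {
		case <-stopCh:
			stop++
			<-tickCh
			continue
		default:
		}
		// Not reached in this demo, because stopCh is always ready.
		select {
		case <-stopCh:
			stop++
			<-tickCh
		case <-tickCh:
			tick++
			<-stopCh
		}
	}
	return stop, tick
}

func main() {
	stop, tick := countChoices(10000)
	fmt.Println("plain select:  stop =", stop, "tick =", tick)
	stop, tick = countPrioritized(10000)
	fmt.Println("stop-first:    stop =", stop, "tick =", tick)
}
```

The plain version splits its choices between the two branches, while the stop-first version takes the stop branch every time.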

References

  • system-pclub/go-concurrency-bugs: the paper's dataset, which contains the real buggy code and fixes from the studied projects; a very good learning resource.
  • Understanding Real-World Concurrency Bugs in Go: the paper itself.
