Go serves miraculous panic


This pit is relatively fresh. It was just filled on Monday, and it was still under cold air.

– 1 –

In byte skipping, all logs of our online service are collected to streaming log service and ES cluster through a unified log library. With the monitoring ability of “sang” level (Xin) strong (Bing) big (Kuang), each panic log can trigger a call to the duty class.

So we often don’t choose the phone, just the phoneanonymous letter ↓↓↓

Go serves miraculous panic

But after all, it’s panic. Most of the cases will be brought to justice quickly, except for a few cases that are hard to investigate and don’t have too much impact on the Internet, such as this one:

Error: invalid memory address or nil pointer dereference
goroutine 68532877 [running]:
src/encoding/json/encode.go:880 +0x59
encoding/json.stringEncoder(0xcb9fead550, ...)
src/encoding/json/encode.go:298 +0xa5
encoding/json.Marshal(0x1ecb9a0, ...)

Note: for convenience of reading, it is slightly simplified.

You see, it can be trapped by recover (it won’t hang up the service), and its frequency is very low (several times a day or even not). Considering the proportion of tens of billions of requests per day, the ROI to solve it is too low, so it is delayed for a period of timeAnd don’t worry about the pot

Go serves miraculous panic

– 2 –

In fact, students s and I have paid attention to this panic before. From the error log above, we can see that the error occurred when calling JSON. Marshal. The code of the caller looks like this:

func SendData(...) {
  data := map[string]interface{} {
    "code":    ctx.ErrorCode,
    "message": ctx.Message,
    "step":    ctx.StepName,
  msg, err := json.Marshal(data)

Note: the actual map has more key / value, which is simplified here.

Looking at the code, the first reaction is: can this * * be panic?

Go serves miraculous panic

Find the corresponding JSON   Library source code (code.go, line 880, corresponding to line 5 below)

func (e *encodeState) string(s string, escapeHTML bool) {
  start := 0
  for i := 0; i < len(s); {
    if b := s[i]; b < utf8.RuneSelf {

——It’s just reading characters one by one from the string. It doesn’t look like cat pie.

Because panic happened in the official JSON   Library, not suitable for modification and deployment to full machine; The introduction of a third-party JSON library also involves a lot of dependency issues, so there was no follow-up at that time.

Until recently, the frequency of panic gradually increased, and students h and l couldn’t see it any more.

– 3 –

The idea of L student is, since the panic can be covered by recover, why don’t you look at what is in the map when panic is used?

Go serves miraculous panic

So the code goes like this:

defer func() {
  if p := recover(); p != nil {
    log.Warnf("Error: %v, data: %v", p, data)
data := map[string]...

And then panic moved on   log.Warnf   This line [Doge]

– 4 –

Anyway, the problem has been successfully transferred. Just comment out the line log. Warnf

Go serves miraculous panic

As a byte dancer pursuing perfection, l resisted the temptation and tried a new way of thinking. Since he saw from the panic log that he was kneeling on a string, at least let’s see which string it was

data := make(map[string]interface{})
defer func() {
  if p := recover(); p != nil {
    for k, v := range data {
      log.Warnf("CatchMe: k=%v", k)
      log.Warnf("CatchMe: v=%v", v)

It’s very easy to change. We should catch up with this  I need to workOn Sunday afternoon, I sent the car and caught a case in the evening.

Through the online log, we found that the error occurred in the  ” “Step” (there are output keys but no output values in the log), and the value should be the string CTX. Stepname.

But why does string, a seemingly harmless type of immune table, lead to panic?

Go serves miraculous panic

– 5 –

By reading the code, we know that when we encounter an exception, we will write the name of the exception to ctx.stepname, like this:

const STEP_XX = "XX"

func XX(...) {
  if err := process(); err != nil {
    ctx.StepName = STEP_XX

Reading while writing, it’s a little bitConcurrentThe taste of the food is gone.

In order to reduce the timeout rate of media perception, we package the whole advertising recall process as a time limited task

finished := make(chan struct{})
timer := time.NewTimer(duration)
go recall(finished)
select {
  case <-finished:
  case <- timer.C:

Therefore, in a request process, concurrent reading and writing of the string object CTX. Stepname may occur.

But how to dig the hole here?

– 6 –

It’s not easy to verify this directly in the online service, but student h made a simple POC, which is like this:

const (

func main() {
  var s string
  go func() {
    i := 1
    for {
      i = 1 - i
      if i == 0 {
        s = FIRST
      } else {
        s = SECOND

  for {

As soon as the code runs, it’s a bit of a smell:

$ go run poc.go

Go serves miraculous panic

Although I did not see panic, I saw something strange (make complaints about GC: GO).

And then go’s race detector

$ go run -race poc.go >/dev/null    
Write at 0x00c00011c1e0 by goroutine 7:
    POC. Go: 19 + 0x66 (assignment line)

Previous read at 0x00c00011c1e0 by main goroutine:
    POC. Go: 28 + 0x9d (println line)

This is a real hammer.

– 7 –

So why does concurrent reading and writing of string appear?

This starts with the underlying data structure of string. There is a type stringheader in the reflection package of go, which corresponds to the representation of string in go runtime

type StringHeader struct {
    Data uintptr
    Len  int

As you can see, a string consists of a pointer to the actual content of the string and a length.

For example, we can play with stringheader like this:

s := "hello"
p := *(*reflect.StringHeader)(unsafe.Pointer(&s))

For such a   struct ,Golang can’t guarantee the atomicity of assignmentTherefore, goroutine 1 may read the string just after modifying the pointer (data) and before modifying the length (len).

So we see the output of “what” – that is, when s is changed from “f * ck” to “what the”, data is changed and Len hasn’t had time to change (still equal to 4).

As for “f * ckgogc”, the opposite is true, and it is obvious that there is a cross-border access, but the cross-border access address is still in the process accessible address space.

– 8 –

Now that the problem is located, it is very easy to solve it.

The most direct way is to use sync. Mutex:

func (ctx *Context) SetStep(step string) {
  defer ctx.Mutex.Unlock()
  ctx.StepName = Step

butMutex performance is not good enough(lock does not scale with the number of the processors)The scenario of low probability of read write conflictThe better solution is to change the CTX. Stepname type to   Atomic. Value, and then


Note: you can also change it to * string and use atomic.storepointer

actually,Golang does not guarantee that any individual operation is atomic unless it uses the primitives or locks provided in the atomic package

– 9 –

Finale: on Monday afternoon, student h submitted the repair code and released it, and this panic never appeared again.

To sum up:

  • String is not as harmless as it seems
  • Concurrent pits can be found  – race   help
  • Remember to use mutex or atomic

Finally, I leave a small question for consideration

It’s been said for a long time, but it doesn’t reappear completely   Panic, but there are enough tools in this article. Can you think of what to do?

Recommended reading:

  • Programmer interview guide: interviewer’s perspective
  • Go service memory soars
  • TCP: the more you learn, the less you understand
  • UTF-8: some seemingly useless cold knowledge
  • Memory knowledge that C programmers should know (1)

Welcome to pay attention

Go serves miraculous panic