A memory error when sending data through a Go channel

Time:2020-10-23
  • Cause
  • Cause investigation
  • Cause analysis
  • Problem solving
  • Summary

Cause

While reading data from a database today, I had several goroutines write the records they read into a channel, while another goroutine read from the channel and analyzed the records.

The feature is simple, yet the following error occurred from time to time while reading the data:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f2227fe004d pc=0x52eb6f]

Cause investigation

The database is boltdb, and the error always occurred at the json.Unmarshal call:

    for v := range outCh {
        var data OmsData
        if err := json.Unmarshal(v, &data); err != nil {
            log.Fatalf("json unmarshal error: %v\n", err)
        }
    }

outCh carries the data read from the database. At first I suspected that the data itself was malformed, but the err check never caught anything: every failure was a panic.

So I went through the whole pipeline. The goroutine code that reads the data is roughly as follows:
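To make that distinction concrete: malformed input makes json.Unmarshal return an error value, which the err check would have caught. The following is a minimal sketch; the fields of OmsData and the decode helper are placeholders, since the real struct is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OmsData is a stand-in; the real struct is not shown in this article.
type OmsData struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// decode unmarshals v into an OmsData and returns any decoding error.
func decode(v []byte) (OmsData, error) {
	var data OmsData
	err := json.Unmarshal(v, &data)
	return data, err
}

func main() {
	// Truncated JSON: Unmarshal reports it through err, without crashing.
	if _, err := decode([]byte(`{"id": 1, "name": "a`)); err != nil {
		fmt.Println("bad data is reported as an error:", err)
	}

	// Valid JSON decodes normally.
	data, err := decode([]byte(`{"id": 1, "name": "a"}`))
	fmt.Println(data.ID, data.Name, err)
}
```

A segmentation violation, by contrast, means the bytes behind the slice could not be read at all, which points at the memory rather than at the content of the data.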

    func readOneDB(db *bolt.DB, outCh chan []byte) {
        defer db.Close()

        // get all buckets in the DB
        bucketNames := getAllBucketNames(db)

        err := db.View(func(tx *bolt.Tx) error {
            for _, bName := range bucketNames {
                bucket := tx.Bucket([]byte(bName))

                bucket.ForEach(func(_ []byte, v []byte) error {
                    // write the value in the bucket to the channel
                    outCh <- v
                    // the original listing is cut off after the send;
                    // the lines below are a minimal reconstruction
                    return nil
                })
            }
            return nil
        })
        if err != nil {
            log.Printf("read db error: %v\n", err)
        }
    }

The code that reads the data is also very simple, and it shows no obvious problem.

Cause analysis

The code that reads and writes the channel is as simple as shown above, and it is not obvious at a glance why the panic occurs. After many experiments, I observed the following:

  1. Each time a panic occurs, the data passed to json.Unmarshal is different, that is, the panic is not tied to specific records
  2. When a panic occurs, it is always after readOneDB has finished reading the data and returned
  3. If the capacity of the channel is small, a panic is unlikely. If the capacity of the channel is large (for example, make(chan []byte, 10000)), a panic is very likely
  4. The boltdb holds a fairly large amount of data overall (800,000 records); if the database is small, there is no panic

Based on these observations, I suspected that after db.Close() some of the data already written to the channel is released as well. This matches boltdb's documented behavior: the byte slices it returns point into the memory-mapped database file and are only valid while the transaction is open.
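Observations 2 and 3 fit together: the larger the buffer, the more values can still be sitting unread in the channel when the producer returns. The sketch below illustrates this without any database. The helper name pendingAtReturn and the microsecond sleep that stands in for parsing are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// pendingAtReturn sends n values into a channel with the given capacity
// while a consumer drains it, and reports how many values were still
// queued (not yet received) at the moment the producer finished sending.
func pendingAtReturn(n, capacity int) int {
	ch := make(chan []byte, capacity)
	pending := make(chan int, 1)

	go func() {
		for i := 0; i < n; i++ {
			ch <- []byte{byte(i)}
		}
		// This is the point where readOneDB would return and close the DB.
		pending <- len(ch)
		close(ch)
	}()

	for range ch {
		time.Sleep(time.Microsecond) // stand-in for slow parsing
	}
	return <-pending
}

func main() {
	fmt.Println("unbuffered:", pendingAtReturn(1000, 0))
	fmt.Println("capacity 10000:", pendingAtReturn(1000, 10000))
}
```

With an unbuffered channel the count is always zero, because every send waits for its receive. With a large buffer the producer can run far ahead of the consumer, so many values may be left in the channel after the producer has returned.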

Problem solving

So I tried copying the data before writing it to the channel. The modified readOneDB is as follows:

    func readOneDB(db *bolt.DB, outCh chan []byte) {
        defer db.Close()

        bucketNames := getAllBucketNames(db)

        err := db.View(func(tx *bolt.Tx) error {
            for _, bName := range bucketNames {
                bucket := tx.Bucket([]byte(bName))

                bucket.ForEach(func(_ []byte, v []byte) error {
                    // ** the change **
                    // copy the data in the bucket and send the copy to the channel,
                    // instead of sending v itself as before
                    nb := make([]byte, len(v))
                    copy(nb, v)
                    outCh <- nb
                    // the original listing is cut off after the send;
                    // the lines below are a minimal reconstruction
                    return nil
                })
            }
            return nil
        })
        if err != nil {
            log.Printf("read db error: %v\n", err)
        }
    }

After this change, the memory error never occurred again!
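Why the copy helps can be sketched without boltdb. In the example below, fakeStore is a hypothetical stand-in for the database: it hands out sub-slices of one shared buffer and wipes that buffer on release. A real unmap makes the memory inaccessible and crashes the reader; wiping keeps this demo safe to run.

```go
package main

import "fmt"

// fakeStore stands in for the database. Values passed to each() are
// sub-slices of one shared buffer, similar to how boltdb values point
// into its memory-mapped file. Hypothetical type, for illustration only.
type fakeStore struct {
	buf []byte
}

// each calls fn with every 4-byte record in the store.
func (s *fakeStore) each(fn func(v []byte)) {
	for i := 0; i+4 <= len(s.buf); i += 4 {
		fn(s.buf[i : i+4])
	}
}

// release invalidates the shared buffer by zeroing it.
func (s *fakeStore) release() {
	for i := range s.buf {
		s.buf[i] = 0
	}
}

// readAll sends every record into a buffered channel, releases the store,
// and only then reads the channel. With copyFirst set, each record is
// copied before it is sent.
func readAll(copyFirst bool) []string {
	s := &fakeStore{buf: []byte("aaaabbbbcccc")}
	outCh := make(chan []byte, 3)

	s.each(func(v []byte) {
		if copyFirst {
			nb := make([]byte, len(v))
			copy(nb, v)
			v = nb
		}
		outCh <- v
	})
	s.release()
	close(outCh)

	var got []string
	for v := range outCh {
		got = append(got, string(v))
	}
	return got
}

func main() {
	fmt.Printf("%q\n", readAll(false)) // records were wiped after the send
	fmt.Printf("%q\n", readAll(true))  // copies survive the release
}
```

Without the copy, the receiver reads whatever the shared buffer holds at receive time. With the copy, each value owns its own memory and no longer depends on the store.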

Summary

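When a reference type such as a slice is written to a Go channel, what is written is the address of the data, not the complete data. If the memory behind that address is released before the receiver uses it, a memory error occurs where the data is used. In this case the memory was not released by Go's garbage collector, which never frees memory that is still referenced: the slices pointed into boltdb's memory-mapped file, which is only valid while the transaction is open and is unmapped by db.Close().

This problem is well hidden, because the moment the memory is released is not visible from the code that consumes the channel. What we can do is ensure at the code level that data still in use cannot be released, for example by copying it before it leaves the transaction.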