Do you know the implementation scheme of set in go?

Time:2021-10-26

GoOur design is a simple philosophy,Do you know the implementation scheme of set in go?

It discards some bloated functions and modules of other languages to reduce the learning threshold of programmers and reduce the mental burden in use.

In this paper, we discuss the missing data structure in go:Set, and its best implementation.

Set semantics and implementation scheme

Set is a common data structure in other languages. Properties: the objects in the collection are not sorted in a specific way, and there are no duplicate objects.

Do you know the implementation scheme of set in go?

When learning go, remember: what go doesn’t contain doesn’t mean go really doesn’t. According to the set feature, we can easily think of an implementation scheme using map (because the key of map is not repeated): store the object as a key in map.

Using map to implement set means that we only care about the existence of key, and its value is not important. People with programming experience in other languages may choose bool as value because it is the type with the least memory consumption (1 byte) in other languages. But in go, there is another option: struct {}.

  1. fmt.Println(unsafe.Sizeof(struct {}{})) // output: 0 

Pressure measurement comparison

To explore which data structure is the best choice as value. We selected the following common types as values to test: bool, int, interface {}, struct {}.

  1. package main 
  2.  
  3. import ( 
  4.  “testing” 
  5.  
  6. const num = int(1 << 24) 
  7.  
  8. //   test   bool   type
  9. func Benchmark_SetWithBoolValueWrite(b *testing.B) { 
  10.  set := make(map[int]bool) 
  11.  for i := 0; i < num; i++ { 
  12.   set[i] = true 
  13.  } 
  14.  
  15. //   test   interface{}   type
  16. func Benchmark_SetWithInterfaceValueWrite(b *testing.B) { 
  17.  set := make(map[int]interface{}) 
  18.  for i := 0; i < num; i++ { 
  19.   set[i] = struct{}{} 
  20.  } 
  21.  
  22. //   test   int   type
  23. func Benchmark_SetWithIntValueWrite(b *testing.B) { 
  24.  set := make(map[int]int) 
  25.  for i := 0; i < num; i++ { 
  26.   set[i] = 0 
  27.  } 
  28.  
  29. //   test   struct{}   type
  30. func Benchmark_SetWithStructValueWrite(b *testing.B) { 
  31.  set := make(map[int]struct{}) 
  32.  for i := 0; i < num; i++ { 
  33.   set[i] = struct{}{} 
  34.  } 

We run the following command to test

  1. $ go test -v -bench=. -count=3 -benchmem | tee result.txt 
  2. goos: darwin 
  3. goarch: amd64 
  4. pkg: workspace/example/demoForSet 
  5. cpu: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz 
  6. Benchmark_SetWithBoolValueWrite 
  7. Benchmark_SetWithBoolValueWrite-8                1 3549312568 ns/op 883610264 B/op   614311 allocs/op 
  8. Benchmark_SetWithBoolValueWrite-8                1 3288521519 ns/op 883599440 B/op   614206 allocs/op 
  9. Benchmark_SetWithBoolValueWrite-8                1 3264097496 ns/op 883578624 B/op   614003 allocs/op 
  10. Benchmark_SetWithInterfaceValueWrite 
  11. Benchmark_SetWithInterfaceValueWrite-8           1 4397757645 ns/op 1981619632 B/op   614062 allocs/op 
  12. Benchmark_SetWithInterfaceValueWrite-8           1 4088301215 ns/op 1981553392 B/op   613743 allocs/op 
  13. Benchmark_SetWithInterfaceValueWrite-8           1 3990698218 ns/op 1981560880 B/op   613773 allocs/op 
  14. Benchmark_SetWithIntValueWrite 
  15. Benchmark_SetWithIntValueWrite-8                 1 3472910194 ns/op 1412326480 B/op   615131 allocs/op 
  16. Benchmark_SetWithIntValueWrite-8                 1 3519755137 ns/op 1412187928 B/op   614294 allocs/op 
  17. Benchmark_SetWithIntValueWrite-8                 1 3459182691 ns/op 1412057672 B/op   613390 allocs/op 
  18. Benchmark_SetWithStructValueWrite 
  19. Benchmark_SetWithStructValueWrite-8              1 3126746088 ns/op 802452368 B/op   614127 allocs/op 
  20. Benchmark_SetWithStructValueWrite-8              1 3161650835 ns/op 802431240 B/op   613632 allocs/op 
  21. Benchmark_SetWithStructValueWrite-8              1 3160410871 ns/op 802440552 B/op   613748 allocs/op 
  22. PASS 
  23. ok   workspace/example/demoForSet 42.660s 

The results at this time do not seem intuitive. Here is a recommended benchmark statistical tool: benchmark. Install with the following command

  1. $ go get -u golang.org/x/perf/cmd/benchstat 

Use benchmark to analyze the benchmark result file just obtained

  1. $ benchstat result.txt 
  2. name                           time/op 
  3. _SetWithBoolValueWrite-8        3.37s ± 5% 
  4. _SetWithInterfaceValueWrite-8   4.16s ± 6% 
  5. _SetWithIntValueWrite-8         3.48s ± 1% 
  6. _SetWithStructValueWrite-8      3.15s ± 1% 
  7.  
  8. name                           alloc/op 
  9. _SetWithBoolValueWrite-8        884MB ± 0% 
  10. _SetWithInterfaceValueWrite-8  1.98GB ± 0% 
  11. _SetWithIntValueWrite-8        1.41GB ± 0% 
  12. _SetWithStructValueWrite-8      802MB ± 0% 
  13.  
  14. name                           allocs/op 
  15. _SetWithBoolValueWrite-8         614k ± 0% 
  16. _SetWithInterfaceValueWrite-8    614k ± 0% 
  17. _SetWithIntValueWrite-8          614k ± 0% 
  18. _SetWithStructValueWrite-8       614k ± 0% 

In terms of memory overhead, struct {} is the smallest, and its execution time is also the least. Since the bool type occupies only one byte, it is not much different from the empty structure. However, if the interface {} type is used, the gap is obvious.

Therefore, there is no doubt that in the implementation of set, the map value type should be struct {}.

summary

Although this paper discusses the implementation scheme of set, its essence involves the zero memory characteristic of empty structure struct {}.

In addition to being the best solution to realize the value value of set, an empty structure can also be applied to the following aspects:

  • Channel of notification signal: when the channel is only used to notify goroutine’s execution events, the channel does not need to send any substantive data. Choose to use channel struct {}.
  • Structure without status data: when an object only has methods and does not contain any attribute fields, choose to use an empty structure to define the object.