Sample code for assignment and expansion in the golang map

Time:2020-9-15

Assignment is one of the more complex paths in the golang map implementation: to keep hash-conflict chains from growing too long, an assignment may trigger expansion of the map and migration of its data. That expansion and migration logic is the focus of this article.

Data structure

First, let's revisit the data structures behind the map implementation:


type hmap struct {
 count   int      //number of live key/value pairs (what len returns)
 flags   uint8    //state flags, e.g. hashWriting, iterator
 B     uint8    //log2 of the bucket count: there are 2^B buckets
 noverflow uint16   //approximate number of overflow buckets
 hash0   uint32   //hash seed
 buckets  unsafe.Pointer //array of 2^B buckets
 oldbuckets unsafe.Pointer //previous bucket array, non-nil while growing
 nevacuate uintptr //migration progress: buckets below this index are done
 extra *mapextra   //optional fields, see below
}

type mapextra struct {
 overflow  *[]*bmap //overflow buckets in use by buckets
 oldoverflow *[]*bmap //overflow buckets in use by oldbuckets
 nextOverflow *bmap   //next preallocated free overflow bucket
}

hmap is the structure behind a map. Most of its fields were covered in the first section; the remaining ones are nevacuate and extra.

First, the concept of relocation (evacuation): when conflict chains in the hash table get too long, or too many buckets sit mostly empty, the data is moved into a new bucket array, and the existing array becomes oldbuckets. Relocation is not completed in one step; a bucket may only be migrated when it is next accessed. (This resembles Redis's incremental rehash, which spreads the work across many accesses to reduce the latency cost of any single one.)

  • nevacuate records the relocation progress: every bucket in the oldbuckets array below this index has already been moved to the new array.
  • extra holds auxiliary fields. nextOverflow points at the next preallocated empty bucket available for resolving conflicts. overflow and oldoverflow track the overflow buckets currently in use in the conflict chains; the old variant belongs to the bucket array being migrated away from.

With the data structures in mind, we can move on to the map assignment operation.

Map assignment operation

The map assignment operation is written as follows:


mapExample["hello"] = data

For assignment, golang provides specialized implementations for different key types. Here are the entry points:


func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {}
func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {}
func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer{}
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {}
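
As a hedged illustration (names taken from the list above; the exact lowering is a compiler implementation detail and can change between Go versions), user code routes to these helpers roughly like this:

package main

//Which assignment ends up in which runtime helper (illustrative only).
func main() {
 ms := map[string]int{}
 ms["k"] = 1 //lowered to mapassign_faststr

 m32 := map[uint32]int{}
 m32[7] = 1 //lowered to mapassign_fast32

 m64 := map[uint64]int{}
 m64[7] = 1 //lowered to mapassign_fast64

 type big [4]int64
 mb := map[big]int{}
 mb[big{}] = 1 //falls back to the generic mapassign
}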

We mainly study the implementation of mapassign.

mapassign finds a free slot, writes the key into it, and returns the address of the value; the caller then copies the value into that address directly via compiler-generated code.
Let's walk through how a free slot is found, step by step.

① Before looking up the key, two sanity checks run: whether the map is nil, and whether another write is already in progress. If either fails, the runtime panics or throws. (This check is exactly why concurrent map writes crash the program.)

if h == nil {
 panic(plainError("assignment to entry in nil map"))
}
//(race detector and msan instrumentation omitted here)

if h.flags&hashWriting != 0 {
 throw("concurrent map writes")
}
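
Both checks are easy to trip from user code. A minimal, deliberately broken sketch; run it to see the second failure mode, and comment in the nil-map lines for the first:

package main

func main() {
 //var nilMap map[string]int
 //nilMap["k"] = 1 //panic: assignment to entry in nil map

 m := map[int]int{}
 go func() {
  for i := 0; ; i++ {
   m[i] = i
  }
 }()
 for i := 0; ; i++ {
  m[i] = i //soon throws: concurrent map writes (not recoverable)
 }
}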

② Next, the hash of the key is calculated. If buckets is nil (small maps defer allocation until the first write), a single bucket is allocated:

alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))

//hashWriting is set only after hashing, because alg.hash may panic
h.flags ^= hashWriting

if h.buckets == nil {
 h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}

③ The hash selects a bucket. If the map is in the middle of a migration, the corresponding old bucket is evacuated first, so the write lands in the new array:

//Calculate the bucket position offset by hash
bucket := hash & bucketMask(h.B)

//Here is the relocation logic, which we will explain in detail later
if h.growing() {
 growWork(t, h, bucket)
}

//Calculate the corresponding bucket position and top hash value
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)
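
For reference, bucketMask and tophash are small helpers; the following is paraphrased from runtime/map.go of roughly this era (Go 1.12), lightly simplified:

//bucketShift returns 1<<b (the number of buckets for a given B)
func bucketShift(b uint8) uintptr {
 return uintptr(1) << (b & (sys.PtrSize*8 - 1))
}

//bucketMask returns 1<<b - 1, used to mask a hash to a bucket index
func bucketMask(b uint8) uintptr {
 return bucketShift(b) - 1
}

//tophash takes the top 8 bits of the hash; values below minTopHash
//are reserved as state marks, so real hashes are shifted up
func tophash(hash uintptr) uint8 {
 top := uint8(hash >> (sys.PtrSize*8 - 8))
 if top < minTopHash {
  top += minTopHash
 }
 return top
}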

④ With the bucket in hand, the code walks the bucket and its overflow chain looking for the key, which may already exist or may need a fresh slot.

var inserti *uint8         //tophash slot of the first free cell seen
var insertk unsafe.Pointer //key slot of that free cell
var val unsafe.Pointer     //value slot of that free cell
bucketloop:
for {
 for i := uintptr(0); i < bucketCnt; i++ {

   //If the tophash differs, this cell cannot hold our key
  if b.tophash[i] != top {

    //Remember the first free cell in case the key is never found
   if isEmpty(b.tophash[i]) && inserti == nil {
    inserti = &b.tophash[i]
    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
    val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
   }

    //emptyRest means nothing follows in this chain; stop scanning
   if b.tophash[i] == emptyRest {
    break bucketloop
   }
   continue
  }

  //If tophash matches

  k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
  if t.indirectkey() {
   k = *((*unsafe.Pointer)(k))
  }

   //tophash matched but the key itself differs; keep searching
  if !alg.equal(key, k) {
   continue
  }

   //The key already exists: update it if needed and take the value's address
  if t.needkeyupdate() {
   typedmemmove(t.key, k, key)
  }
  val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
  goto done
 }
  //Advance to the next overflow bucket in the chain
 ovf := b.overflow(t)
 if ovf == nil {
  break
 }
 b = ovf
}

To summarize, this loop has several branches:

a. If the tophash does not match, check whether the cell is empty. delete leaves empty cells behind, so the address of the first free cell is remembered; if the key is never found (i.e. it was not in the map before), that free slot is used for the insert.
b. If the tophash matches, the key's actual value is compared as well. If it differs, keep searching; if it matches, the key is updated in place where necessary (it may contain references) and the address of the value is returned.
c. If neither resolves in this bucket, move on to the next bucket in the overflow chain.
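
The offset arithmetic above (dataOffset+i*keysize for a key, then bucketCnt*keysize before the values) assumes the following bucket layout. This is a sketch for a map[uint64]uint64, not runtime source; the real bmap struct declares only tophash, and everything else is reached via pointer arithmetic:

package sketch

import "unsafe"

type bmapUint64 struct {
 tophash  [8]uint8       //one hash byte (or state mark) per slot; bucketCnt == 8
 keys     [8]uint64      //all eight keys first...
 values   [8]uint64      //...then all eight values, minimizing padding
 overflow unsafe.Pointer //link to the next bucket in the chain
}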

⑤ Before inserting, the code checks whether the map already holds too much data and needs to grow. If growth starts, everything found so far is invalidated, so the search restarts from ③ against the new buckets:


if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
 hashGrow(t, h)
 goto again // Growing the table invalidates everything, so try again
}
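
The two predicates are small; paraphrased from runtime/map.go of this era (Go 1.12), where loadFactorNum/loadFactorDen = 13/2, i.e. an average load of 6.5 pairs per bucket:

//condition 1: too many pairs for the current bucket count
func overLoadFactor(count int, B uint8) bool {
 return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

//condition 2: roughly as many overflow buckets as regular buckets
//(capped at 1<<15), a sign of many half-empty buckets
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
 if B > 15 {
  B = 15
 }
 return noverflow >= uint16(1)<<(B&15)
}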

⑥ If the scan found no free cell, a new overflow bucket is appended to the chain and its first slot is used:


if inserti == nil {
 // all current buckets are full, allocate a new one.
 newb := h.newoverflow(t, b)
 inserti = &newb.tophash[0]
 insertk = add(unsafe.Pointer(newb), dataOffset)
 val = add(insertk, bucketCnt*uintptr(t.keysize))
}

⑦ Finally, tophash and the key itself are written, and the hashWriting flag is cleared:

//If the key/value is stored indirectly (as a pointer), allocate its backing memory first
if t.indirectkey() {
 kmem := newobject(t.key)
 *(*unsafe.Pointer)(insertk) = kmem
 insertk = kmem
}
if t.indirectvalue() {
 vmem := newobject(t.elem)
 *(*unsafe.Pointer)(val) = vmem
}
//Update tophash, K
typedmemmove(t.key, insertk, key)
*inserti = top

done:
 if h.flags&hashWriting == 0 {
  throw("concurrent map writes")
 }
 h.flags &^= hashWriting
 if t.indirectvalue() {
  val = *((*unsafe.Pointer)(val))
 }
 return val

That covers map assignment. Next, let's look at how the map grows in step ⑤.

Expansion of map

Growth is triggered in two cases: either too many key/value pairs are stored, exceeding the map's load factor (an average of 6.5 pairs per bucket), or there are too many overflow buckets. Both thresholds are fixed constants chosen empirically, so we won't dwell on them here.

When either condition is met, growth begins. Under condition 2 the new bucket array has the same size as the old one: too many cells are sitting empty, so a same-size grow simply compacts the memory. Under condition 1 the data volume is genuinely too large for the current load, so the bucket count is doubled.

func hashGrow(t *maptype, h *hmap) {
 bigger := uint8(1)
 //Same-size grow (condition 2): keep the bucket count unchanged
 if !overLoadFactor(h.count+1, h.B) {
  bigger = 0
  h.flags |= sameSizeGrow
 }
 oldbuckets := h.buckets

 //Allocate the new bucket array (plus some preallocated overflow buckets)
 newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

 flags := h.flags &^ (iterator | oldIterator)
 if h.flags&iterator != 0 {
  flags |= oldIterator
 }
 
 //Swap the fields: the current buckets become oldbuckets, and migration happens incrementally afterwards
 h.B += bigger
 h.flags = flags
 h.oldbuckets = oldbuckets
 h.buckets = newbuckets
 h.nevacuate = 0
 h.noverflow = 0

 //The extra structure is assigned a value
 if h.extra != nil && h.extra.overflow != nil {
  // Promote current overflow buckets to the old generation.
  if h.extra.oldoverflow != nil {
   throw("oldoverflow is not nil")
  }
  h.extra.oldoverflow = h.extra.overflow
  h.extra.overflow = nil
 }
 if nextOverflow != nil {
  if h.extra == nil {
   h.extra = new(mapextra)
  }
  h.extra.nextOverflow = nextOverflow
 }
}

To summarize hashGrow: determine the growth factor, allocate the new bucket array, and swap the fields so that the old buckets and their overflow chains become the "old generation" awaiting migration.

Migration of map data

Once growth has started, the data must be migrated, and the migration is not done all at once: a bucket is evacuated only when it is next touched. In other words, migration is incremental. Let's see how.

In step ③ of assignment, the code checks whether the target bucket still lives in oldbuckets, and evacuates it if so. Here is the entry point:

func growWork(t *maptype, h *hmap, bucket uintptr) {
 //First evacuate the bucket we are about to write to
 evacuate(t, h, bucket&h.oldbucketmask())
 
 //...and evacuate one more bucket to push overall progress forward
 if h.growing() {
  evacuate(t, h, h.nevacuate)
 }
}

nevacuate tracks the current progress; when migration is finished it equals 2^B, where B is the old array's B (after all, the new array may hold 2^(B+1) buckets).
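
The helpers used above are small; paraphrased from the runtime of this era:

//growing reports whether a migration is in progress
func (h *hmap) growing() bool {
 return h.oldbuckets != nil
}

//noldbuckets returns the bucket count before growth;
//for a doubling grow that is half the current count
func (h *hmap) noldbuckets() uintptr {
 oldB := h.B
 if !h.sameSizeGrow() {
  oldB--
 }
 return bucketShift(oldB)
}

//oldbucketmask masks a bucket index down to a valid old-array index
func (h *hmap) oldbucketmask() uintptr {
 return h.noldbuckets() - 1
}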

The evacuate method moves the bucket at the given index, together with its whole overflow chain, into the new bucket array.

① First, check whether this bucket has already been evacuated (oldbucket is the index of the bucket to migrate):

b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
//Check the evacuation mark first
if !evacuated(b) {
 //Do the transfer operation
}

Whether a bucket has been evacuated can be read straight off its first tophash entry (see the third article for what tophash encodes):

func evacuated(b *bmap) bool {
 h := b.tophash[0]
 //tophash values in this range are evacuation marks
 return h > emptyOne && h < minTopHash
}
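
For context, these are the tophash sentinel values of this era (Go 1.12); anything below minTopHash encodes cell state rather than a real hash:

const (
 emptyRest      = 0 //cell is empty, and so is everything after it in this chain
 emptyOne       = 1 //this cell is empty
 evacuatedX     = 2 //valid entry, evacuated to the first half of the larger table
 evacuatedY     = 3 //valid entry, evacuated to the second half
 evacuatedEmpty = 4 //cell was empty, bucket has been evacuated
 minTopHash     = 5 //minimum tophash of a normally filled cell
)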

② If not yet evacuated, the data is migrated. The destination array may be the same size or twice as large, so xy holds the two possible destinations: x is the bucket at the same index in the new array, and y is the bucket at index oldbucket+newbit, used only when the array doubled. First the destinations are computed:
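
The evacDst struct referenced below describes one destination, as declared in the runtime of this era:

type evacDst struct {
 b *bmap          //current destination bucket
 i int            //key/value index into b
 k unsafe.Pointer //pointer to current key storage
 v unsafe.Pointer //pointer to current value storage
}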

var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.v = add(x.k, bucketCnt*uintptr(t.keysize))
if !h.sameSizeGrow() {
 //Only a doubling grow needs the y destination
 y := &xy[1]
 y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
 y.k = add(unsafe.Pointer(y.b), dataOffset)
 y.v = add(y.k, bucketCnt*uintptr(t.keysize))
}

③ With the destinations known, the key/value pairs are migrated one by one (compacting away the empty cells in the process):

//Traverse each bucket
for ; b != nil; b = b.overflow(t) {
 k := add(unsafe.Pointer(b), dataOffset)
 v := add(k, bucketCnt*uintptr(t.keysize))

 //Traverse each kV in the bucket
 for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
  top := b.tophash[i]

  //If it is empty, it will not be migrated
  if isEmpty(top) {
   b.tophash[i] = evacuatedEmpty
   continue
  }
  if top < minTopHash {
   throw("bad map state")
  }
  k2 := k
  if t.indirectkey() {
   k2 = *((*unsafe.Pointer)(k2))
  }
  var useY uint8
   if !h.sameSizeGrow() {
    //For a doubling grow, rehash the key to decide between x and y
    hash := t.key.alg.hash(k2, uintptr(h.hash0))
    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.alg.equal(k2, k2) {
     //NaN-like keys never equal themselves and rehash differently each
     //time; send them to a reproducible half via the old tophash's low bit
     useY = top & 1
     top = tophash(hash)
   } else {
    if hash&newbit != 0 {
     useY = 1
    }
   }
  }

   //Sanity checks on the evacuation constants; can be ignored
  if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
   throw("bad evacuatedN")
  }

  //Set the tophash of oldbucket as relocated
  b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
  dst := &xy[useY]         // evacuation destination
  if dst.i == bucketCnt {
    //If the destination bucket is full, chain on a new overflow bucket
   dst.b = h.newoverflow(t, dst.b)
   dst.i = 0
   dst.k = add(unsafe.Pointer(dst.b), dataOffset)
   dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
  }
  //Fill in tophash value, kV data
  dst.b.tophash[dst.i&(bucketCnt-1)] = top
  if t.indirectkey() {
   *(*unsafe.Pointer)(dst.k) = k2
  } else {
   typedmemmove(t.key, dst.k, k)
  }
  if t.indirectvalue() {
   *(*unsafe.Pointer)(dst.v) = *(*unsafe.Pointer)(v)
  } else {
   typedmemmove(t.elem, dst.v, v)
  }

  //Update target bucket
  dst.i++
  dst.k = add(dst.k, uintptr(t.keysize))
  dst.v = add(dst.v, uintptr(t.valuesize))
 }
}

If no iterator is still using the old buckets and the bucket type contains pointers, the key/value area of the evacuated bucket is cleared so the garbage collector can release what it referenced:

if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
 b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
 ptr := add(b, dataOffset)
 n := uintptr(t.bucketsize) - dataOffset

 //ptr is the key/value area; the tophash array before it is kept, since it now holds the evacuation marks
 memclrHasPointers(ptr, n)
}

④ If the bucket just evacuated is the one nevacuate points at, the overall progress marker is advanced:

//newbit is the number of old buckets, i.e. the finish line for nevacuate
func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
 //Update tags first
 h.nevacuate++

 //Look ahead at most 1024 buckets
 stop := h.nevacuate + 1024
 if stop > newbit {
  stop = newbit
 }

 //Skip over buckets that were already evacuated out of order
 for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
  h.nevacuate++
 }

 //If everything is evacuated, drop the old buckets and clear the grow state
 if h.nevacuate == newbit {
  h.oldbuckets = nil
  if h.extra != nil {
   h.extra.oldoverflow = nil
  }
  h.flags &^= sameSizeGrow
 }
}
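
bucketEvacuated, used in the loop above, is just the evacuated check applied to an old bucket by index (paraphrased from the runtime of this era):

func bucketEvacuated(t *maptype, h *hmap, bucket uintptr) bool {
 b := (*bmap)(add(h.oldbuckets, bucket*uintptr(t.bucketsize)))
 return evacuated(b)
}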

Summary

  1. The difficulty of map assignment lies in growth and data migration.
  2. Migration is incremental: while a grow is in progress, each assignment evacuates the bucket being written plus at most one more.
  3. Growing does not necessarily allocate more space; a same-size grow is just a memory compaction.
  4. The tophash marks tell whether a cell is empty, whether it has been evacuated, and whether it went to destination x or y.
  5. Deleting many keys leaves empty cells behind, which can eventually trigger a same-size grow and migration; avoid heavy delete churn if you can.
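
A practical consequence of points 1-3, as a runnable sketch (the size n here is arbitrary): presizing a map with make lets the runtime pick a large enough B up front, so inserts never pay for hashGrow and evacuate.

package main

import "fmt"

func main() {
 const n = 1 << 20
 grown := make(map[int]int)     //starts tiny; grows and migrates many times
 hinted := make(map[int]int, n) //sized once; no incremental migration
 for i := 0; i < n; i++ {
  grown[i] = i
  hinted[i] = i
 }
 fmt.Println(len(grown), len(hinted))
}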

That concludes this walkthrough of assignment and expansion in the golang map.
