Proper memory alignment in the Go language

Time:2022-5-26

Problem

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

Before we start, try to calculate the total size that Part1 occupies.

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // unsafe.Sizeof reports the size, in bytes, of a value of each type.
    fmt.Printf("bool size: %d\n", unsafe.Sizeof(bool(true)))
    fmt.Printf("int32 size: %d\n", unsafe.Sizeof(int32(0)))
    fmt.Printf("int8 size: %d\n", unsafe.Sizeof(int8(0)))
    fmt.Printf("int64 size: %d\n", unsafe.Sizeof(int64(0)))
    fmt.Printf("byte size: %d\n", unsafe.Sizeof(byte(0)))
    fmt.Printf("string size: %d\n", unsafe.Sizeof("EDDYCJY"))
}

Output result:

bool size: 1
int32 size: 4
int8 size: 1
int64 size: 8
byte size: 1
string size: 16

By this count, the memory occupied by the Part1 struct is 1 + 4 + 1 + 8 + 1 = 15 bytes. I believe many readers would add it up this way, and it seems reasonable enough.

What is the real situation? Let's actually measure it, as follows:

package main

import (
    "fmt"
    "unsafe"
)

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

func main() {
    part1 := Part1{}

    // Print the real size and alignment of the whole struct.
    fmt.Printf("part1 size: %d, align: %d\n", unsafe.Sizeof(part1), unsafe.Alignof(part1))
}

Output result:

part1 size: 32, align: 8

The struct actually takes 32 bytes, more than twice the expected 15. This shows that the previous calculation method is wrong. Why?

To calculate the size correctly, we need the concept of "memory alignment". Let's look at what it is in detail.

Memory alignment

Some readers may think of memory as a simple byte array that is read one byte at a time:

[Figure: memory as a simple byte array, read one byte at a time]

The figure above shows memory being read one byte per slot. In fact, the CPU does not read and write memory one byte at a time. Instead, it reads memory block by block, where the block size can be 2, 4, 8, 16 bytes, and so on. We call the block size the memory access granularity. As shown below:

[Figure: the CPU reading memory in blocks at its access granularity]

In the example, assume that the access granularity is 4. The CPU then reads and writes memory in blocks of 4 bytes. This is the correct picture.

Why care about alignment

  • The code you are writing has certain requirements in terms of performance (CPU, memory)
  • You are dealing with vector instructions
  • Some hardware platforms (for example, some ARM architectures) do not support misaligned memory access

Besides, as an engineer, this is simply worth knowing :)

Why align

  • Platform (portability) reason: not all hardware platforms can access arbitrary data at arbitrary addresses. Some platforms only allow certain types of data to be read at certain addresses and raise an exception otherwise
  • Performance reason: accessing misaligned memory can force the CPU to access memory twice and spend extra clock cycles on aligning and merging the data. Aligned memory needs only one access to complete the read

[Figure: reading 4 bytes starting at index 1, across a 4-byte access boundary]

In the figure above, reading from index 1 is a problem, because the read is not aligned to the memory access boundary. The CPU therefore has to do some extra work, as follows:

  1. The CPU first reads the first memory block covering the misaligned address, bytes 0-3, and discards the unneeded byte 0
  2. The CPU then reads the second memory block covering the misaligned address, bytes 4-7, and discards the unneeded bytes 5, 6 and 7
  3. It merges the remaining data, bytes 1-4
  4. It puts the merged result into a register

From this process we can conclude that skipping "memory alignment" is rather "troublesome", because it adds several time-consuming steps.

If the data is aligned, the four bytes are read starting from index 0, which takes a single read and no extra work. This is obviously much more efficient, and it is a standard practice of trading space for time.
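
The four steps above can be simulated with a small sketch. It is only a model under stated assumptions: the access granularity is fixed at 4 bytes, and readBlock and readWord are hypothetical helpers invented for illustration, not anything a real CPU or the Go runtime exposes.

```go
package main

import "fmt"

// granularity is the assumed memory access granularity, in bytes.
const granularity = 4

// readBlock models one memory access: it returns the aligned block that
// contains addr, plus the address at which that block starts.
// It assumes len(mem) is a multiple of granularity.
func readBlock(mem []byte, addr int) (block []byte, start int) {
    start = addr - addr%granularity
    return mem[start : start+granularity], start
}

// readWord returns the 4 bytes starting at addr and the number of block
// reads that were needed. The word must lie entirely inside mem.
func readWord(mem []byte, addr int) (word [granularity]byte, reads int) {
    first, start := readBlock(mem, addr)
    reads = 1
    skip := addr - start // unneeded leading bytes of the first block
    n := copy(word[:], first[skip:])
    if n < granularity {
        // The word crosses a block boundary: read the next block,
        // keep only its leading bytes, and merge.
        second, _ := readBlock(mem, start+granularity)
        reads++
        copy(word[n:], second[:granularity-n])
    }
    return word, reads
}

func main() {
    mem := []byte{0, 1, 2, 3, 4, 5, 6, 7}
    w0, r0 := readWord(mem, 0)
    w1, r1 := readWord(mem, 1)
    fmt.Println(w0, r0) // [0 1 2 3] 1
    fmt.Println(w1, r1) // [1 2 3 4] 2
}
```

The aligned read finishes with one block access, while the read from index 1 needs two accesses plus a merge.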

Default factor

Compilers on different platforms have their own default "alignment factor". In C and C++ it can be changed with the preprocessor directive #pragma pack(n), where n is the alignment factor. Generally speaking, the factors on common platforms are as follows:

  • 32-bit: 4
  • 64-bit: 8

Note also that type sizes and alignment values may differ between hardware platforms. The values in this article are therefore not universal, and you should check them against your own machine when debugging.

Member alignment

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // unsafe.Alignof reports the alignment, in bytes, required by each type.
    fmt.Printf("bool align: %d\n", unsafe.Alignof(bool(true)))
    fmt.Printf("int32 align: %d\n", unsafe.Alignof(int32(0)))
    fmt.Printf("int8 align: %d\n", unsafe.Alignof(int8(0)))
    fmt.Printf("int64 align: %d\n", unsafe.Alignof(int64(0)))
    fmt.Printf("byte align: %d\n", unsafe.Alignof(byte(0)))
    fmt.Printf("string align: %d\n", unsafe.Alignof("EDDYCJY"))
    fmt.Printf("map align: %d\n", unsafe.Alignof(map[string]string{}))
}

Output result:

bool align: 1
int32 align: 4
int8 align: 1
int64 align: 8
byte align: 1
string align: 8
map align: 8

In Go, you can call unsafe.Alignof to get the alignment factor of a type. From the output we can see that the values are all powers of two (2^n) and that none exceeds 8. This is because the default alignment factor of the compiler on my 64-bit laptop is 8, so no value exceeds it.

Overall alignment

The previous section showed that the member variables of a struct must be aligned. The struct as a whole, being the final result, must be aligned as well.

Alignment rules

  • Members of the struct: the offset of the first member is 0. For each subsequent member, the alignment value is the smaller of the compiler's default alignment length (#pragma pack(n)) and the size of the member's type (unsafe.Sizeof). The member's offset must be an integral multiple of its alignment value
  • The struct itself: its alignment value is the smaller of the compiler's default alignment length (#pragma pack(n)) and the largest size among its member types. The total size of the struct is rounded up to the smallest integral multiple of this alignment value
  • Combining the two points above: if the compiler's default alignment length (#pragma pack(n)) exceeds the largest size among the struct's member types, the default alignment length has no effect
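
As a rough sketch, rules 1 and 2 can be written as a small layout calculator. This is a simplified model, not the Go compiler's actual algorithm: the field type and the alignUp and layout functions are hypothetical names introduced here, and the sizes and alignments are passed in by hand (the values below are the ones measured earlier on a 64-bit machine).

```go
package main

import "fmt"

// field describes one struct member by its size and alignment in bytes.
type field struct {
    name        string
    size, align uintptr
}

// alignUp rounds offset up to the next multiple of align.
func alignUp(offset, align uintptr) uintptr {
    return (offset + align - 1) / align * align
}

// layout applies the two rules: every field is placed at the next offset
// that is a multiple of its alignment (rule 1), and the total size is
// rounded up to a multiple of the largest field alignment (rule 2).
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
    align = 1
    for _, f := range fields {
        size = alignUp(size, f.align) // insert padding if needed
        offsets = append(offsets, size)
        size += f.size
        if f.align > align {
            align = f.align
        }
    }
    return offsets, alignUp(size, align), align
}

func main() {
    part1 := []field{
        {"a", 1, 1}, // bool
        {"b", 4, 4}, // int32
        {"c", 1, 1}, // int8
        {"d", 8, 8}, // int64
        {"e", 1, 1}, // byte
    }
    offsets, size, align := layout(part1)
    fmt.Println(offsets, size, align) // [0 4 8 16 24] 32 8
}
```

Feeding in the field list of Part1 reproduces the measured size of 32 bytes and alignment of 8.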

Analysis process

Next, let's analyze what happened to Part1 that made the result differ from what we "expected".

Member      Type    Offset  Size
a           bool    0       1
(padding)   -       1       3
b           int32   4       4
c           int8    8       1
(padding)   -       9       7
d           int64   16      8
e           byte    24      1
(padding)   -       25      7
Total size                  32

Member alignment

  • First member a
    • The type is bool
    • The size / alignment value is 1 byte
    • It starts at the initial address, offset 0, and occupies the 1st byte
  • Second member b
    • The type is int32
    • The size / alignment value is 4 bytes
    • According to rule 1, the offset must be an integral multiple of 4, so the offset is 4. The 2nd to 4th bytes are padding, and b fills the 5th to 8th bytes. As follows: axxx|bbbb
  • Third member c
    • The type is int8
    • The size / alignment value is 1 byte
    • According to rule 1, the offset must be an integral multiple of 1. The current offset is 8, so no padding is needed and c fills the 9th byte. As follows: axxx|bbbb|c
  • Fourth member d
    • The type is int64
    • The size / alignment value is 8 bytes
    • According to rule 1, the offset must be an integral multiple of 8, so the offset is 16. The 10th to 16th bytes are padding, and d fills the 17th to 24th bytes. As follows: axxx|bbbb|cxxx|xxxx|dddd|dddd
  • Fifth member e
    • The type is byte
    • The size / alignment value is 1 byte
    • According to rule 1, the offset must be an integral multiple of 1. The current offset is 24, so no padding is needed and e fills the 25th byte. As follows: axxx|bbbb|cxxx|xxxx|dddd|dddd|e

Overall alignment

After every member has been aligned, rule 2 requires the struct as a whole to be aligned too. The size so far may not be a power of two (2^n), or even an even number, which clearly does not satisfy the alignment rules.

According to rule 2, the alignment value is 8. The current size is 25, which is not an integral multiple of 8, so the size is rounded up to 32 to align the struct.
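
The rounding step can be written as a one-line helper. This is a minimal sketch: roundUp is a hypothetical name, and the bit trick it uses is only valid when the alignment is a power of two, which holds for the alignment values seen in this article.

```go
package main

import "fmt"

// roundUp rounds n up to the next multiple of align.
// align must be a power of two.
func roundUp(n, align uintptr) uintptr {
    return (n + align - 1) &^ (align - 1)
}

func main() {
    fmt.Println(roundUp(25, 8)) // 32: the padded size of Part1
    fmt.Println(roundUp(16, 8)) // 16: already aligned, nothing to add
}
```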

Result

Part1 memory layout: axxx|bbbb|cxxx|xxxx|dddd|dddd|exxx|xxxx
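
The offsets in the table can be checked directly with unsafe.Offsetof. The sketch below assumes a 64-bit platform for the values in the comment; part1Offsets is a helper name introduced here for illustration.

```go
package main

import (
    "fmt"
    "unsafe"
)

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

// part1Offsets returns the byte offset of each field of Part1,
// in declaration order.
func part1Offsets() [5]uintptr {
    var p Part1
    return [5]uintptr{
        unsafe.Offsetof(p.a),
        unsafe.Offsetof(p.b),
        unsafe.Offsetof(p.c),
        unsafe.Offsetof(p.d),
        unsafe.Offsetof(p.e),
    }
}

func main() {
    // On a 64-bit platform this is expected to match the table above:
    // [0 4 8 16 24] 32
    fmt.Println(part1Offsets(), unsafe.Sizeof(Part1{}))
}
```

On a 32-bit platform the offset of d and the total size may differ, because int64 may only require 4-byte alignment there.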

Summary

The analysis in this section shows why the earlier "calculation" was wrong.

Memory is not actually managed as one value per slot, packed back to back, but block by block. Reads and writes trade space for time (efficiency). In addition, the memory access rules of different platforms must be taken into account.

Ingenious structure

In the previous section we saw that a struct is padded and aligned according to the types of its member variables. If the field order were different, would anything change? Let's try it together :-)

package main

import (
    "fmt"
    "unsafe"
)

type Part1 struct {
    a bool
    b int32
    c int8
    d int64
    e byte
}

// Part2 has the same fields as Part1, in a different order.
type Part2 struct {
    e byte
    c int8
    a bool
    b int32
    d int64
}

func main() {
    part1 := Part1{}
    part2 := Part2{}

    fmt.Printf("part1 size: %d, align: %d\n", unsafe.Sizeof(part1), unsafe.Alignof(part1))
    fmt.Printf("part2 size: %d, align: %d\n", unsafe.Sizeof(part2), unsafe.Alignof(part2))
}

Output result:

part1 size: 32, align: 8
part2 size: 16, align: 8

The result is surprising: simply changing the order of the fields changes the size of the struct.

Next, let's analyze Part2 together. What is different inside it, compared with Part1, that leads to this result?

Analysis process

Member      Type    Offset  Size
e           byte    0       1
c           int8    1       1
a           bool    2       1
(padding)   -       3       1
b           int32   4       4
d           int64   8       8
Total size                  16

Member alignment

  • First member e
    • The type is byte
    • The size / alignment value is 1 byte
    • It starts at the initial address, offset 0, and occupies the 1st byte
  • Second member c
    • The type is int8
    • The size / alignment value is 1 byte
    • According to rule 1, the offset must be an integral multiple of 1. The current offset is 1, so no padding is needed and c fills the 2nd byte
  • Third member a
    • The type is bool
    • The size / alignment value is 1 byte
    • According to rule 1, the offset must be an integral multiple of 1. The current offset is 2, so no padding is needed and a fills the 3rd byte
  • Fourth member b
    • The type is int32
    • The size / alignment value is 4 bytes
    • According to rule 1, the offset must be an integral multiple of 4, so the offset is 4. The 4th byte is padding, and b fills the 5th to 8th bytes. As follows: ecax|bbbb
  • Fifth member d
    • The type is int64
    • The size / alignment value is 8 bytes
    • According to rule 1, the offset must be an integral multiple of 8. The current offset is 8, so no padding is needed and d fills the 9th to 16th bytes. As follows: ecax|bbbb|dddd|dddd

Overall alignment

According to rule 2, the alignment value is 8. The current size is 16, which is already an integral multiple of 8, so no additional padding is required.

Result

Part2 memory layout: ecax|bbbb|dddd|dddd
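
As a cross-check, the same offsets can be read through the reflect package instead of unsafe. This is just one possible way to inspect a layout; fieldOffsets is a hypothetical helper written for this example.

```go
package main

import (
    "fmt"
    "reflect"
)

type Part2 struct {
    e byte
    c int8
    a bool
    b int32
    d int64
}

// fieldOffsets returns the byte offset of every field of v,
// which must be a struct value.
func fieldOffsets(v interface{}) []uintptr {
    t := reflect.TypeOf(v)
    offsets := make([]uintptr, t.NumField())
    for i := range offsets {
        offsets[i] = t.Field(i).Offset
    }
    return offsets
}

func main() {
    // Prints the field offsets followed by the total size of Part2.
    fmt.Println(fieldOffsets(Part2{}), reflect.TypeOf(Part2{}).Size()) // [0 1 2 4 8] 16
}
```

The three one-byte fields sit next to each other, so only a single padding byte is needed before b.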

Summary

Comparing the memory layouts of Part1 and Part2, you will find that they are very different. As follows:

  • Part1: axxx|bbbb|cxxx|xxxx|dddd|dddd|exxx|xxxx
  • Part2: ecax|bbbb|dddd|dddd

Look closely and you will see that Part1 contains a lot of padding, which obviously wastes a lot of space. So where does the padding come from?

As this article has shown, fields of different types must be aligned so that every access respects the memory access boundary, and padding is what fills the resulting gaps.

It is then easy to understand why adjusting the field order of a struct can reduce its size: the reordering cleverly reduces the amount of padding and makes the fields more "compact". This helps build a clearer picture of Go's memory layout and is useful when optimizing large objects.
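
One common heuristic for choosing a compact order is to sort the fields from largest to smallest alignment. The sketch below models that idea on hand-written size and alignment values (as measured on a 64-bit machine); field, structSize and byAlignDesc are hypothetical names, and this is not something the Go compiler does for you, since it keeps fields in declaration order.

```go
package main

import (
    "fmt"
    "sort"
)

// field describes one struct member by its size and alignment in bytes.
type field struct {
    name        string
    size, align uintptr
}

// structSize computes the padded size of a struct whose fields are laid
// out in the given order.
func structSize(fields []field) uintptr {
    var size, maxAlign uintptr = 0, 1
    for _, f := range fields {
        size = (size + f.align - 1) / f.align * f.align // pad up to the field's alignment
        size += f.size
        if f.align > maxAlign {
            maxAlign = f.align
        }
    }
    return (size + maxAlign - 1) / maxAlign * maxAlign // pad the tail of the struct
}

// byAlignDesc returns a copy of fields ordered from largest to smallest
// alignment. The input slice is left untouched.
func byAlignDesc(fields []field) []field {
    sorted := append([]field(nil), fields...)
    sort.SliceStable(sorted, func(i, j int) bool {
        return sorted[i].align > sorted[j].align
    })
    return sorted
}

func main() {
    part1 := []field{{"a", 1, 1}, {"b", 4, 4}, {"c", 1, 1}, {"d", 8, 8}, {"e", 1, 1}}
    fmt.Println(structSize(part1))              // 32
    fmt.Println(structSize(byAlignDesc(part1))) // 16
}
```

The sorted order here is d, b, a, c, e, which differs from Part2 but reaches the same 16 bytes, so more than one ordering can be equally compact.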
