Reading notes on "Computer Systems: A Programmer's Perspective" (CSAPP), Chapter 3: Machine-Level Representation of Programs


This chapter introduces machine code and its readable form, assembly language. When we program in a high-level language (C, Java, etc.), the language hides the machine-level details, so we never see how the code is actually implemented. Given that we have high-level languages, why learn assembly at all? Studying the machine-level implementation of a program shows us what the compiler's optimizer can and cannot do, how the program actually runs, and which parts are worth optimizing. Many attacks exploit vulnerabilities by overwriting the control information a running program keeps, which lets the attacker take control of the system (some worms, for example, exploited a vulnerability in the gets function), and understanding such attacks requires knowing the machine-level representation. Embedded software developers in particular deal with low-level code all the time: bootloader clock initialization, relocation, and similar tasks are written in assembly. We are not required to write complex programs in assembly, but we do need to be able to read and understand the assembly code the compiler generates.



Program coding

Abstract model of computer

   The earlier post, reading notes on CSAPP Chapter 1 (A Tour of Computer Systems), mentioned the abstract models of a computer. A computer system uses simpler abstract models to hide implementation details. For machine-level programming, two abstractions are particularly important. The first is the instruction set architecture (ISA), which defines the format and behavior of machine-level programs: the processor state, the format of the instructions, and the effect of each instruction on that state. Most ISAs, including x86-64, describe a program as if every instruction executes in sequence, one instruction finishing before the next begins. The processor hardware is far more elaborate: it executes many instructions concurrently, but it takes safeguards to ensure that the overall behavior matches the sequential execution the ISA specifies. The second abstraction is that the memory addresses used by a machine-level program are virtual addresses, providing a memory model that looks like a very large byte array. The actual memory system is implemented by a combination of multiple hardware memories and operating system software.

Registers in assembly code

The program counter (commonly called the "PC", and named %rip in x86-64) gives the memory address of the next instruction to be executed.

The integer register file contains 16 named locations, each storing a 64-bit value. These registers can hold addresses (corresponding to C pointers) or integer data. Some registers keep track of critical parts of the program state, while others hold temporary data such as procedure arguments, local variables, and function return values.

The condition code registers hold status information about the most recently executed arithmetic or logical instruction. They are used to implement conditional changes in control or data flow, such as if and while statements.

A set of vector registers can each hold one or more integer or floating-point values.

  For the registers commonly used in assembly, I suggest reading the ARM part of my notes on embedded software development interview knowledge, which covers the registers and instruction set commonly used on ARM in detail.

Machine code example

If we have a file main.c, running gcc -Og -S main.c produces an assembly file main.s. Running gcc -Og -c main.c then generates the object-code file main.o. The .o file is in binary format and cannot be viewed directly as text, but we can open it in an editor switched to hexadecimal mode. An example is shown below.

53 48 89 d3 e8 00 00 00 00 48 89 03 5b c3

   This is the object code corresponding to the assembly instructions. It tells us something important: a program executed by the machine is simply a sequence of bytes encoding a series of instructions. The machine knows very little about the source code from which these instructions were generated.

Introduction to disassembly

To inspect the contents of a machine-code file, a class of programs called disassemblers is very useful. These programs generate an assembly-like format from the machine code. On Linux, objdump -d main.o produces the disassembly. An example is shown below.


On the left we see the 14 hexadecimal byte values from the byte sequence given earlier, divided into groups of 1 to 5 bytes each. Each group is one instruction, and on the right is the equivalent assembly language.

   Several features of machine code and its disassembled representation are worth noting:

  • x86-64 instructions range in length from 1 to 15 bytes. Commonly used instructions and those with fewer operands require fewer bytes, while less common instructions or those with more operands require more.

  • The instruction format is designed so that, from a given starting position, the bytes decode uniquely into machine instructions. For example, only the instruction pushq %rbx starts with the byte value 53.

  • The disassembler determines the assembly code purely from the byte sequence in the machine-code file. It does not need access to the program's source code or assembly code.

  • The disassembler uses a naming convention slightly different from that of GCC-generated assembly code. In our example, it omits the "q" suffix from many instructions. These suffixes are size designators and can be omitted in most cases. Conversely, the disassembler adds a "q" suffix to the call and ret instructions, and again these suffixes can safely be omitted.

Data format

Intel uses the term "word" for a 16-bit data type. Accordingly, 32-bit quantities are called "double words" and 64-bit quantities are called "quad words". The following table shows the x86-64 representations of the basic C data types.

C declaration   Intel data type    Assembly-code suffix   Size (bytes)
char            Byte               b                      1
short           Word               w                      2
int             Double word        l                      4
long            Quad word          q                      8
char *          Quad word          q                      8
float           Single precision   s                      4
double          Double precision   l                      8
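To check the table on your own machine, a small sketch like the one below can help. The function name sizes_match_table is my own. The expected sizes assume x86-64 Linux (the LP64 data model); other platforms may differ, for example long is 4 bytes on 64-bit Windows.

```c
#include <stddef.h>

/* Returns 1 if the basic C types have the sizes listed in the table
   (assumption: x86-64 Linux, LP64 data model), 0 otherwise. */
int sizes_match_table(void)
{
    return sizeof(char) == 1 && sizeof(short) == 2 && sizeof(int) == 4 &&
           sizeof(long) == 8 && sizeof(char *) == 8 &&
           sizeof(float) == 4 && sizeof(double) == 8;
}
```

Calling sizes_match_table() from main and printing the result is enough to confirm which data model the compiler targets.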

Access information

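Operand specifiers

Integer registers

Each of the 16 registers has a different name for its 64-bit, 32-bit, 16-bit, and 8-bit portions (for example %rax, %eax, %ax, and %al), so take care to use the name that matches the operand size.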


Three types of operands

   1. Immediate: a constant value, written with a leading $, for example $0x1f. Different instructions allow different ranges of immediate values, and the assembler automatically selects the most compact encoding for the value.

   2. Register: the contents of a register. The operand is the low-order 1, 2, 4, or 8 bytes of one of the 16 registers, corresponding to 8, 16, 32, or 64 bits. In Figure 3-3 we use the notation \(r_a\) to denote an arbitrary register a and \(R[r_a]\) to denote its value, viewing the register set as an array R indexed by register identifiers.

   3. Memory reference: accesses a memory location at a computed address, usually called the effective address. Because we view memory as a large array of bytes, we use the notation \(M_b[Addr]\) to denote a reference to the b-byte value stored in memory starting at address Addr. For simplicity, we usually omit the subscript b.

Format of operands

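When reading assembly instructions, most operands can be decoded with the help of the figure below. The most general form, Imm(r_b, r_i, s), denotes the address Imm + R[r_b] + R[r_i] * s, where the scale factor s must be 1, 2, 4, or 8. The other forms are special cases in which some of these components are omitted.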


Data transfer instruction


The main difference between instructions with different suffixes is that they operate on different data sizes.

Source operand: an immediate, a register, or a memory location.

Destination operand: a register or a memory location.

Note: the two operands of a move instruction cannot both be memory locations. Copying a value from one memory location to another takes two instructions: the first loads the source value into a register, and the second writes the register value to the destination.

movl $0x4050,%eax         Immediate--Register, 4 bytes
movw %bp,%sp              Register--Register, 2 bytes
movb (%rdi,%rcx),%al      Memory--Register, 1 byte
movb $-17,(%rsp)          Immediate--Memory, 1 byte
movq %rax,-12(%rbp)       Register--Memory, 8 bytes

To copy a smaller source value to a larger destination, use the zero-extending (movz) or sign-extending (movs) move instructions shown below.



An example


The procedure parameters xp and y are stored in registers %rdi and %rsi respectively (arguments are passed to the function in registers).

   Line 2: the movq instruction reads x from the memory location that xp points to and stores it in register %rax.

   Line 3: a second movq instruction writes y to the memory location designated by xp, whose value is in register %rdi.

   Line 4: the ret instruction returns from the function, and the return value is whatever is in register %rax.


Dereferencing a pointer in C means placing the pointer in a register and then using that register in a memory reference.

Local variables such as x are usually kept in registers rather than in memory, because accessing a register is much faster than accessing memory.
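For reference, here is a C function matching the description above. I am reconstructing it from the line-by-line explanation, so treat the name exchange and the exact assembly in the comment as assumptions about the missing figure.

```c
/* Store y at *xp and return the value that was there before.
   With gcc -Og on x86-64 this typically compiles to something like:
       movq (%rdi), %rax    # x = *xp   (xp arrives in %rdi)
       movq %rsi, (%rdi)    # *xp = y   (y arrives in %rsi)
       ret                  # return x in %rax                */
long exchange(long *xp, long y)
{
    long x = *xp;   /* the local x can live entirely in %rax */
    *xp = y;
    return x;
}
```

For example, with long a = 4, the call exchange(&a, 3) returns 4 and leaves a equal to 3.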

Push and pop stack data


The pushq instruction pushes data onto the stack, and the popq instruction pops data off it. Each takes a single operand: the source of the data to push, or the destination for the data popped.

pushq %rbp is equivalent to the following two instructions:

subq $8,%rsp             Decrement stack pointer
movq %rbp,(%rsp)         Store %rbp on stack

popq %rax is equivalent to the following two instructions:

movq (%rsp),%rax         Read %rax from stack
addq $8,%rsp             Increment stack pointer

Arithmetic and logic operation

Load effective address

   The IA32 instruction set has a load effective address instruction, leal (leaq in x86-64). It is written leal S, D and stores the address denoted by S into D. It has the form of a mov instruction that reads from memory, but it never actually accesses memory. The instruction is often used to perform simple arithmetic, and GCC is particularly fond of it, as in the following example:

leal (%eax, %eax, 2), %eax

   This computes %eax = %eax * 3. The parenthesized operand uses scaled indexed addressing: the address is the first register plus the product of the second register and the scale factor. leal takes the address computed this way as its source value and assigns it to the %eax register. Why multiply this way instead of using the multiply instruction imul?

   The reason is that Intel processors have a dedicated address-computation unit, so a simple leal typically completes in a single clock cycle, whereas imul takes several. For this reason the compiler usually uses leal when the multiplier is a small constant.
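The address arithmetic that lea performs can be written out in C. This is only a sketch: the helper names are mine, and whether a compiler actually emits lea for times3 depends on the compiler and the optimization level.

```c
/* Value computed by an operand of the form imm(base, index, scale). */
long effective_address(long imm, long base, long index, long scale)
{
    return imm + base + index * scale;
}

/* x * 3 expressed the way leal (%eax,%eax,2),%eax computes it:
   base = x, index = x, scale = 2. */
long times3(long x)
{
    return effective_address(0, x, x, 2);
}
```

With %rax = 0x100 and %rdx = 0x3, the operand 7(%rax,%rdx,8) corresponds to effective_address(7, 0x100, 3, 8), which is 0x11f.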

Unary and binary operations

Address   Value
0x100     0xFF
0x108     0xAB
0x110     0x13
0x118     0x11

Register  Value
%rax      0x100
%rcx      0x1
%rdx      0x3

The following examples show what these instructions do, starting from the memory and register values above. If an operand form is unfamiliar, refer to the common operand formats summarized in the section on the format of operands.

Instruction                Destination   Value   Explanation
addq %rcx,(%rax)           0x100         0x100   Adds the value of %rcx (0x1) to the value at address (%rax) = 0x100, which is 0xFF
subq %rdx,8(%rax)          0x108         0xA8    Subtracts the value of %rdx (0x3) from the value at address 8(%rax) = 0x108, which is 0xAB
imulq $16,(%rax,%rdx,8)    0x118         0x110   The address is 0x100 + 0x3 * 8 = 0x118; the value there, 0x11, multiplied by 16 (0x10) gives 0x110
incq 16(%rax)              0x110         0x14    The address is 0x100 + 0x10 = 0x110; the value there, 0x13, plus 1 gives 0x14
decq %rcx                  %rcx          0x0     0x1 - 1 = 0x0
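To double-check the table, the five instructions can be replayed on a toy model of the memory and registers. Everything here (the mem array, the at helper, the variable names) is scaffolding I made up for the check; only the initial values and the instructions come from the tables.

```c
#include <stdint.h>

/* Toy machine state: mem[] covers addresses 0x100..0x118 in 8-byte steps. */
static int64_t mem[4] = {0xFF, 0xAB, 0x13, 0x11};
static int64_t rax = 0x100, rcx = 0x1, rdx = 0x3;

/* Map a simulated address to its slot in mem[] (hypothetical helper). */
static int64_t *at(int64_t addr)
{
    return &mem[(addr - 0x100) / 8];
}

/* Apply each instruction from the table once, in order. */
void run_examples(void)
{
    *at(rax) += rcx;             /* addq  %rcx,(%rax)        -> 0x100 */
    *at(rax + 8) -= rdx;         /* subq  %rdx,8(%rax)       -> 0xA8  */
    *at(rax + rdx * 8) *= 16;    /* imulq $16,(%rax,%rdx,8)  -> 0x110 */
    *at(rax + 16) += 1;          /* incq  16(%rax)           -> 0x14  */
    rcx -= 1;                    /* decq  %rcx               -> 0x0   */
}
```

Each instruction touches a different location, so the order does not matter here and the results agree with the Value column.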
Shift operations

Left shift instructions: sal, shl (both have the same effect)

Arithmetic right shift instruction: sar (fills with the sign bit)

Logical right shift instruction: shr (fills with zeros)

The shift amount is either an immediate value or the single-byte register %cl. The destination of a shift operation is a register or a memory location.
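In C the two kinds of right shift are selected by the signedness of the operand. A small sketch (function names are mine) follows. Note that right-shifting a negative signed value is implementation-defined in the C standard; GCC and Clang perform an arithmetic shift, and the example assumes that behavior.

```c
#include <stdint.h>

/* sar: arithmetic right shift, vacated bits are copies of the sign bit.
   Assumes the compiler shifts signed values arithmetically (GCC, Clang). */
int64_t shift_arith(int64_t x, unsigned k)
{
    return x >> k;
}

/* shr: logical right shift, vacated bits are zero (guaranteed for unsigned). */
uint64_t shift_logical(uint64_t x, unsigned k)
{
    return x >> k;
}

/* sal/shl: left shift, done on an unsigned type so it is always well defined. */
uint64_t shift_left(uint64_t x, unsigned k)
{
    return x << k;
}
```

With x = -16, shift_arith(x, 2) keeps the sign and gives -4, while shifting the same bit pattern logically brings zeros in from the left.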


Assembly code corresponding to C language




Condition code

Definition of condition code

   Condition codes describe attributes of the most recent arithmetic or logical operation. These single-bit registers can be tested to perform conditional branches.

Common condition codes

   CF: carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.

   ZF: zero flag. The most recent operation yielded zero.

   SF: sign flag. The most recent operation yielded a negative value.

   OF: overflow flag. The most recent operation caused a two's-complement overflow, either positive or negative.
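The four definitions can be spelled out for an addition t = a + b. This is a sketch of the definitions, not of how the hardware computes them. The struct and function names are mine, and the conversion back to int64_t assumes the usual two's-complement wraparound (as on GCC and Clang).

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Condition codes as they would be set after t = a + b on 64-bit operands. */
struct flags flags_after_add(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;          /* unsigned add wraps modulo 2^64 */
    int64_t t = (int64_t)ut;        /* same bits viewed as signed */
    struct flags f;
    f.cf = ut < ua;                 /* carry out: unsigned overflow */
    f.zf = (t == 0);                /* result is zero */
    f.sf = (t < 0);                 /* result is negative */
    /* signed overflow: operands share a sign that the result does not */
    f.of = ((a < 0) == (b < 0)) && ((t < 0) != (a < 0));
    return f;
}
```

For example, 1 + (-1) sets CF and ZF but not OF, while INT64_MAX + 1 sets SF and OF but not CF.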

Instruction to change condition code


The cmp instructions set the condition codes according to the difference of their two operands. They behave like the sub instructions, except that they update only the condition codes and leave the operands unchanged.

The test instructions behave like the and instructions, again updating only the condition codes. Typically the same operand is given twice, to check whether a value is negative, zero, or positive.

testq %rax,%rax    // check whether %rax is negative, zero, or positive (%rax & %rax)

cmpq %rax,%rdi     // like sub: sets the condition codes according to %rdi - %rax


In the table above, every instruction except leaq changes the condition codes.

For the shift operations, the carry flag is set to the last bit shifted out, and the overflow flag is set to 0. The inc and dec instructions set the overflow and zero flags but leave the carry flag unchanged.

Access condition code

Three ways to access condition code

1. A byte can be set to 0 or 1 according to some combination of condition codes.

2. You can conditionally jump to some other part of the program.

3. Data can be transferred conditionally.

For the first case, the set instruction is often used to set, as shown in the figure below.


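For example, setl %al sets the low-order byte %al to 1 when the preceding signed comparison found "less than", and to 0 otherwise.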

Jump instructions


Some of the instructions in the table above have suffixes indicating the condition under which they jump. The suffixes are explained below as a memory aid.

e == equal, ne == not equal, s == sign (negative), ns == not sign (nonnegative), g == greater, ge == greater or equal, l == less, le == less or equal, a == above, ae == above or equal, b == below, be == below or equal (g, ge, l, and le compare signed values; a, ae, b, and be compare unsigned values)

Direct jump

jmp .L1      // the jump target is given directly as a label

Indirect jump

jmp *%rax    // uses the value in register %rax as the jump target

Encoding of jump instructions

By looking at how jump instructions are encoded, we can understand how the program counter (PC) carries out a jump.


        movq  %rdi, %rax
        jmp   .L2
.L3:    sarq  %rax
.L2:    testq %rax, %rax
        jg    .L3
        rep; ret


 0: 48 89 f8      mov    %rdi,%rax
 3: eb 03         jmp    8
 5: 48 d1 f8      sar    %rax
 8: 48 85 c0      test   %rax,%rax
 b: 7f f8         jg     5
 d: f3 c3         repz retq

In the annotations the disassembler generates on the right, the target of the jump instruction on line 2 is given as 0x8, and the target of the jump instruction on line 5 is given as 0x5 (the disassembler prints all numbers in hexadecimal). Looking at the byte encodings of the instructions, however, we see that the target of the first jump instruction is encoded (in the second byte) as 0x03. Adding this to 0x5, the address of the following instruction, gives the jump target 0x8, the address of the instruction on line 4.

Similarly, the target of the second jump instruction is encoded as 0xf8 (decimal -8) using a single-byte two's-complement representation. Adding this to 0xd (decimal 13), the address of the instruction on line 6, gives 0x5, the address of the instruction on line 3.

   These examples show that with PC-relative addressing, the value of the program counter used is the address of the instruction following the jump, not the address of the jump instruction itself.
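The arithmetic in the two examples can be captured in a few lines of C. The function name is mine, and the sketch only covers the 2-byte short form of the jump (one opcode byte plus one displacement byte) used in this listing.

```c
#include <stdint.h>

/* Target of a 2-byte short jump located at jump_addr whose second byte
   is disp_byte, an 8-bit two's-complement displacement. */
uint64_t short_jump_target(uint64_t jump_addr, uint8_t disp_byte)
{
    uint64_t next = jump_addr + 2;          /* address of the following instruction */
    int64_t disp = disp_byte >= 0x80        /* sign-extend the displacement byte */
                       ? (int64_t)disp_byte - 0x100
                       : (int64_t)disp_byte;
    return next + (uint64_t)disp;           /* wraps correctly for negative disp */
}
```

short_jump_target(0x3, 0x03) gives 0x8 and short_jump_target(0xb, 0xf8) gives 0x5, matching the disassembly.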

Implementing conditional branches with conditional control


The figure above shows the same function in three forms: the original C code, a goto version, and the assembly code. The goto version is C code constructed to mirror the control flow of the assembly program.

The assembly implementation (Figure 3-16c) first compares the two operands (line 2), setting the condition codes. If the comparison shows that x is greater than or equal to y, it jumps to line 8, which increments the global variable ge_cnt, computes x-y as the return value, and returns. From this we can see that the control flow of the assembly code for absdiff_se closely follows the goto code of gotodiff_se.

The general template for if-else in C is as follows:


The assembly implementation typically takes the following form:
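Since the figures are not reproduced here, the sketch below shows the idea on the absdiff_se example: an if-else version and a goto version that follows the control flow described for the assembly code. The counter lt_cnt and the label x_ge_y are names I am assuming; only ge_cnt, absdiff_se, and gotodiff_se appear in the text above.

```c
long lt_cnt = 0;   /* assumed counter for the x < y branch */
long ge_cnt = 0;   /* counter for the x >= y branch */

/* Structured version: plain if-else. */
long absdiff_se(long x, long y)
{
    long result;
    if (x < y) {
        lt_cnt++;
        result = y - x;
    } else {
        ge_cnt++;
        result = x - y;
    }
    return result;
}

/* Same logic written the way the assembly is organized:
   test the inverted condition and jump over the "then" part. */
long gotodiff_se(long x, long y)
{
    long result;
    if (x >= y)
        goto x_ge_y;
    lt_cnt++;
    result = y - x;
    return result;
x_ge_y:
    ge_cnt++;
    result = x - y;
    return result;
}
```

Both functions return the same value for every input. The goto version only makes visible that the compiler emits one conditional jump and lays the two branches out one after the other.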


Implementing conditional branches with conditional moves


     The assembly code GCC generates for this function is shown in Figure 3-17c. It has a form similar to the C function cmovdiff shown in Figure 3-17b. Studying the C version, we can see that it computes both y-x and x-y, naming them rval and eval respectively. It then tests whether x is greater than or equal to y and, if so, copies eval to rval before returning rval. The assembly code in Figure 3-17c follows the same logic. The key is the cmovge instruction (line 7), which implements the conditional assignment of cmovdiff (line 8): it transfers the data from the source register to the destination only if the cmpq instruction on line 6 indicates that one value is greater than or equal to the other (as indicated by the suffix ge).
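As a reminder of what the missing figure contains, here is a sketch of the two C versions. The names rval, eval, and cmovdiff come from the text above; absdiff and ntest are names I am assuming.

```c
/* Branching version. */
long absdiff(long x, long y)
{
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* Conditional-move style: compute both outcomes, then select one. */
long cmovdiff(long x, long y)
{
    long rval = y - x;
    long eval = x - y;
    long ntest = x >= y;
    if (ntest)
        rval = eval;    /* a compiler can implement this line with cmovge */
    return rval;
}
```

Whether the compiler really emits cmovge for either function depends on the compiler version and flags, so it is worth checking the output of gcc -O1 -S before drawing conclusions.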

The assembly template for conditional control is as follows:


   In practice, code based on conditional data transfers can outperform code based on conditional control transfers. The main reason is that processors achieve high performance through pipelining, and they employ sophisticated branch prediction logic to guess whether each jump instruction will be taken. As long as the guesses are reliable (modern microprocessor designs aim for success rates above 90%), the instruction pipeline stays full of instructions. Mispredicting a jump, on the other hand, requires the processor to discard much of the work it has already done on the instructions after the jump and then refill the pipeline with instructions starting at the correct location. Such a misprediction incurs a severe penalty, wasting roughly 15 to 30 clock cycles and seriously degrading program performance.

   Using conditional moves does not always improve code efficiency. For example, if evaluating then-expr or else-expr requires significant computation, that effort is wasted whenever the corresponding condition does not hold. The compiler must weigh the cost of the wasted computation against the potential performance penalty of branch misprediction. In truth, compilers do not have enough information to make this decision reliably. For example, they do not know how well the branches will follow predictable patterns. Our experiments with GCC indicate that it uses conditional moves only when the two expressions can be computed very easily, for example with a single add instruction each. In our experience, GCC uses conditional control transfers even in many cases where the cost of branch misprediction would exceed that of the more complex computation.

Overall, then, conditional data transfers offer an alternative to conditional control transfers for implementing conditional operations. They can be used only in restricted cases, but these cases are fairly common, and they are a better match for the way modern processors operate.


Loops

   There are two main ways to translate loops into assembly. The first, which we call jump to the middle, performs the initial test with an unconditional jump to the test at the end of the loop. The second, called guarded-do, first uses a conditional branch to skip the loop if the initial test fails, and otherwise transforms the code into a do-while loop. GCC adopts this strategy when compiling at higher optimization levels, for example with the command-line option -O1.

Jump to the middle

As shown in the figure below, the factorial code is written with a while loop. The compiler has used the jump-to-middle translation: on line 3, a jmp jumps to the test starting at label .L5. If n satisfies the test, the loop body executes. Otherwise the loop exits.
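In C terms, the jump-to-middle translation looks roughly like the sketch below. The function names and the labels loop and test are my own, and the figure's assembly uses compiler-generated labels such as .L5 instead.

```c
/* Original while loop. */
long fact_while(long n)
{
    long result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* Jump-to-middle form: jump straight to the test, which sits after the body. */
long fact_while_jm_goto(long n)
{
    long result = 1;
    goto test;
loop:
    result *= n;
    n = n - 1;
test:
    if (n > 1)
        goto loop;
    return result;
}
```

The loop body and the test appear only once each. The price is one extra unconditional jump on entry.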



The following figure shows the assembly code produced with the second method, guarded-do. When compiling with -O1, GCC translates the loop this way.
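The guarded-do translation of the same factorial loop can be sketched in C as follows. The function name and labels are mine, and a real compiler may simplify the loop test further.

```c
/* Guarded-do form: test once up front, then run a do-while style loop. */
long fact_while_gd_goto(long n)
{
    long result = 1;
    if (n <= 1)          /* guard: skip the loop if the initial test fails */
        goto done;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)           /* loop test at the bottom, as in do-while */
        goto loop;
done:
    return result;
}
```

Compared with jump to the middle, the test is duplicated, but the loop is entered without an unconditional jump.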


   The two translations above cover while loops and do-while loops, and which assembly code results depends on the GCC optimization level. The assembly code generated for a for loop likewise follows one of these two strategies. The general form of a for loop is as follows.


Choosing the jump-to-middle strategy yields the following goto code:


The guarded-do strategy yields the following goto code:


Switch statement

   A switch statement provides a multiway branch based on the value of an integer index. Switch statements not only make C code more readable, they also allow an efficient implementation through a data structure called a jump table. A jump table is an array in which entry i is the address of the code segment implementing the action the program should take when the switch index equals i.

   The code performs an array reference into the jump table using the switch index to determine the target of a jump instruction. Compared with a long sequence of if-else statements, the advantage of a jump table is that the time taken to perform the switch is independent of the number of switch cases. GCC chooses how to translate a switch statement based on the number of cases and the sparsity of the case values. Jump tables are used when there are a number of cases (for example, four or more) and they span a small range of values.


   The original C code has cases for the values 100, 102-104, and 106, but the switch variable n can be any integer. The compiler first subtracts 100 from n, shifting the range to between 0 and 6, and creates a new program variable that we call index in our C version. It further simplifies the branching possibilities by treating index as an unsigned value, exploiting the fact that negative numbers in two's-complement representation map to large positive numbers in an unsigned representation. It can therefore test whether index is outside the range 0 to 6 by testing whether it is greater than 6. In the C and assembly code there are five distinct locations to jump to, depending on the value of index: loc_A (.L3), loc_B (.L5), loc_C (.L6), loc_D (.L7), and loc_def (.L8), the last being the destination for the default case. Each label identifies a block of code implementing one of the case branches. In both the C and the assembly code, the program compares index with 6 and jumps to the default code if it is greater.


   The key step in executing a switch statement is to access a code location through the jump table. In line 16 of the C code, a goto statement references the jump table jt. GCC supports computed goto as an extension to the C language. In our assembly-code version, the corresponding operation is on line 5, where the operand of the jmp instruction is prefixed with '*', indicating an indirect jump. The operand specifies a memory location indexed by register %rsi, which holds the value of index.

     The C code declares the jump table as an array of seven elements, each of which is a pointer to a code location. These elements span values 0 to 6 of index, corresponding to values 100 to 106 of n. Observe that the jump table handles duplicate cases by simply using the same code label (loc_D) for entries 4 and 6, and handles missing cases by using the label for the default case (loc_def) for entries 1 and 5.
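The structure described above can be tried out with GCC's labels-as-values extension (&&label and goto *ptr), which Clang also accepts. The sketch below keeps the label names and the table layout from the text, but the case bodies are placeholders I chose for illustration, and the functions return the value instead of storing it through a pointer.

```c
/* Reference version: an ordinary switch with the same case values. */
long switch_eg(long x, long n)
{
    long val = x;
    switch (n) {
    case 100: val *= 13; break;
    case 102: val += 10;           /* falls through */
    case 103: val += 11; break;
    case 104:
    case 106: val *= val; break;
    default:  val = 0;
    }
    return val;
}

/* Jump-table version (GCC extension, not standard C). */
long switch_eg_impl(long x, long n)
{
    /* entries 0..6 correspond to n = 100..106 */
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B, &&loc_C, &&loc_D, &&loc_def, &&loc_D
    };
    unsigned long index = (unsigned long)n - 100;  /* negatives become huge */
    long val;
    if (index > 6)
        goto loc_def;
    goto *jt[index];               /* indirect jump through the table */
loc_A:   val = x * 13; goto done;  /* case 100 */
loc_B:   x = x + 10;               /* case 102, falls through */
loc_C:   val = x + 11; goto done;  /* case 103 */
loc_D:   val = x * x;  goto done;  /* cases 104 and 106 */
loc_def: val = 0;                  /* default */
done:
    return val;
}
```

A single unsigned comparison handles both n < 100 and n > 106, which is exactly the trick described for the ja instruction.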

In assembly code, the jump table is declared as follows


(.rodata is explained in detail in my summary of embedded software development written-test and interview knowledge.)

Given the assembly code for a switch, how can we infer the structure of the C switch statement from the assembly code and the jump table?

For the C switch statement, we need to work out the size of the jump table, the range of case values, which cases are missing, and which are duplicated. Let us go through them one by one.

As line 1 of the assembly in Figure 3-23 shows, the case values start at 100. The second line compares the variable with 6, which shows that the jump-table index ranges over 0 to 6, corresponding to n from 100 to 106. Starting from .quad .L3, the entries are numbered 0, 1, 2, 3, 4, 5, 6 from top to bottom. The ja .L8 in Figure 3-23 shows that the code jumps to .L8 when the index is greater than 6, so .L8 is the default location, and entries 1 and 5 of the jump table also point to it. Entries 1 and 5 are therefore missing cases: there are no cases for 101 and 105. Entries 4 and 6 both jump to .L7, so they correspond to 100 + 4 = 104 and 100 + 6 = 106. The remaining entries 0, 2, and 3 correspond to 100, 102, and 103. We have now recovered all the branches of the switch, six in total: 100, 102, 103, 104, 106, and default. The C code for each case can then be written from the assembly code.


Runtime stack

     A key feature of the procedure-calling mechanism of C (and of most other languages) is that it can use the last-in, first-out memory management discipline provided by a stack data structure. When procedure P calls procedure Q, we can see that while Q is executing, P and all the procedures in the chain of calls up to P are temporarily suspended. While Q runs, only it needs the ability to allocate new storage for its local variables or to set up a call to another procedure. When Q returns, on the other hand, any local storage it has allocated can be freed. A program can therefore manage the storage its procedures need using a stack, where the stack and the program registers store the information required for passing control and data and for allocating memory. When P calls Q, control and data information are added to the end of the stack. When P returns, that information is deallocated.


     The x86-64 stack grows toward lower addresses, and the stack pointer %rsp points to the top element of the stack. The pushq and popq instructions store data on and retrieve data from the stack. Space for data with no specified initial value can be allocated on the stack by simply decrementing the stack pointer by an appropriate amount. Similarly, space can be deallocated by incrementing the stack pointer.

     Procedure P can pass up to six integral values (that is, pointers and integers) in registers, but if Q requires more arguments, P stores them within its own stack frame (in memory) before the call to Q.

Transfer control

   Passing control from function P to function Q simply involves setting the program counter (PC) to the starting address of the code for Q. When it later comes time for Q to return, however, the processor must have some record of the location where it should resume executing P. On x86-64 machines this information is recorded by invoking procedure Q with the instruction call Q. This instruction pushes an address A onto the stack and sets the PC to the beginning of Q. The pushed address A is called the return address. It is the address of the instruction immediately following the call instruction. The counterpart instruction ret pops the address A off the stack and sets the PC to A.
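The effect of call and ret on the PC and the stack can be imitated with a toy model. All names below (stack_mem, sp, pc, sim_call, sim_ret) are hypothetical, and the model counts in 8-byte slots rather than real addresses. It only illustrates the bookkeeping.

```c
#include <stdint.h>

/* Toy machine: a small stack that grows toward lower indices, plus a PC. */
#define STACK_WORDS 16
static uint64_t stack_mem[STACK_WORDS];
static int sp = STACK_WORDS;     /* index of the top element; STACK_WORDS = empty */
static uint64_t pc;

/* call: push the address of the instruction after the call, then jump. */
void sim_call(uint64_t target, uint64_t call_addr, unsigned call_len)
{
    stack_mem[--sp] = call_addr + call_len;   /* return address A */
    pc = target;
}

/* ret: pop the return address and continue from there. */
void sim_ret(void)
{
    pc = stack_mem[sp++];
}
```

For instance, a 5-byte call at address 0x400563 (an address picked only for illustration) pushes 0x400568, and the matching sim_ret sets the PC back to that value and restores the stack pointer.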


Here’s an example



main calls top(100), and top then calls leaf(95). leaf returns 97 to top, and top returns 194 to main. The first three columns describe the instruction being executed: its label, address, and instruction type. The next four columns show the state of the program before the instruction executes, including the contents of registers %rdi, %rax, and %rsp, and the value at the top of the stack.

Instruction L1 of leaf sets %rax to 97, the value to be returned. Instruction L2 then returns: it pops 0x400054e from the stack and sets the PC to the popped value, so control transfers back to instruction T3 of top. The program has completed the call to leaf and returned to top.

Instruction T3 sets %rax to 194, the value to be returned from top. Instruction T4 then returns: it pops 0x4000560 from the stack, setting the PC to instruction M2 of main. The program has completed the call to top and returned to main. At this point the stack pointer has also been restored to 0x7fffe820, the value it had before the call to top.

This simple mechanism of pushing the return address onto the stack lets a function later return to the proper point in the program. The call/return convention of C matches the last-in, first-out memory management discipline that a stack provides.

Data transfer

In x86-64, up to six integral arguments can be passed in registers. The registers are used in a fixed order (for 64-bit arguments: %rdi, %rsi, %rdx, %rcx, %r8, %r9), and each argument is assigned to a register according to its position in the argument list, as shown in the table below.


When a function has more than six arguments, the seventh and later arguments are passed on the stack.

As shown in the figure below, the arguments in the red box are stored on the stack.


Local storage on stack

   In general, procedures do not need local storage beyond what fits in registers. Sometimes, however, local data must be stored in memory. Common cases include:

1. There are not enough registers to hold all of the local data.

2. The address operator '&' is applied to a local variable, so an address must be generated for it.

3. Some of the local variables are arrays or structures and must therefore be accessed through array or structure references.

Here’s an example.



The subq instruction on line 2 subtracts 32 from the stack pointer, which allocates 32 bytes of stack space. The local variables holding the values 1, 2, 3, and 4 are stored at offsets 24, 20, 18, and 17 from the stack pointer. On line 7, leaq generates a pointer to 17(%rsp) and places it in %rax. Arguments 7 and 8 are then stored at offsets 0 and 8 from the stack pointer (inside proc they appear at offsets 8 and 16, because call pushes the return address). Arguments 1 through 6 are placed in the six argument registers. The structure of the stack frame is shown in the figure below.


Lines 2-15 of the assembly above all prepare for the call to proc (setting up the stack frame for the local variables and function arguments, and loading the arguments into registers). Once the preparation is complete, the code of proc executes. When the program returns to call_proc, the code retrieves the four local variables (lines 17-20) and performs the final computation. At the end, it adds 32 to the stack pointer to free the stack frame.
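For orientation, here is C code consistent with the description: four locals of different sizes whose addresses are taken, and a callee with eight arguments. The names proc and call_proc come from the text. The bodies are my reconstruction and should be read as assumptions about the missing figure.

```c
/* Eight arguments: the first six travel in registers, the last two on the stack. */
void proc(long a1, long *a1p, int a2, int *a2p,
          short a3, short *a3p, char a4, char *a4p)
{
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}

long call_proc(void)
{
    /* These locals must live in memory because their addresses are passed. */
    long x1 = 1;
    int x2 = 2;
    short x3 = 3;
    char x4 = 4;
    proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
    return (x1 + x2) * (x3 - x4);
}
```

After the call each local has been doubled, so call_proc computes (2 + 4) * (6 - 8).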

Local storage in registers

The set of program registers is a single resource shared by all procedures. The calling convention must therefore ensure that when one procedure calls another, the callee does not overwrite register values that the caller plans to use later.

   By convention, registers %rbx, %rbp, and %r12-%r15 are classified as callee-saved registers. When procedure P calls procedure Q, Q must preserve the values of these registers, ensuring that they hold the same values when Q returns to P as they did when Q was called. Procedure Q can preserve a register value either by not changing it at all or by pushing the original value onto the stack, altering it, and then popping the old value from the stack before returning. With this convention, the code for P can safely store a value in a callee-saved register (after saving the previous value on the stack, of course), call Q, and then use the value in the register afterwards.

Here’s an example.


   We can see that the code GCC generates uses two callee-saved registers: %rbp to hold x and %rbx to hold the computed value of Q(y). At the beginning of the function, it saves the values of these two registers on the stack (lines 2-3). It copies argument x to %rbp before the first call to Q (line 5), and copies the result of that call to %rbx before the second call to Q (line 8). At the end of the function (lines 13-14), it restores the values of the two callee-saved registers by popping them off the stack. Note that they are popped in the reverse of the order in which they were pushed, reflecting the last-in, first-out discipline of the stack.
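The C source behind this discussion has the following shape. The structure of P follows the description above (x must survive the first call, and the first result must survive the second). The body of Q is a placeholder of mine, since the text does not say what Q computes.

```c
/* Placeholder callee: any function of one long would do here. */
long Q(long x)
{
    return 2 * x;
}

long P(long x, long y)
{
    long u = Q(y);   /* x must be kept somewhere safe across this call */
    long v = Q(x);   /* u must be kept somewhere safe across this call */
    return u + v;
}
```

Because both x and u are needed after a call, the compiler has to put them either in callee-saved registers or on the stack.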

Recursive procedure

As described earlier, each procedure call has its own private space on the stack, so the local variables of multiple outstanding calls do not interfere with one another. Recursion is, in essence, just a series of procedure calls, so it needs no special mechanism. Below is a recursive function that computes factorial.


   The figure above shows the C code for the recursive factorial function and the generated assembly code. As you can see, the assembly code uses register %rbx to hold the parameter n, after first saving the existing value on the stack (line 2) and later restoring it before returning (line 11). Given the stack discipline and the register-saving convention, when refact(n-1) returns (line 9), (1) the result of the call will be in register %rax, and (2) the value of parameter n will still be in register %rbx. Multiplying these two values gives the desired result.
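Since the figure is not reproduced here, the following is a minimal sketch of a recursive factorial in C. The name refact follows the text; the exact body is an assumption, written so that n is still needed after the recursive call returns, which is what forces the compiler to keep it in a callee-saved register.

```c
long refact(long n)
{
    long result;
    if (n <= 1)
        result = 1;
    else
        result = n * refact(n - 1);   /* n is used again after the call returns */
    return result;
}
```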

Array allocation and access

Basic principle

At the machine-code level there is no higher-level notion of an array; an array is treated simply as a collection of bytes stored at contiguous locations. The same is true of structures, which are also allocated as a collection of bytes. The C compiler's job is to generate the appropriate code to allocate this memory, so that a reference to an element of a structure or array yields the correct value.

   For a data type T and integer constant N, consider the declaration T A[N]. Denote the starting location by \({X_A}\). The declaration has two effects. First, it allocates a contiguous region of \(L \bullet N\) bytes in memory, where L is the size (in bytes) of data type T. Second, it introduces the identifier A, which can be used as a pointer to the beginning of the array. The value of this pointer is \({X_A}\). The array elements can be accessed with an integer index from 0 to N-1. Array element i is stored at address \({X_A} + L \bullet i\).

char A[12];

char *B[8];

int C[6];

char *D[5];

Array   Element size   Total size   Starting address   Element i
A       1              12           \({X_A}\)          \({X_A}+i\)
B       8              64           \({X_B}\)          \({X_B}+8i\)
C       4              24           \({X_C}\)          \({X_C}+4i\)
D       8              40           \({X_D}\)          \({X_D}+8i\)

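The sizes in the table can be checked directly with sizeof. The sketch below assumes the declarations that match the table (a 4-byte element type for C, so int is used), and the helper names elem_offset and measured_offset_C are illustrative, not from the text. The 8-byte figures for B and D hold on x86-64, where pointers are 8 bytes.

```c
#include <stddef.h>

char  A[12];
char *B[8];
int   C[6];
char *D[5];

/* Byte offset of element i in an array whose elements are L bytes each. */
size_t elem_offset(size_t L, size_t i)
{
    return L * i;
}

/* The same offset as laid out by the compiler for C[i]; i must be at most 6. */
size_t measured_offset_C(size_t i)
{
    return (size_t)((char *)&C[i] - (char *)C);
}
```

Comparing measured_offset_C(i) with elem_offset(sizeof(int), i) shows that the compiler's layout agrees with the formula for the address of element i.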
Pointer arithmetic

Suppose that the starting address of the integer array E and the integer index i are stored in registers %rdx and %rcx, respectively. Below are some expressions involving E. For each one we also give an assembly-code implementation, with the result stored in register %eax (if it is data) or register %rax (if it is a pointer).
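The table of expressions is not reproduced in the text. As a hedged illustration, the C functions below cover the typical kinds of expression such a table contains. The function names are hypothetical, and the address comments assume a 4-byte int, with x_E standing for the starting address of E.

```c
#include <stddef.h>

/* Each comment gives the value computed, assuming 4-byte int. */
int *whole_array(int *E)            { return E; }             /* x_E              */
int  first_elem(int *E)             { return E[0]; }          /* M[x_E]           */
int  elem_i(int *E, long i)         { return E[i]; }          /* M[x_E + 4i]      */
int *addr_elem_2(int *E)            { return &E[2]; }         /* x_E + 8          */
int *addr_before_i(int *E, long i)  { return E + i - 1; }     /* x_E + 4i - 4     */
int  elem_3_back(int *E, long i)    { return *(E + i - 3); }  /* M[x_E + 4i - 12] */

/* The difference of two pointers is measured in elements, not bytes. */
ptrdiff_t index_diff(int *E, long i) { return &E[i] - E; }    /* i                */
```

The functions returning int produce data (a 32-bit result), while those returning int * produce a pointer (a 64-bit result), which matches the split between %eax and %rax described above.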


Two dimensional array

   For a two-dimensional array declared as T D[R][C], the memory address of array element D[i][j] is \({X_D} + L(C \bullet i + j)\)

   Here, L is the size of data type T in bytes. As an example, take an int array A with three columns (so L = 4 and C = 3, which is what the 12i and 4j terms in the code below imply). Suppose \({X_A}\), i, and j are in registers %rdi, %rsi, and %rdx, respectively. Then the array element A[i][j] can be copied into register %eax with the following code:

/* A in %rdi, i in %rsi, and j in %rdx */
leaq (%rsi,%rsi,2), %rax    /* Compute 3i */
leaq (%rdi,%rax,4), %rax    /* Compute XA + 12i */
movl (%rax,%rdx,4), %eax    /* Read from M[XA + 12i + 4j] */
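The address formula can be checked against the compiler's own layout. The following is a small sketch, assuming an int array with 3 columns as the assembly implies. The row count of 5 is an arbitrary assumption (the formula does not depend on it), and the names offset_2d and measured_offset are illustrative.

```c
#include <stddef.h>

#define ROWS 5   /* assumed row count; the address formula does not depend on it */
#define COLS 3   /* implied by the 3i computed in the assembly */

int A[ROWS][COLS];

/* Byte offset of D[i][j] from the formula L * (C * i + j). */
size_t offset_2d(size_t L, size_t C, size_t i, size_t j)
{
    return L * (C * i + j);
}

/* Byte offset of A[i][j] as laid out by the compiler (row-major order). */
size_t measured_offset(size_t i, size_t j)
{
    return (size_t)((char *)&A[i][j] - (char *)A);
}
```

For L = 4 and C = 3 the formula reduces to 12i + 4j, the same quantity that the two leaq instructions and the scaled movl compute.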

Heterogeneous data structure

Structures

   The C struct declaration creates a data type that groups objects of possibly different types into a single object. All components of a structure are stored in a contiguous region of memory, and a pointer to a structure is the address of its first byte. The compiler maintains information about each structure type indicating the byte offset of each field. It generates references to structure elements by using these offsets as displacements in memory-referencing instructions.

   Structure fields are located in memory by their offsets. For details, see the article "Detailed explanation of the container_of macro in the Linux kernel".

struct rec {
	int i;
	int j;
	int a[2];
	int *p;
};

The structure consists of four fields: two 4-byte ints, an array of two elements of type int, and an 8-byte pointer to int, for 24 bytes in total.
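The field offsets can be inspected with the standard offsetof macro. The sketch below repeats the structure so that it compiles on its own. The helper offset_of_a is a hypothetical name, and the concrete numbers in the comments (0, 4, 8, 16, and a total of 24) assume x86-64, where int is 4 bytes and pointers are 8 bytes.

```c
#include <stddef.h>

struct rec {
    int i;       /* offset 0  */
    int j;       /* offset 4  */
    int a[2];    /* offset 8  */
    int *p;      /* offset 16 */
};

/* Byte offset of r->a[idx] from the start of the structure. */
size_t offset_of_a(size_t idx)
{
    return offsetof(struct rec, a) + sizeof(int) * idx;
}
```

These are the constants the compiler uses as displacements when it turns a field reference into a memory operand.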


Looking at the assembly code, we can also see that structure members are accessed by base address plus offset. For example, suppose a variable r of type struct rec * is in register %rdi. Then the following code copies the element r->i to the element r->j:

/* Registers: r in %rdi, i in %rsi */

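In C terms, the accesses described here are ordinary field references. The sketch below shows the copy from r->i to r->j and, since the register comment also mentions an index i, the address of r->a[i]. The function names are hypothetical, and the instructions quoted in the comments are what a compiler would typically emit for x86-64, not a listing taken from the text.

```c
struct rec {
    int i;
    int j;
    int a[2];
    int *p;
};

/* r->j = r->i. With r in %rdi, typical code is:
 *     movl (%rdi), %eax       read r->i at offset 0
 *     movl %eax, 4(%rdi)      store into r->j at offset 4
 */
void copy_i_to_j(struct rec *r)
{
    r->j = r->i;
}

/* &r->a[i]. With r in %rdi and i in %rsi, typical code is:
 *     leaq 8(%rdi,%rsi,4), %rax    base + offset of a + 4 * i
 */
int *addr_of_a(struct rec *r, long i)
{
    return &r->a[i];
}
```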
Data alignment

For details on byte alignment, see my summary of interview knowledge points for embedded software written tests, which covers the topic in detail.

Combining control and data in machine-level programs

Understanding pointers

Some notes on pointers:

1. Every pointer has an associated type.

int *ip; // ip is a pointer to an object of type int

2. Every pointer has a value. This value is either the address of an object of the designated type or the special value NULL (0).

3. Pointers are created with the & operator. In assembly code, the leaq instruction is often used to compute the address of a memory reference.

int i = 0;
int *ip = &i; // ip holds the address of i

4. Pointers are dereferenced with the * operator. The result is a value whose type is the type associated with the pointer.

5. Arrays and pointers are closely related, but they are not the same thing.

int a[10] = {0};

The name of an array can be referenced (but not modified) as if it were a pointer variable. An array reference such as a[5] has exactly the same effect as pointer arithmetic followed by a dereference, as in *(a + 5).

Both array references and pointer arithmetic require scaling the offset by the object size. When we write the expression p + i, where pointer p has the value p, the resulting address is computed as p + L * i, where L is the size of the data type associated with p.

An array name corresponds to a fixed memory address that cannot be modified. A pointer can point to any piece of memory, and its value can be changed freely.

Casting a pointer from one type to another changes only its type, not its value. One effect of casting is to change the scaling of pointer arithmetic. For example, if p is a pointer of type char * with value p, then p + 7 evaluates to p + 7 * 1, while the expression (int *) p + 7 evaluates to p + 4 * 7.
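The scaling rules above can be observed by measuring how many bytes a pointer actually moves. The following is a minimal sketch; the function names are hypothetical, and the 28-byte figure for the int case assumes a 4-byte int.

```c
#include <stddef.h>

/* Bytes advanced by p + n when p is a char *: the scale factor is 1. */
ptrdiff_t step_as_char(char *p, long n)
{
    return (p + n) - p;
}

/* Bytes advanced by (int *) p + n: the cast changes the scale, not the value.
 * p must point to storage that is valid and suitably aligned for int. */
ptrdiff_t step_as_int(char *p, long n)
{
    int *ip = (int *)p;
    return (char *)(ip + n) - p;
}

/* a[i] and *(a + i) name the same element, so their addresses are equal. */
int same_element(int *a, long i)
{
    return &a[i] == a + i;
}
```

Calling both step functions with the same starting address and n = 7 gives 7 for the char * case and 7 * sizeof(int) for the int * case, even though the pointer value passed in is identical.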

Memory out of bounds reference

   C performs no bounds checking on array references, and local variables are stored on the stack together with state information such as saved register values and return addresses. This combination can lead to serious program errors: a write to an out-of-bounds array element can corrupt the state information stored on the stack. When the program later uses the corrupted state, things go badly wrong. A particularly common source of state corruption is known as buffer overflow.



In the C code above, buf is allocated only 8 bytes, so any input longer than 7 characters writes past the end of the array (the terminating null byte needs one byte as well).

Different errors occur depending on the length of the input string. See the figure below for details.


The stack layout of the echo function is shown in the figure below.


   Strings of up to 23 characters have no serious consequences, but beyond that the return address, and possibly more saved state, will be corrupted. If the stored return address is corrupted, the ret instruction (line 8) causes the program to jump to a completely unexpected location. None of this behavior is visible from the C code alone. Only by studying the program at the machine-code level can we understand the impact of out-of-bounds writes by functions such as gets.
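The thresholds above can be written down as a small classification function. This is only a model of the stack layout, not code that overflows anything. It assumes the layout suggested by the numbers in the text: buf occupies bytes 0-7 of the frame, bytes 8-23 are unused, and the 8-byte return address sits at bytes 24-31. The enum and function names are hypothetical.

```c
#include <stddef.h>

enum damage { NONE, UNUSED_STACK, RETURN_ADDRESS, CALLER_STATE };

/* Assumed layout: buf at bytes 0-7, unused space at bytes 8-23,
 * return address at bytes 24-31, caller's frame from byte 32 on. */
enum damage classify_input(size_t typed_chars)
{
    size_t written = typed_chars + 1;   /* gets also stores a terminating null byte */

    if (written <= 8)
        return NONE;
    if (written <= 24)
        return UNUSED_STACK;
    if (written <= 32)
        return RETURN_ADDRESS;
    return CALLER_STATE;
}
```

Under this model, 7 characters fit exactly, 8 to 23 characters spill into unused space without visible harm, and from 24 characters on the terminating null byte already lands in the return address.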

Floating point code

   Floating-point numbers are something of a special case in computing. Whenever data representation comes up, floating point is always treated separately. Likewise, floating-point data is handled differently from other data types in assembly, so we need to consider the following aspects:

1. How floating-point values are stored and accessed. This is typically done through some form of registers.

2. The instructions that operate on floating-point data.

3. The conventions for passing floating-point values as arguments to functions and for returning them as results.

4. The conventions for how registers are preserved during function calls, for example, with some registers designated as caller-saved and others as callee-saved.

x86-64 floating point is based on SSE or AVX, including the conventions for passing procedure arguments and return values. Here we discuss AVX2. When compiled with GCC and the -mavx2 option, GCC generates AVX2 code.

As shown in the figure below, the AVX floating-point architecture allows data to be stored in 16 YMM registers, named %ymm0 to %ymm15. Each YMM register is 256 bits (32 bytes) wide. When operating on scalar data, these registers hold only floating-point data, and only the low-order 32 bits (for float) or 64 bits (for double) are used. The assembly code refers to the registers by their SSE XMM names, %xmm0 to %xmm15, where each XMM register is the low-order 128 bits (16 bytes) of the corresponding YMM register.
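To make the argument-passing aspect concrete, here is a small C sketch with mixed integer and floating-point parameters. The function names are hypothetical. The register assignments in the comments follow the System V x86-64 calling convention used on Linux (floating-point arguments in %xmm0 through %xmm7, floating-point result in %xmm0) and are stated as background, not taken from the text.

```c
/* Under the System V x86-64 convention, x would arrive in %edi, y in %xmm0,
 * and z in %rsi; the double result would be returned in %xmm0. */
double scale_add(int x, double y, long z)
{
    return x * y + z;   /* x and z are converted to double before the arithmetic */
}

/* A float argument arrives in the low 32 bits of %xmm0 and is widened
 * to double for the return value. */
double widen(float f)
{
    return (double)f;
}
```

Integer and floating-point arguments are counted separately, so y is the first floating-point argument even though it is the second parameter.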


In fact, floating-point assembly instructions closely resemble the integer instructions. There is no need to memorize them; just look them up when needed.

Data transfer instructions


Two-operand floating-point conversion instructions


Three-operand floating-point conversion instructions


Scalar floating-point arithmetic operations


Bit-level operations on floating-point numbers


Instructions for comparing floating-point values


   In this chapter, we have looked beneath the layer of abstraction provided by C. By having the compiler produce an assembly-code representation of machine-level programs, we learn about the compiler and its optimization capabilities, as well as about the machine, its data types, and its instruction set. This chapter asks us to be able to read and understand the machine-level code generated by the compiler. Machine instructions do not need to be memorized; just look them up when necessary. The ARM and x86 instruction sets follow similar ideas, so for embedded software development it is enough to master the commonly used ARM instructions.
