What happens after a function is called?

Time: 2022-1-15

Assumptions:

  • AMD64 Linux
  • C/C++

We don't need many concepts up front. Let's just review a few basic registers:

  • %rsp: holds the stack-top pointer
  • %rbp: holds the stack-bottom (frame base) pointer
  • %rbp~%rsp: the region stretching downward from %rbp to %rsp is the current stack frame
  • %rip: holds the address of the next instruction
  • %rdi: holds the first argument of the function
  • %rsi: holds the second argument of the function
  • %rax: holds the return value

Then, look directly at the code!

Sample program

Assume the program is as follows:

int sum(int x, int y)
{
    return x + y;
}
int main(int argc, char const *argv[])
{
    int a = 1, b = 2;
    int c = sum(a, b);
    return 0;
}

Compile it with gcc -g prog.c -o prog.

The assembly code is as follows:

0000000000001125 <sum>:
int sum(int x, int y)
{
    1125:    55                       push   %rbp
    1126:    48 89 e5                 mov    %rsp,%rbp
    1129:    89 7d fc                 mov    %edi,-0x4(%rbp)
    112c:    89 75 f8                 mov    %esi,-0x8(%rbp)
    return x + y;
    112f:    8b 55 fc                 mov    -0x4(%rbp),%edx
    1132:    8b 45 f8                 mov    -0x8(%rbp),%eax
    1135:    01 d0                    add    %edx,%eax
}
    1137:    5d                       pop    %rbp
    1138:    c3                       retq

0000000000001139 <main>:
int main(int argc, char const *argv[])
{
    1139:    55                       push   %rbp
    113a:    48 89 e5                 mov    %rsp,%rbp
    113d:    48 83 ec 20              sub    $0x20,%rsp
    1141:    89 7d ec                 mov    %edi,-0x14(%rbp)
    1144:    48 89 75 e0              mov    %rsi,-0x20(%rbp)
    int a = 1;
    1148:    c7 45 fc 01 00 00 00     movl   $0x1,-0x4(%rbp)
    int b = 2;
    114f:    c7 45 f8 02 00 00 00     movl   $0x2,-0x8(%rbp)
    int c = sum(a, b);
    1156:    8b 55 f8                 mov    -0x8(%rbp),%edx
    1159:    8b 45 fc                 mov    -0x4(%rbp),%eax
    115c:    89 d6                    mov    %edx,%esi
    115e:    89 c7                    mov    %eax,%edi
    1160:    e8 c0 ff ff ff           callq  1125 <sum>
    1165:    89 45 f4                 mov    %eax,-0xc(%rbp)
    return 0;
    1168:    b8 00 00 00 00           mov    $0x0,%eax
}
    116d:    c9                       leaveq
    116e:    c3                       retq

Execution process

We start reading directly from main. Pay close attention to how the call stack changes.

0000000000001139 <main>:
int main(int argc, char const *argv[])
{
    1139:    55                  push   %rbp             # save the caller's %rbp
    113a:    48 89 e5            mov    %rsp,%rbp        # set up main's own frame base
    113d:    48 83 ec 20         sub    $0x20,%rsp       # %rsp moves 0x20 (32) bytes below %rbp,
                                                        # reserving room for local variables
    1141:    89 7d ec            mov    %edi,-0x14(%rbp) # save argc to memory
    1144:    48 89 75 e0         mov    %rsi,-0x20(%rbp) # save argv to memory

    int a = 1;                                          #
    1148:    c7 45 fc 01 00 00 00 movl  $0x1,-0x4(%rbp)  #
    int b = 2;                                          #
    114f:    c7 45 f8 02 00 00 00 movl  $0x2,-0x8(%rbp)  #

The code above saves the context of the main function to memory.

Why save it? At this point the parameters argc and argv live only in registers (%edi and %rsi), and the values of the local variables exist only as immediates inside the instructions. If we did not store them in memory, they could be lost once we call sum, because the callee is free to overwrite those registers.

    int c = sum(a, b);                                  # Let's start here
    1156:    8b 55 f8            mov    -0x8(%rbp),%edx  # \
    1159:    8b 45 fc            mov    -0x4(%rbp),%eax  #  \ these instructions load the arguments a and b
    115c:    89 d6               mov    %edx,%esi        #  / into %edi and %esi, ready to call sum
    115e:    89 c7               mov    %eax,%edi        # /
                                                        #
    1160:    e8 c0 ff ff ff      callq  1125 <sum>       # callq is conceptually equivalent to:
                                                        #     pushq %rip
                                                        #     jmpq <sum>
                                                        # and pushq %rip is in turn equivalent to:
                                                        #     sub $0x8, %rsp
                                                        #     movq %rip, (%rsp)
                                                        #
                                                        # The stack after callq is:
                                                        #    +--------+
                                                        #    |main_val| <--- rbp
                                                        #    |  ...   |
                                                        #    |  ...   |
                                                        #    |  ...   |
                                                        #    |main_val|
                                                        #    |  1165  | <--- rsp
                                                        #    |        |
                                                        # 1165 is the address of the instruction after callq
                                                        #

The code above prepares for the call to sum. First the arguments are set up, then the address of the next instruction (%rip) is saved on the stack. That way, once the call finishes, %rip can be read back from memory and the program continues where it left off.

Because of the callq instruction, we jump to address 1125:

0000000000001125 <sum>:

int sum(int x, int y)
{
                                                        # Before the function body runs, the stack contains:
                                                        #   +--------+
                                                        #   |main_val| <--- rbp (main_rbp)
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |main_val|
                                                        #   |  1165  | <--- rsp
                                                        #   |        |
                                                        #
    1125:    55                  push   %rbp             # At this point %rbp is still the same as in main, since
                                                        # nothing has modified it. Call its value main_rbp.
                                                        # After pushing %rbp, the stack becomes:
                                                        #   +--------+
                                                        #   |main_val|
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |main_val|
                                                        #   |  1165  |
                                                        #   |main_rbp| <--- rsp
                                                        #
    1126:    48 89 e5            mov    %rsp,%rbp        # %rbp takes the value of %rsp and now belongs to the new function
                                                        # The stack becomes:
                                                        #   +--------+
                                                        #   |main_val|
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |main_val|
                                                        #   |  1165  |
                                                        #   |main_rbp| <--- rsp, rbp

Both instructions are routine operations:

  1. Save the previous function's stack bottom (%rbp) to memory.
  2. Establish the function's own stack bottom.
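                                                        # Here the 8 bytes below %rbp are used to hold
                                                        # the local copies of the parameters (x, y).
                                                        # No sub on %rsp is needed: sum is a leaf function,
                                                        # so it can use the red zone below %rsp.

    1129:    89 7d fc            mov    %edi,-0x4(%rbp)  # the stack becomes: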
    112c:    89 75 f8            mov    %esi,-0x8(%rbp)  #   +--------+
                                                        #   |main_val|
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |main_val|
                                                        #   |  1165  |
                                                        #   |main_rbp| <--- rsp, rbp
                                                        #   |   x    |
                                                        #   |   y    |

This is also a routine operation: the values of the two parameters are saved to the stack.

    return x + y;                                       #
    112f:    8b 55 fc            mov    -0x4(%rbp),%edx  # \
    1132:    8b 45 f8            mov    -0x8(%rbp),%eax  #  > load x and y into %edx and %eax as temporaries,
    1135:    01 d0               add    %edx,%eax        # /  then add them; the result is left in %eax
}

The instructions above prepare the operands of the add machine instruction and then execute add to complete the operation.

                                                        # %eax (%rax) is the return-value register
                                                        # under the calling convention used on Linux
                                                        #
    1137:    5d                  pop    %rbp             # Pop the value at the top of the stack into %rbp. Because
                                                        # %rsp points at the saved %rbp of main, %rbp is restored
                                                        # to its original value and points back to main's frame base
                                                        #   +--------+
                                                        #   |main_val| <--- rbp (main_rbp)
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |  ...   |
                                                        #   |main_val|
                                                        #   |  1165  | <--- rsp
                                                        #   |        |
                                                        #   |        |  the values of x and y are now discarded
                                                        #   |        |
                                                        #
    1138:    c3                  retq                    # retq is equivalent to popq %rip,
                                                        # so 1165 is popped into %rip.
                                                        # %rip now points to the instruction after callq <sum>
                                                        # in main, so main resumes its execution flow
                                                        #

Now we return to main and continue:

    1165:    89 45 f4            mov    %eax,-0xc(%rbp)  # we return here from <sum> and continue executing
    return 0;
    1168:    b8 00 00 00 00      mov    $0x0,%eax
}
    116d:    c9                  leaveq
    116e:    c3                  retq
    116f:    90                  nop

Summary

While a program runs, all of its function calls are reflected on a stack called the program stack, or simply the stack. The stack lives in memory and grows from high addresses toward low addresses (that is, the top of the stack is “down”).

A stack frame is the constituent unit of the stack. Its composition is described below.

All the information stored in a stack frame can be regarded as belonging to the called function. Calls can chain, though: if A calls B and B calls C, then B is the caller of C and A is the caller of B. The information in a stack frame includes:

  • Frame base pointer (%rbp). The called function saves it on entry.
  • Saved registers and local variables. The called function is also responsible for saving these.
  • Input parameters. The called function is also responsible for saving these.
  • Return address. The caller saves it (via callq); retq reads it back and loads it into %rip.

The process of function call is as follows:

1. Caller: save context

Save its own arguments, local variables, caller-saved registers, etc. to the stack.

2. Caller: execute the callq instruction and jump

Save the return address (%rip) to the stack, then jump to the called function.

3. Callee: replace the stack bottom

Save the caller's stack bottom (%rbp) on the stack, then use the caller's stack top (%rsp) as its own stack bottom.

4. Callee: save context

Save its own parameters, local variables, and any registers it must preserve to the stack.

5. Callee: execute its own instructions

Execute the machine instructions of the function body.

6. Callee: restore the stack bottom

Restore the caller's stack bottom, saved at the top of the stack, into %rbp.

7. Callee: restore the return address

Use retq to pop the return address off the stack into %rip.

8. Caller: continue execution

The CPU continues executing the caller's instructions from %rip.