Does the program have to start from the main function?

Time:2021-3-7

Does a program have to start from the main function? This article answers that question by looking at how static linking works.

For static linking, we start with two questions.

Q: Each object file has many sections. When object files are linked into an executable, how are the sections of the input object files merged into the output file?

A: Sections of the same kind are merged: all .text sections go into the .text section of the output file, and all .data sections go into the .data section of the output file.

Q: How does the linker assign them space and addresses in the output file?

A: Linking a program involves two steps:

  1. Space and address allocation: scan all input object files, record the length, attributes and position of each section, and collect every symbol definition and reference from their symbol tables into one global symbol table. Then merge the sections, compute the length and position of each merged section in the output file, and establish the mapping between input and output sections.
  2. Symbol resolution and relocation: using the information collected in the first step, read the section data and relocation entries of the input files, resolve the symbols, and "patch" every instruction and data item that needs relocation so that it refers to the correct address.
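
The first step can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the struct names and fields are hypothetical, alignment is ignored, and a real linker works on ELF section headers rather than on a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one section in one input object file. */
struct input_section {
    const char *name;   /* e.g. ".text" or ".data" */
    size_t size;        /* length in bytes */
    size_t out_offset;  /* filled in: offset inside the merged output section */
};

/* Hypothetical merged output section. */
struct output_section {
    const char *name;
    size_t address;     /* start address chosen by the linker */
    size_t size;        /* total size after merging */
};

/* Append every input section named out->name to the output section
 * and record where each piece landed (alignment is ignored here). */
static void merge_sections(struct output_section *out,
                           struct input_section *in, size_t count)
{
    assert(out != NULL && in != NULL);
    out->size = 0;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(in[i].name, out->name) != 0)
            continue;                  /* different kind of section, skip */
        in[i].out_offset = out->size;  /* mapping: input piece -> output offset */
        out->size += in[i].size;
    }
}
```

Calling merge_sections once per output section name gives the size of each merged section and the offset of every input piece inside it, which is the mapping that the second step relies on.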

Tip: an external symbol is one that an object file references but that is defined in another object file. Before linking, the address of an external symbol is a placeholder such as 000000; after linking, the symbol has a real address in the executable. Linking places sections of the same kind together, so the linker first determines where each input section lands in the output and then adds the symbol's offset within its section, which gives the symbol's address in the whole executable.
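
The address calculation described in this tip is a simple sum. The function below is a minimal sketch with hypothetical parameter names; the example addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Final address of a symbol =
 *   start address of the merged output section
 * + offset of the symbol's input section inside that output section
 * + offset of the symbol inside its input section. */
static uint64_t symbol_address(uint64_t out_section_addr,
                               uint64_t in_section_offset,
                               uint64_t symbol_offset)
{
    uint64_t addr = out_section_addr + in_section_offset + symbol_offset;
    assert(addr >= out_section_addr);  /* guard against wrap-around */
    return addr;
}
```

For example, a symbol at offset 0x8 of an input .text section that was placed 0x30 bytes into an output .text section starting at 0x401000 ends up at 0x401038.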

Symbols that need relocation are recorded in relocation tables, also called relocation sections, such as .rel.text and .rel.data. If the .text section contains relocations there is a .rel.text section, and if the .data section contains relocations there is a .rel.data section (on x86-64 ELF these sections are named .rela.text and .rela.data). You can use objdump to view the relocation tables of an object file.

Source code (test.c):

#include <stdio.h>

int main() {
    printf("program meow\n");
    return 0;
}

gcc -c test.c
objdump -r test.o

test.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000007 R_X86_64_PC32     .rodata-0x0000000000000004
000000000000000c R_X86_64_PLT32    puts-0x0000000000000004


RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE
0000000000000020 R_X86_64_PC32     .text
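
To make the R_X86_64_PC32 entries above concrete, here is a minimal sketch of how such a relocation is applied. The x86-64 psABI defines the value as S + A - P, where S is the symbol address, A the addend (the -0x4 in the listing) and P the address of the field being patched. The function and parameter names are hypothetical, and the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Patch a 32-bit PC-relative field inside a section image.
 *   section      : bytes of the section being patched
 *   section_addr : final address of that section
 *   offset       : OFFSET column of the relocation entry
 *   sym_addr     : final address of the referenced symbol (S)
 *   addend       : addend of the relocation entry (A) */
static void apply_pc32(uint8_t *section, uint64_t section_addr,
                       uint64_t offset, uint64_t sym_addr, int64_t addend)
{
    uint64_t place = section_addr + offset;                      /* P */
    int64_t value = (int64_t)sym_addr + addend - (int64_t)place; /* S + A - P */
    assert(value >= INT32_MIN && value <= INT32_MAX);            /* must fit in 32 bits */
    int32_t field = (int32_t)value;
    /* Stored in host byte order; x86-64 is little-endian. */
    memcpy(section + offset, &field, sizeof field);
}
```

With a .text section at 0x401000, a relocation at offset 7, a target at 0x402000 and addend -4, the patched field holds 0xff5. The addend of -4 accounts for the fact that the CPU measures the displacement from the end of the 4-byte field, not from its start.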

You can also use nm to list the undefined symbols, which are the ones that need to be relocated:

nm -u test.o
                 U _GLOBAL_OFFSET_TABLE_
                 U puts

Symbols of type U (undefined) appear because the object file contains relocation entries that refer to them. After the linker has scanned all input object files, every undefined symbol must be found in the global symbol table; otherwise the linker reports an undefined symbol error.
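
The check described above can be sketched as a lookup against the global symbol table. The table layout and function names here are hypothetical and kept deliberately small; a real linker uses a hash table and also handles weak and common symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry of the global symbol table built in the first step. */
struct symbol_def {
    const char *name;
    unsigned long address;
};

/* Return the definition of name, or NULL if no input file defines it. */
static const struct symbol_def *lookup(const struct symbol_def *table,
                                       size_t n, const char *name)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Count the references that stay undefined; a linker would report
 * each of them as an undefined symbol error and fail the link. */
static size_t count_unresolved(const struct symbol_def *table, size_t n,
                               const char *const *refs, size_t nrefs)
{
    size_t missing = 0;
    for (size_t i = 0; i < nrefs; i++)
        if (lookup(table, n, refs[i]) == NULL)
            missing++;
    return missing;
}
```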

Note: the code clearly calls printf, so why does the object file refer to puts? When the format string contains no conversion specifiers and ends with a newline, GCC replaces the printf call with puts by default, which takes a single string argument and saves the cost of parsing the format. The -fno-builtin option turns this built-in function optimization off, as follows:

~/test$ gcc -c -fno-builtin testlink.cc -o test.o
~/test$ nm test.o
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U printf

Tip: today's programs and libraries are usually very large, and one object file may contain hundreds of functions or variables. When a program uses any function or variable from an object file, the whole object file is linked in, including the functions that are never used. This makes the output file larger than necessary and wastes space.

There is a compilation option called function-level linking, which stores each function or variable in its own section. When the linker needs a function, it merges that section into the output file, and it can discard the sections of unused functions to reduce the waste of space. The cost is a slower compile and link. With GCC the compilation options are listed below; the linker discards unreferenced sections only when it is also given --gc-sections (for example via -Wl,--gc-sections):

-ffunction-sections
-fdata-sections
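
A small file is enough to try these options. The sketch below assumes GCC with GNU ld; the file name demo.c and the function names are hypothetical, and the build commands in the comment are one possible way to use the flags, not the only one.

```c
#include <assert.h>

/* Possible build steps (a main that calls used_helper is assumed to
 * live in another file):
 *   gcc -ffunction-sections -fdata-sections -c demo.c
 *   gcc -Wl,--gc-sections main.o demo.o -o demo
 *
 * With -ffunction-sections GCC places each function in its own
 * section, here .text.used_helper and .text.unused_helper, so the
 * linker can drop the second one when nothing references it. */

int used_helper(int x)
{
    assert(x >= 0);   /* this sketch expects non-negative input */
    return x * 2;
}

int unused_helper(int x)
{
    return x + 100;   /* never called: a candidate for removal */
}
```

Running objdump -h on demo.o shows the per-function sections, and passing -Wl,--print-gc-sections at link time lists the sections that GNU ld removed.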

Many people think a program starts and ends with the main function, but it does not. Before main is called, the process execution environment has to be initialized, for example the heap and the thread subsystem. The constructors of global C++ objects also run during this period, and the destructors of global objects run after main returns.
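
The same effect can be observed from C with the constructor and destructor function attributes, which are GCC and Clang extensions rather than standard C. The sketch below is only a demonstration of code running outside main; the function and variable names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int init_ran = 0;

/* Runs during startup, before main is entered. */
__attribute__((constructor))
static void before_main(void)
{
    init_ran = 1;
}

/* Runs during normal termination, after main has returned. */
__attribute__((destructor))
static void after_main(void)
{
    assert(init_ran == 1);   /* the constructor must already have run */
    puts("cleanup after main");
}

/* Lets main (or any other code) check that startup work happened. */
static int startup_done(void)
{
    return init_ran;
}
```

A main that prints startup_done() shows 1 even though main itself never called before_main, and the cleanup message appears after main returns.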

The entry point of an ordinary Linux program is the _start function. Two sections are involved in startup and termination:

  • .init section: process initialization code. When a program starts, the code in the .init section runs before main is called.
  • .fini section: process termination code. Glibc arranges for this code to run after main exits normally.

How to specify program entry

During linking with ld, the -e option specifies the program entry point. Even a short printf call depends on many libraries, so it is inconvenient to link the object file with all of them through a link script. Instead, the program below uses inline assembly to print a string without depending on any library. If you don't understand the assembly, don't worry; you only need to follow the linking steps below.

The code is as follows:

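const char* str = "hello";

/* write(fd, buf, len) through the 32-bit int $0x80 interface:
   eax = 4 (write), ebx = fd, ecx = buffer, edx = length.
   Pointers are truncated to 32 bits, so this works only when the
   program is linked at a low address. */
void print() {
    asm("movl $5,%%edx \n\t"     /* length of "hello" */
        "movl str,%%ecx \n\t"    /* pointer value stored in str */
        "movl $1,%%ebx \n\t"     /* fd 1 = stdout */
        "movl $4,%%eax \n\t"
        "int $0x80 \n\t"
        :
        :"r"(str):"eax", "edx", "ecx", "ebx", "memory");
}

/* exit(42) through the same interface: eax = 1 (exit), ebx = status. */
void exit() {
    asm("movl $42,%ebx \n\t"
        "movl $1,%eax \n\t"
        "int $0x80 \n\t");
}

/* Entry point used instead of main. */
void nomain() {
    print();
    exit();
}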
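
The listing above uses the legacy 32-bit int $0x80 interface. For comparison, here is a hedged sketch of the same write call through the native x86-64 syscall instruction, where write is system call number 1 and the arguments go in rdi, rsi and rdx. This is not the article's code; the name raw_write is hypothetical, and on other platforms the sketch simply falls back to the libc write function.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__x86_64__)
/* write(fd, buf, len) without libc: rax = 1 (write), rdi = fd,
 * rsi = buffer, rdx = length. The syscall instruction overwrites
 * rcx and r11, so both are listed as clobbered. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    assert(buf != NULL);
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;   /* bytes written, or a negative errno value */
}
#else
#include <unistd.h>
/* Fallback for other platforms: use the libc wrapper. */
static long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return (long)write(fd, buf, len);
}
#endif
```

A libc-free entry function could call raw_write(1, "hello\n", 6) and then issue the exit system call, which is number 60 in the x86-64 interface.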

Use the following command to generate the object file:

gcc -c -fno-builtin test.cc

Look at the symbols in the resulting test.o:

~/test$ nm -a test.o
0000000000000000 b .bss
0000000000000000 n .comment
0000000000000000 d .data
0000000000000000 d .data.rel.local
0000000000000000 r .eh_frame
0000000000000000 n .note.GNU-stack
0000000000000000 r .rodata
0000000000000000 t .text
0000000000000026 T _Z4exitv
0000000000000000 T _Z5printv
0000000000000039 T _Z6nomainv
0000000000000000 D str
0000000000000000 a test.cc

Because my source file ends in .cc, it is compiled as C++ and the function names are mangled as shown above. If the file is renamed to test.c, the symbols are as follows:

~/test$ gcc -c -fno-builtin test.c -o test.o
~/test$ nm -a test.o
0000000000000000 b .bss
0000000000000000 n .comment
0000000000000000 d .data
0000000000000000 d .data.rel.local
0000000000000000 r .eh_frame
0000000000000000 n .note.GNU-stack
0000000000000000 r .rodata
0000000000000000 t .text
0000000000000026 T exit
0000000000000039 T nomain
0000000000000000 T print
0000000000000000 D str
0000000000000000 a test.c

Then use -e to specify the entry function symbol:

~/test$ ld -static -e nomain -o test test.o
~/test$ ./test
hello

How to use a custom link script to define custom sections

During linking with ld, the -T option specifies the link script, and the default link script can be viewed with ld -verbose. The full output is too long, so here is a short excerpt:

$ ld -verbose
GNU ld (GNU Binutils for Ubuntu) 2.30
  Supported emulations:
   elf_x86_64
   elf32_x86_64
   elf_i386
   elf_iamcu
   i386linux
   elf_l1om
   elf_k1om
   i386pep
   i386pe
using internal linker script:
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2018 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
              "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); SEARCH_DIR("=/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
 
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
  .plt.got        : { *(.plt.got) }
  .plt.sec        : { *(.plt.sec) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}

Here is a simple custom link script, test.lds:

ENTRY(nomain)

SECTIONS
{
    . = 0x8048000 + SIZEOF_HEADERS;
    tinytext : { *(.text) *(.data) *(.rodata) }
    /DISCARD/ : { *(.comment) }
}

Then use -T to specify the link script:

~/test$ ld -static -T test.lds -e nomain -o test test.o
~/test$ ./test
hello

The tinytext line above merges the contents of the .text, .data and .rodata sections into a single tinytext section. Use readelf to view the section information:

~/test$ readelf -S test
There are 6 section headers, starting at offset 0x482a0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .eh_frame         PROGBITS         00000000080480b0  000480b0
       0000000000000078  0000000000000000   A       0     0     8
  [ 2] tinytext          PROGBITS         0000000008048128  00048128
       0000000000000066  0000000000000000 WAX       0     0     8
  [ 3] .shstrtab         STRTAB           0000000000000000  0004826e
       000000000000002e  0000000000000000           0     0     1
  [ 4] .symtab           SYMTAB           0000000000000000  00048190
       00000000000000c0  0000000000000018           5     4     8
  [ 5] .strtab           STRTAB           0000000000000000  00048250
       000000000000001e  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
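
The WAX in the Flags column of tinytext is a rendering of three bits of the section's sh_flags field. The sketch below decodes only those three bits; the constant values are the ones defined by the ELF specification for SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR, while the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_WRITE 0x1u   /* SHF_WRITE:     writable at run time */
#define FLAG_ALLOC 0x2u   /* SHF_ALLOC:     occupies memory at run time */
#define FLAG_EXEC  0x4u   /* SHF_EXECINSTR: contains machine instructions */

/* Render the three basic flag bits the way readelf prints them.
 * out must have room for three letters plus the terminating NUL. */
static void format_flags(uint64_t sh_flags, char out[4])
{
    size_t n = 0;
    assert(out != NULL);
    if (sh_flags & FLAG_WRITE) out[n++] = 'W';
    if (sh_flags & FLAG_ALLOC) out[n++] = 'A';
    if (sh_flags & FLAG_EXEC)  out[n++] = 'X';
    out[n] = '\0';
}
```

Because tinytext merges code with writable data, it ends up with all three bits set (0x7), whereas .eh_frame in the listing has only the alloc bit (0x2).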

Tool tips

About static link libraries:

ar rcs libxxx.a xx1.o xx2.o    package object files into a static library
ar -t libc.a                   list the object files in a static library
ar -x libc.a                   extract all object files into the current directory
gcc --verbose                  show every step of compilation and linking

About objdump:

objdump -i    show the supported target architectures
objdump -f    display the file header information
objdump -d    disassemble
objdump -t    display the symbol table entries, that is, the symbols in each object file
objdump -r    display the relocation entries of the file
objdump -x    display all available header information, equivalent to -a -f -h -p -r -t
objdump -H    help

About analyzing ELF file format:

readelf -h    display the file header
readelf -S    list the section headers
readelf -r    list the relocation table
readelf -d    list the dynamic section

About viewing target file symbol information:

nm -a               show all symbols
nm -D               show dynamic symbols
nm -u               show only undefined external symbols
nm --defined-only   show only defined symbols

Description of symbols:

If the symbol type letter is lowercase, the symbol is local; if it is uppercase, the symbol is global.

  • A: The symbol's value is absolute and will not be changed by further linking. Such values often appear in an interrupt vector table, for example symbols that mark the position of each interrupt handler in the table.
  • B: The symbol is in the .bss section, which holds uninitialized global and static variables.
  • C: The symbol is in the common section, where all symbols are weak.
  • D: The symbol is in the initialized data section.
  • I: The symbol is an indirect reference to another symbol.
  • N: The symbol is a debugging symbol.
  • R: The symbol is in the read-only data section.
  • T: The symbol is in the code (text) section.
  • U: The symbol is not defined in the current file and is defined in another file.
  • ?: The symbol type is unknown.
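
The rules in this list translate directly into a small lookup. The sketch below covers only the letters listed above and uses hypothetical function names; GNU nm also has a few special lowercase letters for global symbols (u, v and w) that this simple upper/lowercase test does not model.

```c
#include <assert.h>
#include <ctype.h>

/* Uppercase type letters mark global symbols, lowercase ones local. */
static int nm_symbol_is_global(char type)
{
    assert(type != '\0');
    return isupper((unsigned char)type) != 0;
}

/* Map a type letter from nm output to a short description. */
static const char *nm_type_meaning(char type)
{
    switch (toupper((unsigned char)type)) {
    case 'A': return "absolute";
    case 'B': return "uninitialized data (.bss)";
    case 'C': return "common";
    case 'D': return "initialized data";
    case 'I': return "indirect reference";
    case 'N': return "debugging";
    case 'R': return "read-only data";
    case 'T': return "code (text)";
    case 'U': return "undefined";
    default:  return "unknown";
    }
}
```

Applied to the nm -a listing of test.o earlier, "T nomain" is a global symbol in the code section, while "t .text" is the local section symbol.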

References

https://linuxtools-rst.readth…

Self cultivation of programmers