App Launch Optimization: Binary Rearrangement

Date: 2022-1-19

1、 Virtual memory and physical memory

If a process could access physical memory directly, it would be very unsafe. To solve this memory safety problem, modern computers and operating systems build a layer of virtual memory on top of physical memory. Virtual and physical memory are not covered in detail here; we only use the principle to find a way to optimize the app.

1. Virtual memory

The contiguous memory range that a process appears to access directly, 0x000000 ~ 0xffffff, consists of virtual addresses only; the real physical address is obtained by looking the virtual address up in a mapping table. Not all virtual memory is backed by physical memory: physical memory is allocated only for the virtual memory that is actually used, and the allocated physical memory is managed through the memory mapping.

How virtual memory works

2. Virtual memory paging

As mentioned above, virtual memory is mapped to physical memory through a mapping table, but the mapping cannot be kept for every single address, which would waste far too much memory. To solve this efficiency problem, physical memory is divided into pages, and the mapping table works in units of pages. In other words, an entry in the mapping table points to a whole page, not to each specific address.
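To make the page-granular mapping concrete, here is a minimal C sketch that splits an address into a page index and an offset inside that page. The names page_location and locate are made up for this illustration and do not come from any system API.

```c
#include <stdint.h>

/* Hypothetical helper type: where an address falls when memory is cut into pages. */
typedef struct {
    uint64_t page;    /* index of the page the address falls on */
    uint64_t offset;  /* byte offset inside that page */
} page_location;

/* Split an address into (page index, offset) for a given page size in bytes. */
static page_location locate(uint64_t address, uint64_t page_size) {
    page_location loc;
    loc.page = address / page_size;
    loc.offset = address % page_size;
    return loc;
}
```

With 16 KB pages, address 0x4000 is the first byte of page 1, while with 4 KB pages the same address is the first byte of page 4. The mapping table only needs one entry per page index; the offset is carried over unchanged.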

macOS on Intel and Linux on x86 use 4 KB pages, while iOS on arm64 devices uses 16 KB pages (as does macOS on Apple silicon). You can check the page size directly in a terminal with the pagesize command. 4096 bytes = 4 KB.
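The same value can also be read from code. The sketch below uses the POSIX sysconf call with _SC_PAGESIZE; the wrapper name query_page_size is made up for this example.

```c
#include <unistd.h>

/* Returns the VM page size in bytes as reported by the OS, or -1 on error. */
static long query_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}
```

On a machine where the pagesize command prints 4096, this function should return 4096 as well.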

  1. In the mapping table, 0 and 1 indicate whether the page is currently in physical memory.
  2. As the figure above also shows, the virtual addresses of a process are contiguous, but the physical memory behind them is not; it is made up of a number of complete memory pages.
  3. When an application is loaded, it is not loaded into physical memory in its entirety. Only the parts that are actually used are loaded. This is lazy loading: physical memory is allocated only as the application uses it.

2、 Page fault

1. Cause of page fault

  1. When the application accesses an address whose mapping table entry is 0, that is, the page has not been loaded into physical memory, the system immediately blocks the whole process and triggers a page fault interrupt (Page Fault).
  2. When a page fault is triggered, the operating system reads that page from disk into physical memory and then points the virtual memory entry in the mapping table at the corresponding physical memory. If memory is already full, the operating system uses a page replacement algorithm to pick a page to overwrite. This is the fundamental reason why opening many applications does not crash the system, while an application opened earlier may have to restart when it is opened again.
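The two steps above can be sketched as a toy model in C. This is an illustration only, not how a kernel implements it: toy_page_table, touch_page, and PAGE_COUNT are hypothetical names, and page replacement is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_COUNT 8  /* hypothetical size of the toy address space, in pages */

typedef struct {
    bool present[PAGE_COUNT];  /* the 0/1 flag from the mapping table */
    unsigned faults;           /* page faults counted so far */
} toy_page_table;

/* Touch one page: if its flag is 0, count a fault and mark the page as loaded. */
static void touch_page(toy_page_table *table, size_t page) {
    if (page >= PAGE_COUNT) return;  /* ignore out-of-range pages in this toy model */
    if (!table->present[page]) {
        table->faults++;             /* a real OS would block the process and read from disk here */
        table->present[page] = true;
    }
}
```

Touching the same page twice costs only one fault, because the second access finds the flag already set to 1. Only the first access to each page is expensive.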

2. Page fault impact

After a page fault is triggered, the process is blocked, which affects performance. In addition, in the iOS production environment, when a page is loaded because of a page fault, iOS also verifies its signature, so a page fault takes even longer in production.
For users, the first direct impression of an app is how long it takes to launch, and during launch a large number of classes, categories, third-party libraries, and so on need to be loaded and executed. The time consumed by the resulting page faults is often far from negligible.

The TikTok team reported that a single page fault costs 0.6 ~ 0.8 ms. Actual tests show that the cost differs from page to page and also depends on CPU load, falling between 0.1 and 1.0 ms.
The binary rearrangement approach was also first shared by the TikTok team.
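As a rough worked example of why this matters, the sketch below multiplies a fault count by a per-fault cost. The function name fault_cost_ms and the figure of 1000 faults are assumptions for illustration; real counts have to be measured.

```c
/* Estimated time spent handling page faults, in milliseconds. */
static double fault_cost_ms(unsigned fault_count, double ms_per_fault) {
    return fault_count * ms_per_fault;
}
```

Under these assumptions, 1000 page faults during launch at 0.5 ms each would add about 500 ms, and about 1000 ms at the upper bound of 1.0 ms per fault.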

3、 Binary rearrangement

1. Binary rearrangement principle

The position of a compiled function in the Mach-O file follows the order in which ld (the linker Xcode uses) processes the compiled files, not the order in which functions are called, so two functions that are called one after the other may well sit on different memory pages.

  1. As shown in the figure above, the compilation order is method1, method2, … At startup, page1 and page2 both have to be loaded into physical memory from scratch, so two page faults are triggered.
  2. The idea of binary rearrangement is to put method1 and method4 on the same memory page, so that only one page has to be loaded at startup and only one page fault is triggered.
  3. In a real project, we can group the functions that are called at startup together (for example, in the first 10 pages) to reduce the number of page faults as much as possible, which shortens startup time.
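The effect described in the list above can be checked with a small C sketch that counts how many distinct pages a set of function addresses falls on. The name pages_touched and the example addresses are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many distinct pages a set of function addresses falls on.
   Quadratic in count, which is fine for a small illustration. */
static size_t pages_touched(const uint64_t *addresses, size_t count, uint64_t page_size) {
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t page = addresses[i] / page_size;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addresses[j] / page_size == page) { seen = 1; break; }
        }
        if (!seen) distinct++;
    }
    return distinct;
}
```

Suppose method1 sits at 0x0000 and method4 at 0x4100. With 16 KB pages they land on two pages, so calling both costs two page faults. If rearrangement moves method4 to 0x0040, both land on one page and a single fault is enough.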

2. Binary rearrangement operation

Apple already provides a mechanism for this. Binary rearrangement means rearranging the executable file that is about to be generated, and it takes place in the link phase.

2.1 Order File

The linker used by Xcode is called ld. ld has a parameter called Order File (the ld option -order_file), through which we can configure the path to a file with the .order suffix. The required symbols are written into this xxx.order file in order. When the project is built, Xcode reads the file, and the resulting Mach-O binary is laid out according to the symbol order in the file.

2.2 Viewing the binary layout with the link map

The link map is an intermediate product of the iOS build process. It records the layout of the binary file. To turn it on:

2.2.1 Change Write Link Map File to Yes, then clean the project and rebuild

  • Products -> Show in Finder, go up to the parent folder and find the text file named xxx-LinkMap-normal-arm64.txt

  • The # Symbols: section of this file lists all symbols in order. The .o entries before it and other content can be ignored. Address is the symbol's address within the binary (not a runtime physical address), which can be inspected with a Mach-O viewing tool

  • We find that the symbol order follows the file order in Compile Sources

After adjusting the file order in Compile Sources, the symbol order changes accordingly.

2.3 binary rearrangement principle

Binary rearrangement does not merely modify symbol addresses. It uses the symbol order to rearrange where the code itself sits in the file, moving the methods needed at startup onto the first memory pages, which reduces the number of page faults and therefore the startup time.

3. Get all the methods called during app startup (using compilation instrumentation)

Note: Clang's instrumentation is actually a code coverage tool (SanitizerCoverage); see Clang's official documentation.

To actually carry out binary rearrangement, we need the symbols of all methods and functions called at startup, saved in call order, and then written to the xxx.order file. The approach used to collect them is Clang compile-time instrumentation.

3.1 In Build Settings, add the compile flag -fsanitize-coverage=func,trace-pc-guard to Other C Flags

3.2 After adding the flag, the build fails, because the callbacks that the instrumentation calls (__sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard) are not yet defined:

3.3 Adding the Clang callback functions

#import "DZHomeViewController.h"
#import <dlfcn.h> // dladdr and Dl_info
#import <libkern/OSAtomic.h> // OSQueueHead, OSAtomicEnqueue, OSAtomicDequeue

/*
 The instrumentation callback is called very often, and taking a lock there would hurt performance, so Apple's low-level atomic queue is used instead. It is really a linked list, and OSAtomicEnqueue / OSAtomicDequeue behave last in, first out
 */
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;
//Define symbol structure
typedef struct {
    void *pc;
    void *next;
} PCNode;
@interface DZHomeViewController ()

@end

@implementation DZHomeViewController

void(^blockTest)(void) = ^(void) {
    
};

+ (void)load {
    
}

+ (void)initialize {
    
}

- (void)viewDidLoad {
    [super viewDidLoad];    
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
  //    [self deziTest];
    NSMutableArray <NSString *> * symbolNames = [NSMutableArray array];

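    while (YES) {
        PCNode *node = OSAtomicDequeue(&symbolList, offsetof(PCNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info;
        //dladdr returns 0 on failure and dli_sname may be NULL, so skip those entries
        if (dladdr(node->pc, &info) && info.dli_sname) {
            NSString * name = @(info.dli_sname);
            BOOL  isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
            NSString * symbolName = isObjc ? name: [@"_" stringByAppendingString:name];
            [symbolNames addObject:symbolName];
        }
        free(node); //Allocated with malloc in __sanitizer_cov_trace_pc_guard
    }
    //The queue is last in, first out, so reverse it to recover call order
    NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
    //Deduplicate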
    NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    NSString *name;
    while (name = [emt nextObject]) {
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    //Remove this method itself
    [funcs removeObject:[NSString stringWithFormat:@"%s",__FUNCTION__]];
    //Change array to string
    NSString *funcStr = [funcs  componentsJoinedByString:@"\n"];

    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"fontResources.order"];
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    NSLog(@"%@",funcStr);
}

- (void)deziTest {
    blockTest();
}

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    
    /*The return address lies inside the function that called this callback. Conditions for where to start and stop recording can be added here*/
    void *PC = __builtin_return_address(0);
    PCNode *node = malloc(sizeof(PCNode));
    *node = (PCNode){PC,NULL};
    //Enqueue
    OSAtomicEnqueue(&symbolList, node, offsetof(PCNode, next));
    
    Dl_info info; // dladdr fills this structure with the image and nearest symbol for the address
    dladdr(PC, &info);

    
    printf("----------------------------------------\nfname:%s \nfbase:%p \nsname:%s \nsaddr:%p\n",
           info.dli_fname,
           info.dli_fbase,
           info.dli_sname,
           info.dli_saddr);

}

@end
  • Dl_info structure
typedef struct dl_info {
    const char      *dli_fname;     /* Pathname of the shared object */
    void            *dli_fbase;     /* Base address of the shared object */
    const char      *dli_sname;     /* Name of the nearest symbol */
    void            *dli_saddr;     /* Address of the nearest symbol */
} Dl_info;

3.4 assembly breakpoint debugging

  • First, open assembly debugging

  • Add breakpoints to methods

  • Debugging results

  • Conclusion

  • Assembly breakpoint debugging shows that a call to __sanitizer_cov_trace_pc_guard has been inserted into every method and function, so each time a method runs, the instrumentation callback runs first.
  • In other words, at compile time Clang's instrumentation statically inserts these call instructions, which gives the effect of a global AOP hook on all methods.

3.5 Using __sanitizer_cov_trace_pc_guard

  • Printing PC at a breakpoint shows that it is an address inside the calling method

void *PC = __builtin_return_address(0); returns the address that the current function, __sanitizer_cov_trace_pc_guard, will return to. That address lies inside the method that was actually called in the program.
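The behavior of __builtin_return_address(0) can be tried outside Xcode with a small C sketch (GCC or Clang). Here record_caller stands in for the instrumentation callback and some_method for an instrumented function; both names, and the counter calls, are made up for this example.

```c
#include <stddef.h>

static void *recorded_pc = NULL;  /* last return address seen by the hook */
static volatile int calls = 0;    /* work done after the hook, so the call is not turned into a tail call */

/* Stand-in for the instrumentation callback: it records the address it will
   return to, which lies inside whichever function called it. */
__attribute__((noinline)) static void record_caller(void) {
    recorded_pc = __builtin_return_address(0);
}

/* A hypothetical instrumented function; the compiler-inserted call is written by hand here. */
__attribute__((noinline)) static void some_method(void) {
    record_caller();
    calls++;
}
```

After some_method() runs, recorded_pc points just past the call instruction inside some_method, which is why dladdr can later map it back to the enclosing symbol.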

3.6 Storing and retrieving methods through the atomic queue

  • Store at instrumentation time

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    /*The return address lies inside the function that called this callback, that is, the method actually called in the program*/
    void *PC = __builtin_return_address(0);
    
    PCNode *node = malloc(sizeof(PCNode));
    *node = (PCNode){PC,NULL};
    //Arguments: &symbolList is the list head, node is the element to add, offsetof(PCNode, next) is the offset of the next member used to link elements
    OSAtomicEnqueue(&symbolList, node, offsetof(PCNode, next));
}
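OSAtomicEnqueue and OSAtomicDequeue behave as a lock-free last in, first out list, which is why the collected symbols have to be reversed later. The portable C11 sketch below shows the general idea with a compare-and-swap loop. It is not Apple's implementation, the names node, lifo_list, lifo_push, and lifo_pop are hypothetical, and the pop is simplified so that it is only safe with a single consumer.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    void *pc;
    struct node *next;
} node;

typedef struct {
    _Atomic(node *) head;
} lifo_list;

/* Push: link the new node in front of the current head with a compare-and-swap loop. */
static void lifo_push(lifo_list *list, node *n) {
    node *old = atomic_load(&list->head);
    do {
        n->next = old;  /* old is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak(&list->head, &old, n));
}

/* Pop: unlink and return the head, or NULL if the list is empty.
   Simplified: assumes a single consumer, so it is not ABA-safe with concurrent pops. */
static node *lifo_pop(lifo_list *list) {
    node *old = atomic_load(&list->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&list->head, &old, old->next)) {
        /* head changed under us; old now holds the new head, so retry */
    }
    return old;
}
```

Pushing a, b, c and then popping three times yields c, b, a. The most recently called function therefore comes out first.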
  • Manually take the methods out of the atomic queue in the touchesBegan method

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSMutableArray <NSString *> *symbolNames = [NSMutableArray array];

    while (YES) {
        //&symbolList is the list head; offsetof(PCNode, next) is the offset of the link member
        PCNode *node = OSAtomicDequeue(&symbolList, offsetof(PCNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info;
        //dladdr returns 0 on failure and dli_sname may be NULL, so skip those entries
        if (dladdr(node->pc, &info) && info.dli_sname) {
            NSString * name = @(info.dli_sname);
            BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
            NSString *symbolName = isObjc ? name: [@"_" stringByAppendingString:name];
            [symbolNames addObject:symbolName];
        }
        free(node); //Allocated with malloc in __sanitizer_cov_trace_pc_guard
    }
    
    //The queue is last in, first out, so reverse it to recover call order
    NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
    //Deduplicate
    NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    NSString *name;
    while (name = [emt nextObject]) {
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    //Remove this method itself
    [funcs removeObject:[NSString stringWithFormat:@"%s",__FUNCTION__]];
    //Change array to string
    NSString *funcStr = [funcs  componentsJoinedByString:@"\n"];

    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"fontResources.order"];
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    NSLog(@"%@",funcStr);
}
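The reverse-then-deduplicate step in the method above can be isolated as a plain C sketch. The function name reverse_unique and the symbol names in the example are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Walk names from last to first and copy each name into out the first time it is seen.
   Returns how many names were kept. out must have room for count pointers. */
static size_t reverse_unique(const char **names, size_t count, const char **out) {
    size_t kept = 0;
    for (size_t i = count; i > 0; i--) {
        const char *name = names[i - 1];
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(out[j], name) == 0) { seen = 1; break; }
        }
        if (!seen) out[kept++] = name;
    }
    return kept;
}
```

If the call order at startup was _a, _b, _a, _c, the queue hands the names back as _c, _a, _b, _a. Walking that from the end and keeping first occurrences gives _a, _b, _c, which is the order to write into the order file.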
  • Take the saved fontResources.order file out of the app's temporary directory and put it in the project

  • Set the project's Order File build setting to the path of this file

  • This completes the binary rearrangement. Comparing the xxx-LinkMap-normal-arm64.txt files from before and after, we find that the methods called at startup now come first

Before binary rearrangement

After binary rearrangement

4、 Using System Trace to verify the binary rearrangement results

1. How do we measure the page loading time? The System Trace tool in Instruments is used here.

First, restart the device (cold start). Press ⌘ + I to open Instruments and select the System Trace tool.
Click record ⏺, wait for the first page to appear, then stop ⏹. Filter to show only the main thread, then select Summary: Virtual Memory.

  • The count of File Backed Page In is the number of times a page fault was triggered.
  • Page Cache Hit is the number of page cache hits.

Many factors affect the measured page fault count, so it fluctuates considerably from run to run. It can only be estimated roughly, by averaging multiple samples taken under conditions kept as similar as possible. The details are not repeated here.
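Averaging the samples is simple arithmetic. A minimal C sketch follows; the function name mean_of_samples and the sample counts used below are made up for illustration and are not measurements.

```c
#include <stddef.h>

/* Arithmetic mean of repeated measurements; returns 0 for an empty set. */
static double mean_of_samples(const double *samples, size_t count) {
    if (count == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) sum += samples[i];
    return sum / (double)count;
}
```

For example, three cold launches with hypothetical File Backed Page In counts of 100, 120, and 110 average to 110. Compare the averages from before and after rearrangement, not individual runs.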