When an application grows large, it launches slowly and feels sluggish when opened, which hurts the user experience. How can we make the app start faster and more smoothly? This article covers the basics of app startup optimization.
1. App startup process analysis
App launch is generally divided into two phases: pre-main (everything that happens before the main function is called) and after main.
1.1 pre main stage process
We can measure the pre-main phase through dyld by setting a parameter in Xcode, as shown in the figure (this is typically the DYLD_PRINT_STATISTICS environment variable in the scheme's Run arguments).
Launch the app and look at the output, as shown in the figure below.
The pre-main times shown here come from an empty project running in the simulator, so they are not representative. A real project takes about 400 ms. The stages are:
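- dylib loading time: the time spent loading dynamic libraries
- rebase/binding time: the time spent on rebasing (redirection) and symbol binding
- ObjC setup time: the time spent registering Objective-C classes
- initializer time: the time spent running initializers (+load methods and constructor functions)
These steps make up the basic pre-main process.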
1.2 dynamic loading of dylib
Loading dylibs takes time that cannot be avoided entirely. The system's dynamic libraries are already loaded into the shared cache and are heavily optimized, but our own dynamic libraries are not. Apple therefore recommends using no more than 6 custom dynamic libraries. If you have more than 6, try to merge them.
1.3 ObjC setup
Objective-C is a dynamic language, so its classes have to be registered at launch:
- Read the data segment of the Mach-O file and find the information about the Objective-C classes
- Register the classes. The Objective-C runtime maintains mapping tables: the SEL/IMP mapping and a global table from class names to classes. When the Mach-O is loaded, every class must be registered in the global table, and category and protocol information has to be inserted into the method lists. This cost is inherent, so the optimization here is to reduce the number of Objective-C classes and delete unused class files (as long as a class exists it costs time, even if it is never used).
In +load methods and constructors, do as little as possible: defer work until it is actually needed, move time-consuming tasks to a background thread to reduce the load on the main thread, and cache data where you can.
The optimizations above are fairly simple. Next come rebase (redirection) and binding. Before introducing them, let's cover some background on virtual memory.
2. Introduction to virtual memory
2.1 concept of virtual address
Early operating systems loaded an application directly into physical memory, so the addresses the application used were real addresses in physical memory. What is the problem with that?
- Insufficient memory: applications are loaded into physical memory in full. If many applications are loaded, memory runs out, and a new application can only be opened by killing earlier ones.
- Insecurity: for example, a game cheat tool can access physical memory directly, locate the memory that holds the code, and modify it.
Virtual memory was introduced to solve these problems.
Users generally do not use every feature of an application, so when an application is loaded into memory in full, part of that memory may never be used and is wasted.
The answer is lazy loading: the application is divided into pieces. At launch, only the code needed for startup is loaded into memory. When a new feature is used, the piece that contains it is loaded at that point.
This creates a new problem: the loaded code is no longer contiguous in memory, which makes access complicated, because addresses would have to be recalculated every time.
The virtual memory table solves this by storing the mapping between the application's addresses and real physical memory.
The figure illustrates the mapping between the virtual memory table and physical memory.
For efficiency, memory is loaded in pages. A page is currently 16 KB on iOS and 4 KB on an Intel Mac (you can check with the pagesize command).
This solves the memory shortage and also improves security. A game cheat tool can no longer access physical memory directly. It can only access virtual addresses, which the MMU translates into physical addresses, so it can reach only its own process's memory space. Processes are isolated from each other.
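The translation step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the MMU or the kernel is implemented: the `translate` function and the contents of `page_table` are hypothetical, and the 16 KB page size is the iOS value mentioned in the text.

```python
PAGE_SIZE = 16 * 1024  # 16 KB, the iOS page size mentioned in the text


def translate(virtual_address, page_table):
    """Map a virtual address to a physical one via a per-process page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # The page is not in physical memory: this is where a page fault happens.
        raise LookupError(f"page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset


# Hypothetical table: virtual page number -> physical frame number.
# The frames do not have to be contiguous or in order.
page_table = {0: 7, 1: 2, 2: 9}

physical = translate(1 * PAGE_SIZE + 0x10, page_table)
print(hex(physical))
```

The process only ever sees virtual page numbers. Which physical frame backs each page is decided by the table, so two processes can use the same virtual address without touching each other's memory.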
2.2 principle of memory paging
Let’s analyze how the application is loaded into memory.
The virtual memory table is 4 GB in size.
Let's look at the picture below.
In this picture:
- In the virtual page table of process 1, P1, P3 and P5 are loaded into memory at startup. When the user needs the data in P2 at runtime and P2 turns out not to be in memory, the operating system raises a page fault (page fault interrupt). The code the CPU was executing is interrupted, and the operating system loads P2 into physical memory wherever there is a free location. In practice, once the phone has been running for a while there are almost no free locations left, so the operating system uses a page replacement algorithm to overwrite inactive memory.
- __PAGEZERO: an access to an address outside the range of our own code and data, such as a null pointer, lands here and fails. It is a reserved, inaccessible region rather than real data, and it helps isolate processes.
- The largest address shown in the figure is 8 GB. Addresses themselves are 8 bytes (64 bits) wide.
- The accessible range starts at 0x100000000 (4 GB). The first 4 GB cannot be accessed. This separates 64-bit from 32-bit: 64-bit programs have to coexist with 32-bit ones, and every address a 32-bit pointer can express lies below 4 GB, so 64-bit programs start above that range.
- For legitimate communication between processes, the system provides dedicated interfaces, and the signals are sent through the kernel.
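The page fault behavior in the first bullet can be simulated with a short Python sketch. The text does not say which page replacement algorithm the operating system uses, so LRU (least recently used) is an assumption chosen purely for illustration, and `count_page_faults` is a hypothetical helper.

```python
from collections import OrderedDict


def count_page_faults(accesses, frame_count):
    """Simulate demand paging with LRU replacement and return the fault count."""
    resident = OrderedDict()  # pages in memory, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
            continue
        faults += 1  # the page is not in physical memory: page fault
        if len(resident) >= frame_count:
            resident.popitem(last=False)  # no free frame: evict the LRU page
        resident[page] = None
    return faults


# P1, P3 and P5 are touched at startup; P2 is needed later, as in the figure.
accesses = ["P1", "P3", "P5", "P1", "P2", "P3", "P1"]
print(count_page_faults(accesses, frame_count=3))
print(count_page_faults(accesses, frame_count=4))
```

With fewer free frames, loading P2 pushes out a page that is needed again shortly afterwards, which is why the same access pattern faults more often under memory pressure.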
3. Pagefault debugging & startup optimization principle
3.1 32-bit and 64-bit concepts of CPU
The terms 32-bit and 64-bit refer to the width of a part of the CPU called the data bus.
The CPU has many pins, and the mainboard has rows of wires connected to them. Each wire is either 1 or 0 at any moment. 8 wires transfer one byte at a time, and 32 wires transfer 4 bytes. On a 32-bit system, addresses are also 32 bits wide, which is why the address range is limited to 4 GB.
So 32-bit and 64-bit describe the throughput of the CPU: how much data it can read or write in a single transfer.
On a 64-bit system a memory address occupies 8 bytes. In object-oriented languages, objects are passed around as pointers, which are also 8 bytes, and that is the most efficient size to transfer.
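The numbers above are easy to check with a little arithmetic. The sketch below uses a hypothetical helper, `addressable_bytes`, to show why a 32-bit address tops out at 4 GB and why the first address beyond that range is 0x100000000. The last line prints the pointer size of the machine running the script, which is 8 only on a 64-bit build of Python.

```python
import struct


def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can express."""
    return 2 ** address_bits


GIB = 2 ** 30

print(addressable_bytes(32) // GIB)  # size of a 32-bit address space, in GiB
print(hex(addressable_bytes(32)))    # first address above the 32-bit range
print(struct.calcsize("P"))          # pointer size in bytes on this machine
```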
3.2 bind / binding
Why is there a binding process?
When code in our binary calls a function that lives in an external library, it does so through a symbol, and that symbol has to be bound to the function's real address. This binding time cannot be avoided.
The only way to reduce it is to call fewer external functions. But the binding here is lazy (a symbol is bound the first time it is used), so reducing it has little effect on startup time.
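Lazy binding can be pictured as a lookup that happens on first use and is then cached. The Python sketch below is an analogy only: `LazyBinder`, `EXTERNAL_SYMBOLS`, and the symbol names are hypothetical, and dyld's real mechanism works on stubs and pointer tables in the Mach-O file rather than on dictionaries.

```python
import math

# Hypothetical "external library": symbol name -> implementation.
EXTERNAL_SYMBOLS = {"_sqrt": math.sqrt, "_floor": math.floor}


class LazyBinder:
    """Resolve a symbol the first time it is called, then reuse the result."""

    def __init__(self, exports):
        self._exports = exports
        self._bound = {}     # symbols that have been resolved so far
        self.bind_count = 0  # how many lookups were actually performed

    def call(self, symbol, *args):
        target = self._bound.get(symbol)
        if target is None:
            # First use: look the symbol up and cache the address.
            target = self._exports[symbol]
            self._bound[symbol] = target
            self.bind_count += 1
        return target(*args)


binder = LazyBinder(EXTERNAL_SYMBOLS)
binder.call("_sqrt", 9.0)
binder.call("_sqrt", 16.0)
print(binder.bind_count)
```

`_floor` is never called, so it is never bound. This is the reason trimming external calls saves little at launch: only the symbols that are actually used pay the binding cost, and they pay it once.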
3.3 rebase / redirect
With virtual memory alone, every process's virtual addresses start from 0, so anything in the process can be found just by calculating its offset, which is relatively insecure. ASLR (address space layout randomization) was introduced to solve this: the virtual memory table generated at each launch starts from a random value, which differs every time. Because the starting address is different on every launch, code can no longer be located directly from its file offset.
The consequence of ASLR is that the program's own code and data can be reached only after the random ASLR value has been added to their offsets.
After compilation, the addresses of our code are fixed in the Mach-O file, as shown in the figure.
The offset here is the offset of the code within the file, and it is fixed. Because of ASLR, the function or method is found at ASLR value + offset on every run. This adjustment is the rebase (redirection) process.
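As a minimal sketch of the arithmetic, the rebase step adds the random ASLR value to an address that was fixed at build time. The function names and the `max_pages` bound below are hypothetical, and the way the random value is picked is a stand-in for whatever the system actually does. The sample address is the one listed for -[ViewController viewDidLoad] in the link map in section 3.4.

```python
import random

PAGE_SIZE = 16 * 1024


def random_slide(max_pages=256):
    """Pick a page-aligned random value, standing in for the ASLR offset."""
    return random.randrange(1, max_pages) * PAGE_SIZE


def rebase(link_time_address, slide):
    """Runtime address = address fixed at build time + ASLR value."""
    return link_time_address + slide


# Address of -[ViewController viewDidLoad] in the link map in section 3.4.
VIEW_DID_LOAD = 0x100002100

slide = random_slide()
print(hex(slide), hex(rebase(VIEW_DID_LOAD, slide)))
```

Running the script several times prints a different runtime address each time, while the distance between any two symbols stays the same.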
3.4 binary rearrangement
Binary rearrangement can reduce our startup time. Why? Let's analyze it.
As described above, when our code accesses a page that has not been loaded into memory, a page fault occurs (also called a page fault interrupt).
A single page fault is handled in milliseconds, but when many of them occur in a row the total time becomes long.
When do page faults occur in large numbers?
At launch, and more precisely at cold start.
Let's debug page faults first by measuring the startup time and the page fault data, as shown in the figure.
In the figure, File Backed Page In is the page fault entry. It takes 1.25 seconds, against a total of 1.27 seconds, so page faults account for a large share of the time.
How can we reduce this page fault time?
In Build Settings, search for Write Link Map File, as shown below.
After compiling, open the folder that contains the compiled app, as shown in the figure.
Open the file demo-linkmap-normal-x86_64.txt, as shown below.
# Symbols:
# Address    Size        File  Name
0x100001E90  0x00000030  [ 2]  +[AppDelegate load]
0x100001EC0  0x00000080  [ 2]  -[AppDelegate application:didFinishLaunchingWithOptions:]
0x100001F40  0x00000120  [ 2]  -[AppDelegate application:configurationForConnectingSceneSession:options:]
0x100002060  0x00000070  [ 2]  -[AppDelegate application:didDiscardSceneSessions:]
0x1000020D0  0x00000030  [ 3]  +[ViewController load]
0x100002100  0x00000039  [ 3]  -[ViewController viewDidLoad]
0x100002140  0x0000008E  [ 4]  _main
0x1000021D0  0x000000B0  [ 5]  -[SceneDelegate scene:willConnectToSession:options:]
0x100002280  0x00000040  [ 5]  -[SceneDelegate sceneDidDisconnect:]
0x1000022C0  0x00000040  [ 5]  -[SceneDelegate sceneDidBecomeActive:]
0x100002300  0x00000040  [ 5]  -[SceneDelegate sceneWillResignActive:]
0x100002340  0x00000040  [ 5]  -[SceneDelegate sceneWillEnterForeground:]
0x100002380  0x00000040  [ 5]  -[SceneDelegate sceneDidEnterBackground:]
0x1000023C0  0x00000020  [ 5]  -[SceneDelegate window]
0x1000023E0  0x00000040  [ 5]  -[SceneDelegate setWindow:]
This is the order in which the method implementations are laid out in the binary.
Each address here plus the ASLR value gives the method's address in virtual memory.
The Address column determines which page each method lands on, and the Name column follows the compilation order of the source files.
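To work with a link map programmatically, the symbol rows can be parsed with a regular expression. The following is a minimal sketch: the `Symbol` type, its field names, and `parse_symbols` are my own, and the pattern assumes rows of the form address, size, bracketed file index, name, as in the listing in this section.

```python
import re
from typing import NamedTuple


class Symbol(NamedTuple):
    address: int
    size: int
    file_index: int
    name: str


# Matches rows such as: 0x100001E90 0x00000030 [ 2] +[AppDelegate load]
ROW = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+\[\s*(\d+)\]\s+(.+)$")


def parse_symbols(text):
    """Return the symbol rows found in the text of a link map."""
    symbols = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            address, size, index, name = match.groups()
            symbols.append(Symbol(int(address, 16), int(size, 16), int(index), name))
    return symbols


sample = """\
# Symbols:
# Address Size File Name
0x100001E90 0x00000030 [ 2] +[AppDelegate load]
0x100001EC0 0x00000080 [ 2] -[AppDelegate application:didFinishLaunchingWithOptions:]
0x100002140 0x0000008E [ 4] _main
"""

for symbol in parse_symbols(sample):
    print(hex(symbol.address), symbol.size, symbol.name)
```

Header and comment lines do not match the pattern and are skipped, so the same function can be run over a whole link map file read with `open(path).read()`.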
Let’s adjust the file order, as shown in the figure
# Symbols:
# Address    Size        File  Name
0x100001E90  0x00000030  [ 2]  +[ViewController load]
0x100001EC0  0x00000039  [ 2]  -[ViewController viewDidLoad]
0x100001F00  0x0000008E  [ 3]  _main
0x100001F90  0x00000030  [ 4]  +[AppDelegate load]
0x100001FC0  0x00000080  [ 4]  -[AppDelegate application:didFinishLaunchingWithOptions:]
0x100002040  0x00000120  [ 4]  -[AppDelegate application:configurationForConnectingSceneSession:options:]
0x100002160  0x00000070  [ 4]  -[AppDelegate application:didDiscardSceneSessions:]
0x1000021D0  0x000000B0  [ 5]  -[SceneDelegate scene:willConnectToSession:options:]
0x100002280  0x00000040  [ 5]  -[SceneDelegate sceneDidDisconnect:]
0x1000022C0  0x00000040  [ 5]  -[SceneDelegate sceneDidBecomeActive:]
0x100002300  0x00000040  [ 5]  -[SceneDelegate sceneWillResignActive:]
0x100002340  0x00000040  [ 5]  -[SceneDelegate sceneWillEnterForeground:]
0x100002380  0x00000040  [ 5]  -[SceneDelegate sceneDidEnterBackground:]
0x1000023C0  0x00000020  [ 5]  -[SceneDelegate window]
The order has changed, which confirms what we just said.
How should we optimize here?
Memory is loaded page by page. Even if only one method on a page is called, the whole page has to be loaded into memory. If the methods used at startup are scattered across many pages, every one of those pages triggers a page fault. So we can place the methods needed at launch together at the front of the binary, which greatly reduces the number of page faults. This is what binary rearrangement does.
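The effect of grouping startup methods can be estimated by counting how many distinct pages they occupy. The sketch below uses made-up data: twelve hypothetical functions of 4 KB each, three of which run at startup, laid out from the base address 0x100000000 with 16 KB pages. The helper names are mine as well.

```python
PAGE_SIZE = 16 * 1024


def pages_touched(layout, startup_names):
    """Return the page numbers that hold any part of a startup symbol."""
    pages = set()
    for name, address, size in layout:
        if name in startup_names:
            first = address // PAGE_SIZE
            last = (address + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
    return pages


def lay_out(names, size, base=0x100000000):
    """Place equally sized symbols one after another, in the given order."""
    return [(name, base + i * size, size) for i, name in enumerate(names)]


# Hypothetical app: 12 functions of 4 KB each, only three run at startup.
all_names = ["start1", "a", "b", "c", "d", "start2",
             "e", "f", "g", "h", "start3", "i"]
startup = {"start1", "start2", "start3"}

default_order = lay_out(all_names, 4 * 1024)
# Stable sort: startup symbols first, everything else in its original order.
reordered = lay_out(sorted(all_names, key=lambda n: n not in startup), 4 * 1024)

print(len(pages_touched(default_order, startup)))
print(len(pages_touched(reordered, startup)))
```

In the default layout each startup function sits on its own page, so launch touches three pages. After moving them to the front they share a single page, so the same startup work costs one page fault instead of three.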
3.5 use of binary rearrangement
The objc source contains a file named libobjc.order. Open it and you will see the following:
__objc_init
_environ_init
_tls_init
_lock_init
_recursive_mutex_init
_exception_init
_map_images
_map_images_nolock
__getObjcImageInfo
__hasObjcContents
__objc_appendHeader
These are all symbol names meant for the linker. When the linker reads the order file, it lays out the binary in the order listed here.
Let's create a new order file, put it in the project root, and edit it as shown below.
-[SceneDelegate sceneWillResignActive:]
-[SceneDelegate sceneWillEnterForeground:]
-[SceneDelegate sceneDidEnterBackground:]
-[SceneDelegate window]
_main
+[ViewController load]
+[AppDelegate load]
In Build Settings, search for Order File.
Set it to the path of our custom order file.
After compiling, look at the link map .txt file again, as shown below.
# Address    Size        File  Name
0x100001E90  0x00000040  [ 5]  -[SceneDelegate sceneWillResignActive:]
0x100001ED0  0x00000040  [ 5]  -[SceneDelegate sceneWillEnterForeground:]
0x100001F10  0x00000040  [ 5]  -[SceneDelegate sceneDidEnterBackground:]
0x100001F50  0x00000020  [ 5]  -[SceneDelegate window]
0x100001F70  0x0000008E  [ 2]  _main
0x100002000  0x00000030  [ 3]  +[ViewController load]
0x100002030  0x00000030  [ 4]  +[AppDelegate load]
0x100002060  0x00000039  [ 3]  -[ViewController viewDidLoad]
The symbols now follow the order we wrote in the order file. Symbols listed in the order file that do not exist in the binary are ignored.
But there is a problem.
We are optimizing startup, so we need to know which methods are called during startup. There are many of them and they call each other, so writing the order file by hand is very tedious. Hooking objc_msgSend can intercept Objective-C method calls, but C functions and blocks cannot be captured through this hook. Consider this a teaser: we will solve these problems in follow-up articles.
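Collecting the startup symbols is the hard part and is left to the follow-up articles. Once such a list exists, however it was obtained, turning it into an order file is mechanical. The sketch below assumes a hypothetical `trace` list in call order and writes each symbol once, on its own line, in the order of its first call.

```python
def build_order_file(called_symbols):
    """Keep the first occurrence of each symbol, in call order, one per line."""
    unique = list(dict.fromkeys(called_symbols))  # dicts preserve insertion order
    return "\n".join(unique) + "\n"


# Hypothetical trace of startup calls. Repeated calls appear more than once.
trace = [
    "+[AppDelegate load]",
    "+[ViewController load]",
    "_main",
    "-[AppDelegate application:didFinishLaunchingWithOptions:]",
    "_main",
]

text = build_order_file(trace)
print(text, end="")
```

The resulting text can be saved to a file and referenced from the Order File build setting described in this section.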
This article covered the app startup process, virtual addresses and virtual memory, how page faults work, and the principle of binary rearrangement. I hope it helped consolidate some fundamentals and that you got something out of it.