In-depth exploration of iOS startup speed optimization

Time:2020-10-1

 

 

Introduction

An app's startup time is an important indicator of its performance. The faster the app starts, the less time users spend waiting. To improve the user experience, apps from major tech companies go as far as investigating every single millisecond.

App launches fall into two types:

  • Cold start: when the app starts, its process does not exist in the system (the app is being opened for the first time, or its process was killed), so the system has to create a new process for it.
  • Hot start: after the app has been sent to the background, its process is still alive in the system. Launching it again simply brings the app back to the foreground.

This article focuses on analyzing and optimizing cold start, and introduces the commonly used measurement tools and optimization methods.

Cold start process

Apple's WWDC session "Optimizing App Startup Time" divides the startup of an iOS app into a pre-main stage and a main stage. The best startup time is within 400ms, and the slowest should be no more than 20s; otherwise the process is killed by the system (measured on the lowest-spec supported device).

To distinguish the stages more clearly, the author divides the whole startup process into three stages: pre-main, the main function delegate stage (didFinishLaunchingWithOptions), and first screen rendering (viewDidAppear). The latter two both belong to the main function execution phase.

Pre-main execution content

During this stage the app is displaying its launch screen.

  • Load executable

    Load the Mach-O format file, that is, the collection of .o object files compiled from all the classes in the app.

  • Load dynamic library

    When dyld loads a dylib, it completes the following steps:

    1. Resolve all the dylibs that the app depends on.
    2. Find the Mach-O file corresponding to each dylib.
    3. Open, read, and validate these Mach-O files.
    4. Register the code signatures with the system kernel.
    5. Call mmap() for each segment of each dylib.

    System dynamic libraries load quickly because they have already been optimized, while dynamic libraries introduced by the developer take much longer.

  • Rebase and bind operations

    Because ASLR is used, pointer offsets have to be computed while dylibs are loaded in order to obtain the correct resource addresses. Rebase reads the image into memory and fixes up the pointers inside the image, which costs I/O. Bind looks up symbol tables and binds pointers to external images, which requires a large amount of CPU computation.

  • Objc setup

    Initialize the Objective-C runtime, including registering Objective-C classes, checking selector uniqueness, inserting category methods, and so on.

  • Initializers

    Write content onto the application's stack, including running +load methods, calling C/C++ constructor functions (functions marked with __attribute__((constructor))), creating C++ static global variables of non-primitive types, and so on.

Main function delegate stage execution content

This is the time from when the main() function starts executing until the didFinishLaunchingWithOptions method finishes. During this stage the app usually initializes various tools (monitoring, push, location, and so on), requests permissions, checks versions, applies global configuration, and so on.

First screen rendering content

This is the stage in which the first screen's UI is built. The CPU has to compute the layout and the GPU has to render it. If the data comes from the network, a network request has to be issued as well.

Optimization scheme

Pre main stage

Measurement method

Getting the time spent before main() runs is relatively simple, because Xcode provides a built-in measurement. In Xcode, go to Product → Scheme → Edit Scheme → Run → Environment Variables and set the environment variable DYLD_PRINT_STATISTICS or DYLD_PRINT_STATISTICS_DETAILS to 1 to get the time spent on each item:

// example 
// DYLD_PRINT_STATISTICS
Total pre-main time: 383.50 milliseconds (100.0%)
         dylib loading time: 254.02 milliseconds (66.2%)
        rebase/binding time:  20.88 milliseconds (5.4%)
            ObjC setup time:  29.33 milliseconds (7.6%)
           initializer time:  79.15 milliseconds (20.6%)
           slowest intializers :
             libSystem.B.dylib :   8.06 milliseconds (2.1%)
    libMainThreadChecker.dylib :  22.19 milliseconds (5.7%)
                  AFNetworking :  11.66 milliseconds (3.0%)
                  TestDemo :  38.19 milliseconds (9.9%)

// DYLD_PRINT_STATISTICS_DETAILS
  total time: 614.71 milliseconds (100.0%)
  total images loaded:  401 (380 from dyld shared cache)
  total segments mapped: 77, into 1785 pages with 252 pages pre-fetched
  total images loading time: 337.21 milliseconds (54.8%)
  total load time in ObjC:  12.81 milliseconds (2.0%)
  total debugger pause time: 307.99 milliseconds (50.1%)
  total dtrace DOF registration time:   0.07 milliseconds (0.0%)
  total rebase fixups:  152,438
  total rebase fixups time:   2.23 milliseconds (0.3%)
  total binding fixups: 496,288
  total binding fixups time: 218.03 milliseconds (35.4%)
  total weak binding fixups time:   0.75 milliseconds (0.1%)
  total redo shared cached bindings time: 221.37 milliseconds (36.0%)
  total bindings lazily fixed up: 0 of 0
  total time in initializers and ObjC +load:  43.56 milliseconds (7.0%)
                         libSystem.B.dylib :   3.67 milliseconds (0.5%)
               libBacktraceRecording.dylib :   3.41 milliseconds (0.5%)
                libMainThreadChecker.dylib :  21.19 milliseconds (3.4%)
                              AFNetworking :  10.89 milliseconds (1.7%)
                              TestDemo :   2.37 milliseconds (0.3%)
total symbol trie searches:    1267474
total symbol table binary searches:    0
total images defining weak symbols:  34
total images using weak symbols:  97
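
When this output is collected across many builds, it can help to turn the "label: N milliseconds" lines into numbers and compare each phase's share of the total. The following is only a minimal sketch in portable C, not part of any Apple tooling: the names PhaseTime, parse_phase_line, and share_percent are made up for illustration, and it assumes the lines look exactly like the DYLD_PRINT_STATISTICS example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "label: N milliseconds (P%)" line from the dyld statistics output. */
typedef struct {
    char   label[64];
    double millis;
} PhaseTime;

/* Parses one line of DYLD_PRINT_STATISTICS output into `out`.
 * Returns 1 for lines shaped like "label: 12.34 milliseconds ...",
 * and 0 for anything else (comments, counters without a time, ...). */
int parse_phase_line(const char *line, PhaseTime *out) {
    assert(line != NULL && out != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL) return 0;

    /* Trim the whitespace around the label. */
    const char *start = line;
    while (*start == ' ' || *start == '\t') start++;
    const char *end = colon;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t')) end--;

    size_t len = (size_t)(end - start);
    if (len == 0 || len >= sizeof out->label) return 0;
    memcpy(out->label, start, len);
    out->label[len] = '\0';

    /* %n is only reached if the literal "milliseconds" matched. */
    double millis = 0.0;
    int consumed = 0;
    if (sscanf(colon + 1, " %lf milliseconds%n", &millis, &consumed) != 1 ||
        consumed == 0) {
        return 0;
    }
    out->millis = millis;
    return 1;
}

/* Share of the total, in percent (0 when the total is not positive). */
double share_percent(double part, double total) {
    return total > 0.0 ? part / total * 100.0 : 0.0;
}
```

Feeding it the "Total pre-main time" line and the "dylib loading time" line from the example and passing both results to share_percent gives roughly the 66.2% that dyld itself prints, which is a quick sanity check that the parsing matches the tool's own arithmetic.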

Optimization point

  • Merge dynamic libraries and reduce the use of embedded frameworks, that is, dynamic frameworks that are not provided by the system. If the package size requirement is not strict, static libraries can be used instead.

  • Remove useless code (unused static variables, classes, methods, etc.) and extract duplicate code.

  • Avoid doing work in +load methods; use +initialize instead.

  • Avoid __attribute__((constructor)). The work can be moved into an initialization method and guarded with dispatch_once.

  • Reduce the number of C++ static global variables of non-primitive types. (Such a global variable is usually a class or struct instance, and heavy work in its constructor slows down startup.)
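
The idea behind the +load, constructor, and static-global points above is the same: move work from "before main()" to "first use". As a rough illustration only, the sketch below uses POSIX pthread_once as a portable stand-in for dispatch_once (on iOS, dispatch_once would be the natural choice). The names build_lookup_table, squared, and table_build_count are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <pthread.h>

static int g_table_builds = 0;      /* how many times the setup ran */
static int g_lookup_table[256];

/* Hypothetical expensive setup. Marking it __attribute__((constructor))
 * would make every launch pay for it before main(), used or not. */
static void build_lookup_table(void) {
    for (int i = 0; i < 256; i++) {
        g_lookup_table[i] = i * i;
    }
    g_table_builds++;
}

static pthread_once_t g_table_once = PTHREAD_ONCE_INIT;

/* Lazy accessor: the first caller triggers the setup, later callers
 * (on any thread) reuse the result. */
int squared(unsigned char value) {
    int rc = pthread_once(&g_table_once, build_lookup_table);
    assert(rc == 0);
    (void)rc;  /* avoids an unused-variable warning when NDEBUG is defined */
    return g_lookup_table[value];
}

/* Exposed only so the sketch can be checked. */
int table_build_count(void) {
    return g_table_builds;
}
```

With this shape nothing runs at load time: the table is built on the first call to squared() and never again, so a launch that does not touch this code path pays nothing for it.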

Main function delegate stage

Measurement method

  • Measuring elapsed time with manually inserted code

    Record the start time when the main() function begins executing:

    #import <UIKit/UIKit.h>
    #import "AppDelegate.h"

    CFAbsoluteTime startTime; // global variable that records when main() started

    int main(int argc, char * argv[]) {
        @autoreleasepool {
            startTime = CFAbsoluteTimeGetCurrent();
            return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
        }
    }

    Then get the end time just before didFinishLaunchingWithOptions returns. The difference between the two is the time spent in this stage:

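    extern CFAbsoluteTime startTime; // defined as a global variable in main.m
    - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    
        //...
        //...
    
        // Seconds elapsed since main() started executing
        double launchTime = (CFAbsoluteTimeGetCurrent() - startTime);
        NSLog(@"main() to didFinishLaunchingWithOptions: %.3f s", launchTime);
        return YES;
    }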

    The same manual instrumentation can be used to time individual functions. However, when there are many functions it becomes a lot of work, and the code has to be removed before release, so it is not reusable.

  • Time Profiler

    A tool bundled with Xcode. It works by capturing the threads' stack information at regular intervals and estimating how long each method took over a period of time by comparing the stack states between samples. Its accuracy depends on the sampling interval.

    Open the tool through Xcode → Open Developer Tool → Instruments → Time Profiler. Note that Debug Information Format for the Debug configuration in the project's build settings needs to be changed to DWARF with dSYM File; otherwise you can only see a bunch of threads and cannot locate the functions.

 

You can jump to the corresponding code by double-clicking a specific function. In addition, checking Separate by Thread and Hide System Libraries under Call Tree makes the results easier to read.

 

Time Profiler normally samples once every 1ms. By default it only collects the call stacks of threads that are running and then aggregates them statistically. As a result, functions that run too briefly and threads that are sleeping do not show up. For example, in the five samples shown in the figure, method3 is never sampled, so it does not appear in the aggregated stack.

 

We can adjust the configuration in File > Recording Options to get a more accurate call stack.
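
Why a short method can disappear from a sampled profile is easy to reproduce with a toy model. The sketch below is not how Instruments is implemented; it simply fires a "sample" at a fixed interval over a made-up timeline and counts which method is on the CPU each time. The Span type, count_samples, and the timings in kTimeline are all assumptions invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A method occupying the CPU during [start_us, end_us), in microseconds. */
typedef struct {
    const char *name;
    long start_us;
    long end_us;
} Span;

/* Counts how many samples land inside `target` when the sampler fires
 * every `interval_us` microseconds over [0, total_us). */
int count_samples(const Span *target, long interval_us, long total_us) {
    assert(target != NULL && interval_us > 0);
    int hits = 0;
    for (long t = 0; t < total_us; t += interval_us) {
        if (t >= target->start_us && t < target->end_us) {
            hits++;
        }
    }
    return hits;
}

/* Hypothetical 5 ms timeline; the numbers are made up for illustration. */
const Span kTimeline[] = {
    { "method1",    0, 1500 },
    { "method2", 1500, 2300 },
    { "method3", 2300, 2700 },  /* 0.4 ms, falls between two 1 ms samples */
    { "method4", 2700, 5000 },
};
```

With a 1000 µs interval over these 5000 µs there are five samples (at 0, 1, 2, 3, and 4 ms), and none of them lands inside method3, so it would be missing from the aggregated result. Shrinking the interval to 100 µs puts four samples inside method3, which is the effect of raising the sampling frequency in the recording options.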

 
  • System Trace

    Sometimes the main thread is blocked by other threads, which cannot be spotted at a glance with Time Profiler. In that case we can use System Trace. For example, we deliberately sleep for 10ms in the callback that dyld invokes after linking each dynamic library:

    #include <mach-o/dyld.h>
    #include <unistd.h>

    // dyld calls this for every image already loaded and for each image loaded later
    static void add(const struct mach_header* header, intptr_t vmaddr_slide) {
        usleep(10000); // sleep for 10ms
    }
    - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
    {
        dispatch_sync(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            _dyld_register_func_for_add_image(add);
        });
        // ...
    }
 

The whole recording takes 7s, but Time Profiler only shows 1.17s, and there is a blank period right after startup. At this point, use System Trace to view the specific state of each thread.

 

You can see that the main thread is blocked for a period of time and that a mutex is involved. Switch to Events: Thread State and look at what happens right after the blocked interval: the main thread only resumes executing once thread 0x5d39c releases the lock.

 

Next we look at thread 0x5d39c. During the period in which the main thread is blocked, this thread executes a series of 10ms sleeps. In other words, the main thread is blocked by the child thread, which is what makes startup slow.

 

From now on, when we want a clearer view of how threads are scheduled against each other, we can use System Trace. It is still recommended to start with Time Profiler, which is easier to understand and more efficient for troubleshooting.

  • App Launch

    A new tool added in Xcode 11 that combines the functionality of Time Profiler and System Trace.

  • Hook objc_msgSend

    objc_msgSend can be hooked to get the exact time spent in each method, so that time-consuming methods in the startup stage can be optimized or deferred. For the implementation, see the article on monitoring iOS method time consumption by hooking objc_msgSend.

Optimization point

  • Use the measurement tools to find the time-consuming functions, and defer the ones with low priority.
  • Sort out the business logic and defer whatever can be deferred, for example checking for a new version or registering for push notifications.
  • Sort out the binary and third-party libraries, find the ones that can be loaded lazily, and load them later, for example after the home page controller's viewDidAppear method.
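
One way to keep these deferrals organized, and to avoid hand-writing timing code around every call, is a small task table: each startup task is registered with a phase, tasks are run phase by phase, and each run is timed. The following is only a sketch of that idea in portable C under several assumptions: all names (LaunchTask, launch_register, launch_run_phase, launch_report, and the two sample tasks) are hypothetical, clock_gettime(CLOCK_MONOTONIC) stands in for whatever clock the app uses, and in a real app the second phase would be triggered from somewhere like the home controller's viewDidAppear.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef void (*TaskFn)(void);
typedef enum { RUN_AT_LAUNCH, RUN_AFTER_FIRST_SCREEN } TaskPhase;

typedef struct {
    const char *name;
    TaskFn      fn;
    TaskPhase   phase;
    double      elapsed_ms;  /* -1 until the task has run */
} LaunchTask;

#define MAX_TASKS 32
static LaunchTask g_tasks[MAX_TASKS];
static int g_task_count = 0;

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Registers a task; returns 0 on success, -1 if the table is full. */
int launch_register(const char *name, TaskFn fn, TaskPhase phase) {
    assert(name != NULL && fn != NULL);
    if (g_task_count >= MAX_TASKS) return -1;
    g_tasks[g_task_count++] = (LaunchTask){ name, fn, phase, -1.0 };
    return 0;
}

/* Runs every task of `phase` that has not run yet, in registration order,
 * timing each one. Returns the number of tasks that ran. */
int launch_run_phase(TaskPhase phase) {
    int ran = 0;
    for (int i = 0; i < g_task_count; i++) {
        LaunchTask *task = &g_tasks[i];
        if (task->phase != phase || task->elapsed_ms >= 0.0) continue;
        double start = now_ms();
        task->fn();
        task->elapsed_ms = now_ms() - start;
        ran++;
    }
    return ran;
}

/* Prints one line per task that has already run. */
void launch_report(FILE *out) {
    for (int i = 0; i < g_task_count; i++) {
        if (g_tasks[i].elapsed_ms >= 0.0) {
            fprintf(out, "%-24s %8.3f ms\n", g_tasks[i].name, g_tasks[i].elapsed_ms);
        }
    }
}

/* Hypothetical tasks, stand-ins for real SDK setup calls. */
int crash_reporter_ready = 0;
int version_checked = 0;
void setup_crash_reporter(void) { crash_reporter_ready = 1; }
void check_new_version(void)    { version_checked = 1; }
```

A launch would register setup_crash_reporter as RUN_AT_LAUNCH and check_new_version as RUN_AFTER_FIRST_SCREEN, call launch_run_phase(RUN_AT_LAUNCH) inside didFinishLaunchingWithOptions, and call launch_run_phase(RUN_AFTER_FIRST_SCREEN) once the first screen is visible. launch_report(stderr) then lists the time each task took, which makes it easier to decide what else should move to the later phase.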

First screen rendering phase

Measurement method

Record the time at which the first screen's viewDidLoad starts and the time at which its viewDidAppear starts; the difference between the two is the total rendering time of the first screen. To get the time spent in each step, use Time Profiler or the objc_msgSend hook, as in the main function delegate stage.

Optimization point

  • Use a simple ad page as a transition, and run the home page's computation and network requests asynchronously while the ad page is displayed.
  • When a campaign requires a different page layout (such as Double 11), deliver the data ahead of time and cache it.
  • Build the home controller purely in code rather than with xib/Storyboard, to avoid the time-consuming layout conversion.
  • Avoid heavy computation on the main thread, and move computation that is unrelated to the first screen until after the page is displayed, to shorten CPU computation time.
  • Avoid large images and reduce the number of views and the depth of the view hierarchy, to lighten the GPU's load.
  • Optimize the network request interfaces (DNS strategy and so on) and only request data related to the first screen.
  • Cache the first screen data locally, and request fresh data only after rendering.

Other optimizations

Binary rearrangement

At the end of last year, the concept of binary reordering was made popular by a major tech company (nicknamed the "universe factory"). I think it is more gimmick than real effect. For details, please refer to the article

Summary

Startup optimization should not be a one-off effort, and the best approach is not to wait until problems appear before solving them. It should include:

  • Solve existing problems
  • Control of subsequent development
  • Complete monitoring system

Only by also stepping in during development can we guarantee the quality of the app. After all, development is a process of digging holes for those who come after us.

Some tools

  • Time Profiler and System Trace, bundled with Xcode

  • App Launch, added in Xcode 11

  • Static Initializer Tracing

  • AppCode's Inspect Code, for scanning unused code

  • fui, for scanning unused classes

  • TinyPNG, for compressing images to reduce I/O operations

Author: Simon Ye
Link: https://juejin.im/post/5e950106f265da47b725eaff
