Xinyu Android startup optimization practice: reducing startup time by 50%

Date: 2022-11-22

Image from: https://unsplash.com/photos/_…
Author of this article: ZZG

Foreword

Startup speed is a key part of the APP experience and a focus for every technical team. A few hundred milliseconds gained or lost at startup affects the user experience and shows up directly in retention. Xinyu, an APP built for the social needs of young and middle-aged users, needs a good startup experience across phones of every performance level. As the user base grew rapidly, startup optimization was therefore put on the agenda as a dedicated performance project.

Startup optimization, as the name suggests, means optimizing the time from when the user taps the icon to when the home page is fully visible. To measure startup time more precisely, we divide it into two parts: the startup phase and the home page first refresh. The startup phase runs from tapping the icon to the first frame of the home page being displayed; the first-refresh phase runs from the first frame being visible to the home page being fully visible.

After five months of optimization work, Xinyu's average online startup time dropped from more than 8 seconds to about 4 seconds, a reduction of over 50%: the startup phase was shortened by 3.7 seconds, and the first-frame and first-refresh time by 0.4 seconds. Startup optimization, an important part of the performance optimization project, has met its expected baseline goals.

Optimization practice

This article will introduce the work done by the Xinyu team on startup optimization, as well as some insights gained in the optimization practice.

An application has three startup states: cold start, warm start and hot start. This article focuses on the time consumed by a cold start. First, we need to understand which stages of the startup process can be optimized:

[Figure: Android cold start flow, from system process work to Application creation to Activity rendering]

At the beginning of a cold start, the system process performs a series of operations and finally creates the application process; the application process then starts the main thread, creates pages, and so on. Many steps are involved, but given our goal of reducing the time from launch to the home page being displayed, we can focus on three stages: Application creation, main-thread tasks, and Activity page rendering. The subsequent optimizations concentrate on the time-consuming points of these three stages.

To better explain the implementation and benefit of each optimization measure, we use the OPPO A5 as the example device.

This is the per-stage time consumption of the app on the OPPO A5 before optimization:

[Figure: per-stage startup timings on the OPPO A5 before optimization]

From tapping the icon to a fully interactive home page took 19 seconds. The OPPO A5 is a low-performance phone, which certainly contributes to the startup time, but the main cause of the lengthy startup was the various unreasonable pieces of logic and code executed during startup. After a series of optimizations, the per-stage time consumption looks like this:

[Figure: per-stage startup timings on the OPPO A5 after optimization]

The entire startup time was shortened to 9 seconds, a gain of about 10 seconds. Next, we explain stage by stage how this 10-second gain was achieved.

Application optimization

The Application phase is usually used to initialize core libraries. Early in the app's development we did not manage the startup tasks in this phase, so a large amount of strongly business-related code accumulated there; before this optimization project, more than 90 tasks ran during Application creation. We streamlined the whole task flow based on the following principles:

  • Tasks in Application should be global basic tasks
  • Application creation should minimize network request operations
  • Strong business-related tasks are not allowed when the Application is created
  • Minimize the work of Json parsing and IO operations when creating an Application

After optimization, the startup tasks in Application were reduced to around 60, falling into three categories: basic library initialization, feature configuration and global configuration. Basic library initialization sets up libraries such as the network and logging libraries; processes other than the main process also depend on these tasks, so removing them would hurt overall stability. They are also the most numerous and most time-consuming startup tasks, so reducing their cost remains a focus of ongoing optimization. Feature configuration covers pre-configuration of globally relevant business features, such as preloading business caches and specific services; removing these would damage the business, so here we have to find the balance between business demands and configuration cost. Global configuration covers global UI configuration and file-path handling; these account for a small share of the time, and since they are prerequisites for creating the home page, we left them alone for now.

The core of task arrangement is handling the dependencies between tasks, which requires a deep understanding of the business logic. Since every application is different, we will not expand on that here; instead we introduce some details of how Xinyu arranged and optimized its tasks:

  • Per-process task scheduling. Xinyu starts multiple processes at runtime to carry out specific jobs and to isolate modules. Many processes, such as the IM process, usually only need a few core SDK initialization tasks such as the crash SDK and the network SDK. If such a process executed the full main-process flow, resources would be wasted. We therefore divided the Application tasks in detail and refined task execution down to the process level, so that non-main processes no longer run tasks they do not need.
  • Lazy loading. This mainly means reworking basic tasks to separate initialization from startup, moving the startup work out of Application creation and stripping redundant logic at the same time. When creating an object, the creation of its member objects can be deferred; keywords such as Kotlin's by lazy make objects lightweight.
  • Process convergence. Multiple processes enable module isolation and avoid the memory ceiling of one large process. The downside is that they inflate the application's overall memory footprint and make low-memory conditions more likely. Moreover, if memory pressure is high at startup, the phone may start reclaiming memory and consume a lot of CPU, which users experience as slow startup and freezes. Weighing these trade-offs, our current strategy is to delay starting every process other than the main one as long as possible, and to reduce the process count by merging processes. With these strategies we brought the number of processes at startup down to two. Combined with the task-arrangement work, we selected the minimal task set for each process so that none of them runs unnecessary tasks, which greatly reduced the memory occupied at process startup.
  • Thread convergence. On a multi-core CPU an appropriate number of threads improves efficiency, but if threads proliferate the CPU becomes overloaded. Multi-threaded concurrency is essentially threads taking turns to obtain CPU time; under heavy load, too many threads competing for time slices not only slows startup but also causes main-thread jank that hurts the user experience. When optimizing here, make sure a unified thread pool is used globally. Many second- and third-party SDKs are also heavy creators of threads, so we had to work with the relevant teams to eliminate unreasonable thread creation. Avoiding network requests during the startup phase is likewise key to keeping the thread count down.
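As an illustration of the lazy-loading idea above, separating an object's registration from its expensive construction, here is a minimal, hypothetical Java sketch of a thread-safe lazy holder. It is not Xinyu's actual code; Kotlin's by lazy provides the same behavior out of the box.

```java
import java.util.function.Supplier;

// Hypothetical sketch: wrap an expensive SDK so that Application.onCreate
// only registers it, and the real work happens on first use (lazy loading).
final class Lazy<T> {
    private volatile T value;               // created at most once
    private final Supplier<T> factory;

    Lazy(Supplier<T> factory) { this.factory = factory; }

    T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {           // double-checked locking
                result = value;
                if (result == null) {
                    value = result = factory.get();  // expensive init deferred to here
                }
            }
        }
        return result;
    }

    boolean isInitialized() { return value != null; }
}
```

Registering such a holder during Application creation is cheap; the expensive `factory.get()` only runs when the SDK is first touched.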

Application optimization is key to the entire startup process: reasonable task arrangement not only reduces Application creation time but also greatly helps the subsequent home page creation. On the OPPO A5, Xinyu's Application creation time has dropped from 5 seconds to about 2.5 seconds, and there is still plenty of room for improvement.

Startup link

After Application creation, the application process's main job is to create the Activity. Note that between Application and Activity the main thread hides many posted tasks, as well as callbacks from registered ActivityLifecycleCallbacks, which quietly widen the gap between Application and Activity. ActivityLifecycleCallbacks registration is usually business-related and fairly hidden; in earlier development we did abuse ActivityLifecycleCallbacks to some extent, and this deserves attention.

As for time-consuming main-thread messages, we found many such problems while locating startup costs with Profiler and Systrace. For various reasons, tasks in the Application post time-consuming work to the main thread: on the surface Application creation gets shorter, but the overall startup time grows. For each time-consuming point we should locate and fix the root cause rather than blindly post it elsewhere, which treats the symptom and not the disease.

Beyond that, shortening the link from launch to the home page is the focus of our optimization.

[Figure: original startup link: loading page, splash page, then home page]

In the original startup flow, the loading page was Xinyu's launcher page, responsible for routing and permission requests.

  1. Normally, when the user launches the app, the loading page checks the login state; if the user is not logged in, it goes to the login page.
  2. If the user is logged in, it checks whether a splash page should be shown; if so, it enters the splash page, waits for it to finish, jumps back to the loading page, and only then enters the home page.

As shown above, even without a splash page, displaying the home page after launch requires starting at least two Activities. The core of shortening the startup link is to merge the loading page, home page and splash page into a single page. This not only saves at least one Activity startup, but also lets other tasks run in parallel while the splash page is displayed.

[Figure: shortened startup link with the home page acting as the container]

The home page here acts more like a canvas; creating and rendering the home page UI is just one of the tasks it hosts.

The code to implement this is relatively simple: the home page becomes the launcher page, and the home content and the splash page are encapsulated as two Fragments shown according to business logic. When the user taps the icon and is judged to be logged in, the home page pre-tasks and UI rendering run, and at the same time we decide whether to load the splash Fragment. Note that we did not remove the loading page: when the user is judged not to be logged in, the app still enters the loading page to perform the original routing and login work. Since in the vast majority of cases users launch the app already logged in, this gives the greatest benefit at the least modification cost.

In order to realize this process, we need to handle the home page instance and home page task arrangement:

Home page instance

The home page's original launchMode was singleTask, to guarantee a single global home page instance. But after the rework the home page is the launcher page, and keeping singleTask would cause a business bug: if the user navigates to a secondary page, sends the app to the background, and returns to the foreground by tapping the icon, they land on the home page instead of the original secondary page. The reason, roughly, is that tapping the icon makes the system launch the Activity declared as the launcher; because a home page instance with the singleTask attribute already exists in the stack, the system reuses it and pops every Activity above it, producing this abnormal behavior. The fix is to use singleTop as the home page's launchMode. singleTop does not guarantee a globally unique home page instance, but fortunately Xinyu already routes page jumps through a router and opens the home page via a unified URL; in the final step of starting the home page Activity we add the FLAG_ACTIVITY_NEW_TASK and FLAG_ACTIVITY_CLEAR_TOP flags to the intent to get a singleTask-like effect. This basically meets our needs, but it cannot rule out the home page being started directly in some circumstances. So we registered an Application.ActivityLifecycleCallbacks to monitor the Activity instances in the stack: when more than one home page instance exists, we remove the new instance and show a prompt.

Task arrangement

The reworked home page is no longer a traditional page Activity but a container that carries a series of tasks. In the follow-up rework we separated the home page's data request from its UI rendering, and also moved some heavyweight business tasks out of the Application and onto the home page. To manage the dependencies between these tasks effectively, we need a directed acyclic graph (DAG).

[Figure: directed acyclic graph of home page startup tasks]

The core idea of task scheduling is to spread tasks out and stagger peak load: defer a low-priority, high-cost task, or push it into a later idle-time workflow, while guaranteeing that dependencies between tasks are respected so nothing runs out of order. Dividing business tasks at a reasonable granularity and ordering them correctly determines how fast the graph runs, and it tests the developers' familiarity with the business. There are many open-source DAG solutions in the industry, such as Alpha; to meet our specific business needs, the team also built its own startup framework to arrange the home page tasks.
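To make the idea concrete, here is a minimal, hypothetical sketch of DAG-based task execution using Kahn's topological sort. It is not the team's actual startup framework, just an illustration of running each task only after all of its dependencies have finished:

```java
import java.util.*;

// Minimal sketch of DAG-based startup task scheduling: each task runs
// only after every task it depends on has completed (Kahn's algorithm).
final class TaskGraph {
    private final Map<String, Runnable> tasks = new LinkedHashMap<>();
    private final Map<String, List<String>> deps = new HashMap<>();

    TaskGraph add(String name, Runnable body, String... dependsOn) {
        tasks.put(name, body);
        deps.put(name, Arrays.asList(dependsOn));
        return this;
    }

    // Runs all tasks in dependency order; returns the execution order.
    List<String> run() {
        Map<String, Integer> indegree = new HashMap<>();
        Map<String, List<String>> children = new HashMap<>();
        for (String t : tasks.keySet()) {
            indegree.put(t, 0);
        }
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            for (String d : e.getValue()) {
                indegree.merge(e.getKey(), 1, Integer::sum);
                children.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String t : tasks.keySet()) {
            if (indegree.get(t) == 0) ready.add(t);
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String t = ready.poll();
            tasks.get(t).run();                       // all prerequisites done
            order.add(t);
            for (String c : children.getOrDefault(t, Collections.emptyList())) {
                if (indegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
            }
        }
        if (order.size() != tasks.size())
            throw new IllegalStateException("cycle in task graph");
        return order;
    }
}
```

A real startup framework would additionally dispatch independent tasks onto a thread pool in parallel; the sequential version above only shows the ordering guarantee.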

For loading the home page tasks, we introduced the concept of workflows. We divide startup into three workflow stages: the basic workflow, the core workflow and the idle workflow. The business logic of the entire startup process has been split into relatively independent tasks and assigned to these three workflows according to priority and dependencies.

  • Basic workflow: this stage executes Application creation and hosts basic SDKs such as the network library and monitoring. Tasks in this workflow should be as few as possible, and all of them are prerequisites for the later workflows.
  • Core workflow: this stage carries the core business work. Besides core business initialization, it includes home page UI rendering, business data requests and splash page display. These tasks are arranged and managed as a DAG starting from home page creation. Since this stage has already entered the home page, to let users see the first frame as early as possible we bring business data fetching and home page rendering forward as far as we can.
  • Idle workflow: this stage holds low-priority, long-running tasks with no deadline. There are several ways to judge when the app is idle; Xinyu does something simple here and runs these tasks in an IdleHandler 10 seconds after the core workflow ends. To judge idleness more accurately, you can post a message to the main thread and measure the interval between main-thread messages, or monitor the application's memory level.

The advantage of managing startup tasks with the startup framework is that core business can be loaded ahead of time and tasks can be fine-grained. For example, to display the home page faster we separated its data request from its UI rendering and moved the data request forward into the Application. The effect on low-end devices is remarkable: the Call object creation and Json parsing of the data request cost a lot of time there, and by running the network request in parallel with Application and Activity creation, the home page load time on low-end devices fell from the original 3 seconds to under 1 second.
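The prefetch idea described above can be sketched roughly as follows. HomeDataPrefetcher and its method names are illustrative, not the real code: the request is started as early as possible, and the home page later awaits a request that is already in flight instead of starting one after the UI is ready.

```java
import java.util.concurrent.*;

// Sketch: kick off the home page data request during Application creation,
// then have the home page await the already-running request. The class and
// String payload are illustrative stand-ins for the real request/response.
final class HomeDataPrefetcher {
    private static volatile CompletableFuture<String> pending;

    // Called as early as possible (e.g. from Application.onCreate).
    static void prefetch(Callable<String> request, Executor io) {
        pending = CompletableFuture.supplyAsync(() -> {
            try {
                return request.call();    // network round trip + Json parsing
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, io);
    }

    // Called when the home page is ready to render; ideally the request
    // has already completed in parallel with Application/Activity creation.
    static String await(long timeoutMs) {
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```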

In follow-up work, task arrangement remains the key direction of our startup optimization. Locating the cost of each task and straightening out the whole task flow, so as to push startup time to its limit, is one of our long-term goals.

By this point the code of the entire startup process had been overhauled, yet offline measurements showed the total startup time had only shortened by about 3 seconds, and the first-refresh time had even regressed somewhat. This is because startup is a whole: launch and the home page cannot be treated separately, and the earlier rearrangement of startup tasks inevitably affects home page creation. In addition, our work was not yet fine-grained enough and we had not fully mastered some details, especially the handling of locks, which we introduce later.

Home page optimization

After sorting out the startup link, we next turn our attention to the home page. The home page is the core page of the entire APP, with heavy business logic and a complex UI hierarchy.

Lazy loading

After the previous rework, our home page looks roughly like this:

[Figure: home page loading all five tab Fragments on startup]

When the home page opens, the APP loads all five tab Fragments, which is extremely time-consuming.

We measured the creation time of each Fragment on the OPPO A5; the rough numbers are as follows:

[Figure: creation time of each home page Fragment on the OPPO A5]

If the creation and loading of the other four Fragments (such as the feed) could be deferred, we could theoretically save about 2 seconds of startup time.

Since only the first Fragment is visible when the home page is shown, we implemented lazy loading on the home page. The home page uses the common ViewPager2 + TabLayout architecture, and ViewPager2 supports lazy loading natively. To prevent existing Fragments from being recycled on page switches, we enlarged the cache pool of the RecyclerView inside ViewPager2:

 // keep every page alive by enlarging the internal RecyclerView's view cache
 ((RecyclerView) mViewPager.getChildAt(0)).setItemViewCacheSize(mFragments.size());

Although this greatly speeds up home page rendering, it essentially just postpones the creation and rendering of the other pages until the user switches to them. If a page is heavy and the phone is slow, switching shows obvious jank or even a white screen, which is also unacceptable.

To this end, we have transformed the homepage and each fragment.

Page pluginization

View creation is a major cost in home page rendering. We usually load views from XML with LayoutInflater, which involves parsing the XML and then creating instances through reflection, and is generally slow. For relatively simple layouts we build the views in code, but for complex layout files building in code is laborious and unmaintainable.

To make the home page “lighter”, we componentized the view from a business perspective. The core idea is to let users see the most essential interface and use the most essential functions first. For a video playback app, for example, what users want to see first is the player and what they want to use is playback; they care little about peripheral UI such as the icons at the top. Following this idea, the playback component and playback function should be created and displayed first, while other business modules can be loaded later, ViewStub-style.

So what is Xinyu's core page? It is the Fate list on the home page, and the core function is operating that list. Once this is understood, the home page's complex logic becomes clear, and we know what users' core needs are.

A brief introduction to Plugin: Plugin is a UI componentization solution developed and refined within the team. It is essentially an upgraded ViewStub that also has Fragment-like capabilities and fits MVVM naturally. Plugin is a powerful component library; we will introduce its implementation in a future article once it is polished, and will not expand on it here. Through Plugin we split the complex view into independent business components along business lines and load them by priority, ensuring users see the home page, and can use the core functions, sooner.

[Figure: home page split into business plugins, with the Fate plugin loaded first]

The Fate plugin is displayed first when the home page is created; the remaining plugins wait until the Fate plugin is fully displayed and its data has returned before rendering and loading, which greatly lightens the home page.

Json parsing and processing

Json parsing is another point that needs optimization. Before optimization, QA's measurements showed that on low-end phones the Json parsing of the main home page interface took as long as 3 seconds, which is unacceptable.

Json parsing is slow essentially because objects are created from Json data and populated through reflection; the more complex the object, the longer it takes. For the home page's main interface, parsing the response on low-end devices took longer than the UI rendering itself, so this was a point we had to overcome.
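To make the cost concrete, here is a toy sketch of what a reflection-based binder does for every field; the ReflectiveBinder class and the User model are hypothetical. A compile-time-generated adapter replaces all of this lookup-and-set machinery with direct constructor calls and assignments.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Illustration of what a reflection-based JSON mapper does per field:
// look up the Field, make it accessible, and set it reflectively.
final class ReflectiveBinder {
    static <T> T bind(Class<T> type, Map<String, Object> json) {
        try {
            T obj = type.getDeclaredConstructor().newInstance(); // reflective construction
            for (Map.Entry<String, Object> e : json.entrySet()) {
                Field f = type.getDeclaredField(e.getKey());     // per-field lookup
                f.setAccessible(true);
                f.set(obj, e.getValue());                        // reflective assignment
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Hypothetical model class standing in for a home page data object.
class User {
    String name;
    int age;
}
```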

Our current solution is to rewrite the home page data classes in Kotlin and annotate them with Moshi's @JsonClass(generateAdapter = true), which generates a parsing adapter for each annotated class at compile time and thus shortens parsing time.

XML parsing optimization

Test data shows that on phones with poor performance, xml inflate takes between 200 and 500 milliseconds. Custom controls and deep UI hierarchies can aggravate this parsing time.

To reduce XML parsing time, we optimized the XML of every UI module on the home page, flattened the hierarchy as much as possible, and avoided unnecessary custom views. In Xinyu, a degree of abuse of strongly business-coupled custom views was also an important reason XML loading was slow.

In addition, we have also considered other solutions to reduce the parsing time, such as placing the xml parsing operation in a sub-thread and executing it in the Application in advance. This scheme is simple and effective, and has achieved certain benefits.

A concrete example: parsing the XML of an item on the home page Fate list took about 200 ms. We moved it to a background thread for pre-processing and stored the parsed view in a cache; when the home page creates the item, it fetches the view from the cache for rendering. As a result, item creation time dropped below 50 ms.

Asynchronous parsing looks very effective, but if you expect to pre-process all XML this way you are bound to be disappointed, because it has serious limitations.

The first thing to watch is the view lock problem, which can turn XML parsing from asynchronous back into serialized work and make it slower. Locks are explained and handled in detail in the next section. In the example above we partially sidestepped the lock by cloning the LayoutInflater instance:

   // clone the inflater so background inflation does not contend on the shared instance
   LayoutInflater inflater = LayoutInflater.from(context).cloneInContext(context);
   View view = inflater.inflate(R.layout.item_view, null, false);

In fact, view-related locks are not limited to LayoutInflater; Resources and AssetManager have internal locks too, so the scheme above cannot eliminate synchronization entirely.

The second limitation is that background threads have low priority. Under heavy load, parsing XML on a background thread can stretch out the whole process; it is then easy for the view to be needed before its XML has finished parsing, triggering the fallback of re-parsing on the main thread, wasting resources and ultimately making rendering slower. So asynchronous XML parsing should be used only to pre-process a very small number of core layouts, and those layouts should not be too complex.
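Stripped of Android classes, the pattern above, pre-parse on a background thread and fall back to inline creation when preloading loses the race, might look roughly like this. AsyncCache is an illustrative name, with a generic value standing in for the inflated view:

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch of "preload on a background thread, degrade to inline creation".
final class AsyncCache<V> {
    private final Map<String, Future<V>> cache = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Start producing the value ahead of time (e.g. inflating an item view).
    void preload(String key, Supplier<V> producer) {
        cache.computeIfAbsent(key, k -> worker.submit(producer::get));
    }

    boolean isReady(String key) {
        Future<V> f = cache.get(key);
        return f != null && f.isDone();
    }

    // Use the preloaded value if it finished in time; otherwise produce it
    // inline (the "re-parse on the main thread" fallback described above).
    V getOrCreate(String key, Supplier<V> producer) {
        Future<V> f = cache.remove(key);
        if (f != null && f.isDone()) {
            try {
                return f.get();
            } catch (Exception ignored) {
                // fall through to inline creation on failure
            }
        }
        if (f != null) f.cancel(true);   // too late; don't block the caller
        return producer.get();           // fallback: create synchronously
    }

    void shutdown() { worker.shutdown(); }
}
```

The fallback path is exactly where the scheme loses its benefit, which is why it only pays off for a few hot layouts that are reliably needed after the preload completes.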

XML parsing optimization has always been a focus of our exploration. We are currently trying Jetpack Compose as the main direction for solving XML parsing cost; it is still at the experimental stage, and we believe results will come soon.

The home page optimization brought large gains. Offline measurements on the OPPO A5 show that the first-frame time dropped by about 3 seconds compared with before optimization, and the full home page now renders within 1 second. The sooner home page data is displayed, the better the user experience.

Lock

Locks caused us so much trouble during startup optimization that they deserve a section of their own.

When doing startup optimization, if we hit a time-consuming task we usually move it to a background thread. In theory, given sufficient resources, its cost disappears from the critical path. In practice it often does not: the move may help little or even make things worse. The reason is locks. Here we pick a few representative ones to discuss.

Retrofit

We all know Retrofit generates the Call instance through a dynamic proxy at request time, but we often overlook the lock involved.

[Figure: Retrofit source showing the lock taken while creating the service method]

If a large number of interfaces fire requests simultaneously on the home page, they compete for this lock, which silently turns parallel requests into serial ones; the effect is especially visible on low-end phones.

So in real startup traces we often see that an API request which itself takes only 300 ms spends another 200 ms waiting for Retrofit's lock.

[Figure: trace showing an API request waiting on Retrofit's lock]

Looking at the usual way Retrofit is used, this cost is incurred when create is executed. What makes it worse is that this style of code leads us to assume we are merely creating an object rather than doing anything expensive, so it easily ends up on the main thread:

   GitHubService service = retrofit.create(GitHubService.class);

We cleaned up the home page code and moved this work onto background threads. As for the lock contention itself, our investigation found it was largely caused by the long time spent parsing interface responses, which comes back to Json parsing optimization; see the solution above.
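One mitigation is to pre-create service instances on a background thread during startup, so later callers never pay the creation cost (or contend on the lock) on the critical path. A generic sketch, not Retrofit API; ServiceWarmer and its methods are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of pre-warming expensive service creation (e.g. retrofit.create,
// which parses all method annotations under a lock): do it once on a
// background thread at startup so later lookups are a cheap map read.
final class ServiceWarmer {
    private static final Map<Class<?>, Object> SERVICES = new ConcurrentHashMap<>();

    // Call from a background thread during startup.
    static <T> void warm(Class<T> api, Supplier<T> create) {
        SERVICES.computeIfAbsent(api, k -> create.get());
    }

    // Call from anywhere later; creates lazily only if warming was skipped.
    @SuppressWarnings("unchecked")
    static <T> T get(Class<T> api, Supplier<T> create) {
        return (T) SERVICES.computeIfAbsent(api, k -> create.get());
    }
}
```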

Reflection

We know reflection is expensive, and Kotlin reflection especially so: because of Kotlin's syntax sugar, reflective operations must read class information from Metadata, so Kotlin reflection is considerably slower than Java reflection.

Moreover, because of kotlin_builtins, much of Kotlin's built-in information (basic types such as Int, String, Enum, Annotation and Collection, plus information used by coroutines, multiplatform support, and so on) is stored in the APK as files, and reflection implicitly triggers class loading and IO on these files.

   static {
       Iterator<ModuleVisibilityHelper> iterator = ServiceLoader.load(ModuleVisibilityHelper.class, ModuleVisibilityHelper.class.getClassLoader()).iterator();
       MODULE_VISIBILITY_HELPER = iterator.hasNext() ? iterator.next() : ModuleVisibilityHelper.EMPTY.INSTANCE;
   }

Class loading here means a lock plus IO, which is why ANRs at this point often show up online.

As for the IO itself, it is constrained by the overall load on the system's file system and inherently unpredictable, and IO performed while holding a lock quietly amplifies the cost.

So a reflective operation, Kotlin reflection in particular, can be roughly understood as a potentially lock-guarded IO operation (especially during APP startup).

This causes all sorts of strange issues. One example from our optimization: we hoped local caching would let users see the UI earlier, but thanks to these locks it not only slowed every API request but also, butterfly-effect-like, shifted the preloading cost back onto the UI thread and froze it. After a long investigation we determined the cause was Kotlin loading builtins on first reflection during startup. The workaround is to trigger the first reflection manually at an appropriate moment.
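A minimal sketch of such a warm-up in plain Java; the class and its names are illustrative. The same idea applies to Kotlin's first reflective access, which is what loads kotlin_builtins: pay the one-time cost on a background thread during startup rather than on the UI thread later.

```java
// Sketch: pay the one-time cost of the first reflection on a background
// thread during startup instead of on the main/UI thread later.
final class ReflectionWarmUp {
    private static volatile boolean warmed;

    static void warmUpAsync(Class<?> representative) {
        Thread t = new Thread(() -> {
            representative.getDeclaredMethods();   // forces metadata loading
            representative.getDeclaredFields();
            warmed = true;
        }, "reflection-warmup");
        t.setPriority(Thread.NORM_PRIORITY);       // don't let it starve under load
        t.start();
    }

    static boolean isWarmed() { return warmed; }
}
```

The thread priority matters: as noted earlier, a low-priority background thread under heavy startup load may finish the warm-up too late to help.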

View’s lock

As mentioned earlier, view creation is time-consuming and hard to optimize head-on. It is natural to wonder whether part of the UI could be inflated on an IO thread, and there is indeed the asynchronous XML parsing approach described above. But as we also mentioned, the inflate step itself takes a lock.


This lock belongs to the LayoutInflater instance, which in practice means it follows the Context. In a case we encountered, view inflation had been moved to a sub-thread; because of the shared lock it lengthened the inflation of other views, and combined with the high CPU load and the low priority of the IO thread, this series of factors actually made the startup process worse.
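The effect is easy to reproduce in a plain JVM sketch (a toy model, no Android classes involved): when two threads share one inflater-like object whose work runs inside an instance lock, they serialize; giving each thread its own instance, as with one inflater per Context, removes the contention.

```kotlin
import kotlin.concurrent.thread
import kotlin.system.measureTimeMillis

// Toy stand-in for LayoutInflater: the "inflate" work runs inside a lock
// tied to the instance, mimicking the real inflater's internal
// synchronization. Thread.sleep stands in for real inflation work.
class ToyInflater {
    private val lock = Any()
    fun inflate(workMs: Long) = synchronized(lock) { Thread.sleep(workMs) }
}

// Run [work] on two threads at once and return the elapsed wall time.
fun elapsedOnTwoThreads(work: () -> Unit): Long = measureTimeMillis {
    val a = thread { work() }
    val b = thread { work() }
    a.join(); b.join()
}
```

With a shared instance, two 250 ms inflates take at least 500 ms end to end; with one instance per thread they overlap and finish in roughly half that.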

Moshi

Moshi's deep traversal of a Class to generate a JsonAdapter involves a lot of reflection, and its cost is unpredictable. What Moshi does well, though, is that although its internal cache also uses a lock, the time-consuming work is kept outside the lock with the help of a ThreadLocal. Similar scenarios later on can borrow this approach.

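The shape of that pattern can be sketched as follows (a simplified illustration of the idea, not Moshi's actual source): the expensive reflective work runs with no lock held, a ThreadLocal tracks this thread's in-flight lookups so recursive creation can be detected, and publication of the result is a single atomic map operation.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Simplified illustration of a compute-outside-the-lock cache. The
// expensive factory call runs with no lock held; a ThreadLocal records
// this thread's in-flight keys so a recursive lookup during creation is
// detected instead of looping forever; publishing into the shared cache
// is one atomic putIfAbsent, so the race loser just reuses the winner's value.
class OutsideLockCache<K : Any, V : Any>(private val factory: (K) -> V) {
    private val cache = ConcurrentHashMap<K, V>()
    private val inFlight = ThreadLocal.withInitial { mutableSetOf<K>() }

    fun get(key: K): V {
        cache[key]?.let { return it }
        val pending = inFlight.get()
        check(key !in pending) { "recursive lookup of $key" }
        pending += key
        try {
            val fresh = factory(key)           // expensive work, no lock held
            return cache.putIfAbsent(key, fresh) ?: fresh
        } finally {
            pending -= key
        }
    }
}
```

The factory runs at most once per key per thread race, and no thread ever sleeps on a lock while another thread does slow reflective work.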

Best Practices

Through the analysis of the above series of issues, we can conclude the following best practices:

  1. Do not do any Moshi parsing on the main thread;
  2. When parsing Kotlin classes through Moshi, use the JsonClass annotation;
  3. Do not perform any Retrofit-related operations on the main thread;
  4. When inflating XML asynchronously, pay attention to multi-thread lock contention.

In reality, problems take endless forms; rigid formulas are not enough, and best practices are no silver bullet. When you hit a time-consuming task, simply tossing it onto a child thread without finding the root cause does not help: whether through a synchronization lock or some other mechanism, the CPU time it consumes will find another way to affect the UI thread's efficiency.

Anti-degradation

Startup optimization is a long-term project; keeping it effective spans the whole life cycle of the product. Reducing the startup time after a period of concentrated effort does not mean the job is done: without anti-degradation measures, a few iterations later the startup time will creep back up. This is especially true once startup optimization reaches the deep end, where all sorts of changes have hard-to-untangle effects on startup speed. Startup optimization is therefore not just a hard battle but a long tug-of-war.

We therefore need both online and offline monitoring data to guide the optimization work.

online data

The core monitoring nodes are as follows:

(figure: startup timeline showing the core monitoring nodes)

For online data, we take Application's attachBaseContext method as the start point of startup, the homepage's onWindowFocusChanged as the homepage-visible node, and onViewAttachedToWindow as the homepage-data-on-screen node.

The monitoring nodes Xinyu currently uses are intended more for horizontal comparison. For more precise measurements, the startup start point could be the process creation time, and the first frame could be collected when dispatchDraw is called. However, for simplicity and to avoid any impact on the business code, Xinyu keeps the current scheme, which is mainly used to compare improvements and regressions across historical versions.

It is worth mentioning that online data collection has to deal with noise. On some models the process is killed in the background and restarted, but during the restart the power-saving policy forcibly stops the process, which corrupts the timing and produces noise.

Xinyu's solution is to compare the wall-clock startup duration against the current thread's CPU time; if the difference exceeds a threshold, the record is discarded.

    // Wall-clock time elapsed since attachBaseContext, minus the CPU time this
    // thread has actually run: a large gap suggests the process was suspended
    // mid-startup and the record is noise.
    val intervalTime =
        abs(System.currentTimeMillis() - attachStartTime - SystemClock.currentThreadTimeMillis())

In practice, 20 seconds has proven to be a good threshold.
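Putting the two pieces together, the discard rule can be expressed as a small pure function (the names and the 20-second default are illustrative, following the article's figures):

```kotlin
import kotlin.math.abs

// Noise filter for online startup records. wallClockMs is the elapsed
// System.currentTimeMillis() since attachBaseContext; threadCpuMs is the
// SystemClock.currentThreadTimeMillis() reading. A large gap between the
// two means the process was likely frozen mid-startup, so the record is
// treated as noise and dropped.
fun isStartupRecordValid(
    wallClockMs: Long,
    threadCpuMs: Long,
    thresholdMs: Long = 20_000,
): Boolean = abs(wallClockMs - threadCpuMs) <= thresholdMs
```

A normal startup of a few seconds passes the filter; a record whose wall-clock time dwarfs its CPU time is discarded.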

offline data

Statistics collected through tracking points reflect the current startup situation to a degree, but they differ from what users actually experience. For offline data we therefore recommend launching the app on phones of various performance levels while recording the screen, and measuring the startup time from the recording.

measures

As for anti-degradation work, we are still exploring. At present, after each release the QA team produces a performance evaluation report for that version, and we combine it with the online startup data to judge whether the version's startup time has regressed. If it has, we analyze the whole startup process, locate the outliers, and fix them.

This approach is fairly inefficient: in many cases the degradation is small and hard to see in the data, and by the time it does show up, quite a few problem points may already have accumulated.

For this reason, we run a dedicated Code Review for changes to the Application and the homepage. We discourage adding code to the Application, and any code that must be added has its necessity evaluated. In addition, we added a startup-time alarm to the startup framework: if startup exceeds the threshold during the development phase, developers are reminded that the code may have a problem. We believe all optimization work ultimately comes down to the developers themselves, so the most important thing is for every team member to be conscious of this kind of optimization. We are also planning to formulate norms to guide the team's development work in this area.

Summary

With the startup optimization done so far, Xinyu's startup speed and first-screen rendering time have reached the baseline. But as noted above, startup optimization is a project that requires long-term attention, and our work on Xinyu's startup time will not stop here. During this project we ran into many problems and distilled many best practices. The biggest takeaway is a deep conviction: nothing is time-consuming for no reason; if something is, there is a problem somewhere. Faced with a time-consuming task, if you skip the diagnosis, dump it onto a sub-thread and ignore it, the same problem will inevitably be waiting for you at the next crossroads. We have at times considered "black technology" tricks to speed up startup, but the results were usually disappointing. On reflection, the simplest solution is often the best, and only the right remedy yields the best result. Blindly chasing fancy techniques tends to end up firing a cannon at a mosquito.

In follow-up work we will keep iterating on and polishing startup. Compared with what came before, the work will be more refined, going from shallow to deep, and we will tailor technical solutions to specific business scenarios to gain speed. Once a better-fitting solution has been found, we will generalize it and apply it to other parts of the business. Looking back then, from the outside in and then from the inside out, we may gain a new understanding of the whole startup process.


This article was published by the NetEase Cloud Music technology team. Reproduction in any form without authorization is prohibited. We recruit for technical positions all year round; if you are considering a change and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!