iOS WeChat Memory Monitoring

Time:2019-10-9


Author: Yang Jin, Senior Engineer of Tencent Mobile Client Development
Published by WeTest Quality Open Platform Team in Cloud + Community
For commercial reprints, please contact Tencent WeTest for authorization. For non-commercial reprints, please indicate the source.
Link to the original text: http://wetest.qq.com/lab/view/367.html

WeTest Guide

At present, the mainstream memory monitoring tool for iOS is the Allocations instrument in Instruments, but it can only be used during development. This article describes how to implement an offline memory monitoring tool for finding memory problems after an app has been released.

FOOM (Foreground Out Of Memory) refers to an app being killed by the system while in the foreground because it consumes too much memory. To the user, it looks the same as a crash. As early as August 2015, Facebook proposed a FOOM detection method. The general principle is elimination: rule out every other cause of termination, and whatever remains is FOOM. See: https://code.facebook.com/posts/1146930688654547/reduce-fooms-in-the-facebook-ios-app/.

According to the initial data, the ratio of daily FOOM occurrences to logged-in users was close to 3%, while the crash rate in the same period was below 1%. In early 2016, a senior executive reported that WeChat was crashing frequently. After painstakingly pulling more than 2 GB of logs, we found that KV reporting was writing logs too often, which caused FOOM. Then in August 2016, many external users reported that WeChat crashed shortly after launch. Even after analyzing a large number of logs, we could not find the cause of the FOOM. WeChat urgently needed an effective memory monitoring tool to find such problems.

I. Implementation Principle

The first version of the WeChat memory monitor used Facebook's FBAllocationTracker to monitor Objective-C (OC) object allocation, and fishhook to hook malloc/free and related interfaces to monitor heap allocation. Every second, it wrote the current number of OC objects, the 200 largest heap allocations and their allocation stacks to a local text log. The scheme was simple to implement and was completed in one day. By sending a TestFlight build to affected users, we finally found that the contacts module caused FOOM because it loaded a large number of contacts during a DB migration.

However, the scheme has many shortcomings:

1. The monitoring granularity is not fine enough. For example, a qualitative change caused by a large number of small allocations cannot be detected. In addition, fishhook can only hook C interface calls made by the app itself; it has no effect on calls made inside system libraries.

2. The logging interval is hard to control. If the interval is too long, intermediate peaks may be missed; if it is too short, it causes power consumption, frequent I/O and other performance problems.

3. The reported raw logs have to be analyzed manually; there is no good web tool to display and categorize problems.

So the second version takes the Allocations instrument in Instruments as its reference and focuses on optimizing four aspects: data collection, storage, reporting and presentation.

1. Data Collection

While investigating the iOS 10 nano-zone crash at the end of September 2016, we studied the libmalloc source code and happened to find two logging interfaces.

When the malloc_logger and __syscall_logger function pointers are non-NULL, memory allocation and release functions such as malloc/free and vm_allocate/vm_deallocate notify the upper layer through these two pointers. This is also how the memory debugging tool malloc stack logging is implemented. With these two function pointers, it is easy to record the allocation information (including allocation size and allocation stack) of every currently live object. The allocation stack can be captured with the backtrace function, but the captured addresses are runtime virtual memory addresses and cannot be symbolicated directly against the dSYM symbol table. So the slide (load offset) of each image must also be recorded when the image is loaded, so that symbol table address = stack address - slide.
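
The address arithmetic can be sketched in a few lines of C. This is only an illustration under stated assumptions: the image_info record and both function names are hypothetical, and on iOS the slide of each image would typically come from dyld (for example _dyld_get_image_vmaddr_slide) rather than from a hand-filled table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one loaded image (names are illustrative). */
typedef struct {
    uintptr_t load_start; /* first runtime address occupied by the image */
    uintptr_t load_end;   /* one past the last runtime address */
    intptr_t  slide;      /* runtime address minus symbol-table address */
} image_info;

/* Return the index of the image that contains addr, or -1 if none does. */
int find_image(const image_info *images, size_t count, uintptr_t addr) {
    for (size_t i = 0; i < count; i++) {
        if (addr >= images[i].load_start && addr < images[i].load_end)
            return (int)i;
    }
    return -1;
}

/* symbol table address = stack address - slide */
uintptr_t symbol_address(const image_info *image, uintptr_t stack_addr) {
    assert(stack_addr >= image->load_start && stack_addr < image->load_end);
    return (uintptr_t)((intptr_t)stack_addr - image->slide);
}
```

For each frame returned by backtrace, one would first call find_image to locate the owning image and then symbol_address to obtain the address to look up in that image's dSYM.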

In addition, to categorize the data better, each memory object should have its own Category, as shown in the figure above. For heap memory objects, the Category name is "Malloc " + allocation size, such as "Malloc 48.00KiB". For virtual memory objects, the last parameter of vm_allocate, flags, indicates what kind of virtual memory is being created, and flags corresponds to the first parameter, type, of the __syscall_logger function pointer mentioned above. The meaning of each flag can be found in the header file <mach/vm_statistics.h>. For OC objects, the Category name is the OC class name, which can be obtained by hooking the OC method +[NSObject alloc].

Later, we found that the class factory methods NSData uses to create objects do not call +[NSObject alloc]; internally they call the C function NSAllocateObject. In other words, the class name of OC objects created this way cannot be obtained through the hook. The answer was finally found in Apple's open-source CF-1153.18: when __CFOASafe == true and __CFObjectAllocSetLastAllocEventNameFunction != NULL, CoreFoundation, after creating an object, tells the upper layer the type of the current object through this function pointer.

In this way, our monitoring data source is basically the same as that of Allocations, although it relies on private APIs. Without enough "tricks", an app that uses private APIs cannot get into the App Store, so we had to settle for the next best thing. Replacing the malloc, free and other function pointers in the malloc_zone_t structure returned by malloc_default_zone also makes it possible to monitor heap allocation, with the same effect as malloc_logger. Virtual memory allocation can only be monitored through fishhook.

2. Data Storage

Live Object Management

An app allocates and frees memory heavily at run time. For example, within 10 seconds of WeChat's launch, 800,000 objects are created and 500,000 are released, so performance is a challenge. In addition, the storage layer itself should allocate and free as little memory as possible. So instead of SQLite, a more lightweight balanced binary tree is used for storage.

A splay tree is a kind of binary search tree. It does not guarantee that the tree is balanced, but the amortized time complexity of its operations is O(log N), so it can be treated as approximately a balanced binary tree. Compared with other balanced binary trees (such as red-black trees), it uses less memory because it does not need to store extra information in each node. The main idea behind the splay tree is locality: a node that was just accessed is likely to be accessed again soon, and a node that is accessed often is likely to be accessed again. To reduce overall lookup time, frequently queried nodes are moved closer to the root by the "splay" operation. In most cases, allocated memory is released soon afterwards (autoreleased objects, temporary variables, etc.), and an OC object updates its Category immediately after its memory is allocated. So a splay tree is well suited to managing live objects.

A traditional binary tree is implemented with linked nodes, so memory is allocated or freed every time a node is added or deleted. To reduce memory operations, the binary tree can be implemented with an array. Concretely, the left and right children of a parent node change from pointers to integers that represent the child's index in the array. When a node is deleted, it stores the array index of the previously freed node, so that freed slots form a free list that later insertions can reuse.
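
A minimal sketch of an array-backed splay tree with such a free list is shown below. It is not WeChat's implementation: the type and function names, the fixed POOL_CAPACITY, and the trick of reusing slot 0 as scratch space for the standard top-down splay are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CAPACITY 1024u /* illustrative fixed capacity */
#define NIL 0u              /* slot 0 means "no child"; also used as splay scratch */

typedef struct {
    uintptr_t key;   /* address of the live allocation */
    uint32_t  size;  /* allocation size in bytes */
    uint32_t  left;  /* index of left child; reused as the free-list link */
    uint32_t  right; /* index of right child */
} node;

typedef struct {
    node     nodes[POOL_CAPACITY];
    uint32_t root;       /* index of the root, NIL when empty */
    uint32_t free_head;  /* most recently freed slot, NIL when none */
    uint32_t next_fresh; /* next slot that has never been used */
} tree;

void tree_init(tree *t) { t->root = NIL; t->free_head = NIL; t->next_fresh = 1; }

/* Take a slot from the free list first, then from the unused tail. */
uint32_t slot_alloc(tree *t) {
    uint32_t i = t->free_head;
    if (i != NIL) { t->free_head = t->nodes[i].left; return i; }
    assert(t->next_fresh < POOL_CAPACITY);
    return t->next_fresh++;
}

/* The freed slot stores the index of the previously freed slot. */
void slot_free(tree *t, uint32_t i) {
    t->nodes[i].left = t->free_head;
    t->free_head = i;
}

/* Top-down splay: brings the node with `key` (or a neighbour) to the root
   of the subtree rooted at r and returns the new subtree root. */
uint32_t splay(tree *t, uint32_t r, uintptr_t key) {
    node *n = t->nodes;
    uint32_t l_tail = NIL, r_tail = NIL, y;
    if (r == NIL) return NIL;
    n[NIL].left = n[NIL].right = NIL;
    for (;;) {
        if (key < n[r].key) {
            if (n[r].left == NIL) break;
            if (key < n[n[r].left].key) { /* rotate right */
                y = n[r].left; n[r].left = n[y].right; n[y].right = r; r = y;
                if (n[r].left == NIL) break;
            }
            n[r_tail].left = r; r_tail = r; r = n[r].left; /* link right */
        } else if (key > n[r].key) {
            if (n[r].right == NIL) break;
            if (key > n[n[r].right].key) { /* rotate left */
                y = n[r].right; n[r].right = n[y].left; n[y].left = r; r = y;
                if (n[r].right == NIL) break;
            }
            n[l_tail].right = r; l_tail = r; r = n[r].right; /* link left */
        } else break;
    }
    n[l_tail].right = n[r].left; /* reassemble the three parts */
    n[r_tail].left = n[r].right;
    n[r].left = n[NIL].right;
    n[r].right = n[NIL].left;
    return r;
}

/* Record a new allocation; an existing key only has its size updated. */
void tree_insert(tree *t, uintptr_t key, uint32_t size) {
    node *n = t->nodes;
    uint32_t r = splay(t, t->root, key), i;
    if (r != NIL && n[r].key == key) { n[r].size = size; t->root = r; return; }
    i = slot_alloc(t);
    n[i].key = key; n[i].size = size; n[i].left = n[i].right = NIL;
    if (r != NIL) {
        if (key < n[r].key) { n[i].left = n[r].left; n[i].right = r; n[r].left = NIL; }
        else { n[i].right = n[r].right; n[i].left = r; n[r].right = NIL; }
    }
    t->root = i;
}

/* Returns the tracked size for key, or 0 if the key is not tracked. */
uint32_t tree_find(tree *t, uintptr_t key) {
    t->root = splay(t, t->root, key);
    return (t->root != NIL && t->nodes[t->root].key == key) ? t->nodes[t->root].size : 0;
}

/* Forget a freed allocation; returns 1 if it was being tracked. */
int tree_remove(tree *t, uintptr_t key) {
    node *n = t->nodes;
    uint32_t r = splay(t, t->root, key), x;
    t->root = r;
    if (r == NIL || n[r].key != key) return 0;
    if (n[r].left == NIL) x = n[r].right;
    else { x = splay(t, n[r].left, key); n[x].right = n[r].right; }
    slot_free(t, r);
    t->root = x;
    return 1;
}
```

Because nodes live in one preallocated array, inserting or removing a record never calls malloc or free itself, which matters when the code runs inside an allocation hook.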

Stack storage

According to statistics, WeChat generates millions of backtrace stacks while running, and with a maximum stack depth of 64, the average stack depth is 35. If each address is stored in 36 bits (the maximum virtual address width on ARMv8 is 48 bits, but 36 bits is actually enough), one stack takes 157.5 bytes on average, and 1 M stacks require 157.5 MB of storage. But breakpoint observation shows that most stacks actually share a common suffix, such as the following two stacks whose last seven addresses are identical:

For this reason, a hash table can be used to store the stacks. The idea is that an entire stack is inserted into the table as a linked list. Each linked-list node stores the current address and the table index of the previous address. To insert an address, first compute its hash value and use it as an index into the table. If the slot at that index is empty, store the node there. If the slot already holds data and the data equals the node, it is a hash hit, and processing continues with the next address. If the data differs, it is a hash collision, and the hash value must be recomputed until one of the two conditions above is met. For example (with simplified hash calculation):

  1. The addresses G, F, E, D, C, A of Stack1 are inserted into the hash table in turn. The nodes at indexes 1 to 6 hold (G, 0), (F, 1), (E, 2), (D, 3), (C, 4), (A, 5). The entry index of Stack1 is 6.
  2. Next, Stack2 is inserted. The nodes for G, F, E, D and C are identical to the first five nodes of Stack1, so they are hash hits; B is inserted at the new position 7 as (B, 5). The entry index of Stack2 is 7.
  3. Finally, Stack3 is inserted. The nodes for G, F, E and D are hash hits. The address before A in Stack3 is D, whose index is 4, so the node (A, 4) differs from the existing (A, 5): a hash miss. It is inserted at the next empty position, 8. The address before B is A, whose index is now 8, so the node (B, 8) differs from the existing (B, 5): another miss. It is inserted at the next empty position, 9. The entry index of Stack3 is 9.
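
The insertion procedure above can be sketched as follows. This is an illustrative reconstruction, not the original code: the names, TABLE_SIZE, the hash function and the use of linear probing for "recomputing the hash" are assumptions, and the node is left unpacked for readability instead of being squeezed into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 4096u /* illustrative; slot 0 is reserved for "no parent" */

typedef struct {
    uintptr_t address; /* return address of this frame */
    uint32_t  parent;  /* slot of the previously inserted (outer) frame, 0 for none */
    uint32_t  used;    /* non-zero once the slot holds a node */
} stack_node;

typedef struct {
    stack_node slots[TABLE_SIZE];
    uint32_t   count;  /* number of occupied slots */
} stack_table;

void stack_table_init(stack_table *t) { memset(t, 0, sizeof *t); }

/* Map (address, parent) to a start slot in 1 .. TABLE_SIZE-1. */
uint32_t slot_hash(uintptr_t address, uint32_t parent) {
    uint64_t h = (uint64_t)address * 0x9E3779B97F4A7C15ull + parent;
    h ^= h >> 29;
    return (uint32_t)(h % (TABLE_SIZE - 1)) + 1;
}

/* Insert one frame; returns its slot, or 0 if the table is full. */
uint32_t insert_frame(stack_table *t, uintptr_t address, uint32_t parent) {
    uint32_t i = slot_hash(address, parent);
    for (uint32_t probes = 0; probes < TABLE_SIZE - 1; probes++) {
        stack_node *s = &t->slots[i];
        if (!s->used) { /* empty slot: record the node */
            s->address = address; s->parent = parent; s->used = 1;
            t->count++;
            return i;
        }
        if (s->address == address && s->parent == parent)
            return i; /* hash hit: the node is already stored */
        i = (i % (TABLE_SIZE - 1)) + 1; /* collision: try the next slot, skipping 0 */
    }
    return 0;
}

/* frames[0] is the innermost frame, as returned by backtrace(), so the
   stack is inserted from its last element (outermost frame) backwards.
   Returns the entry index of the stack, or 0 on failure. */
uint32_t insert_stack(stack_table *t, const uintptr_t *frames, size_t depth) {
    uint32_t parent = 0;
    for (size_t k = depth; k > 0; k--) {
        parent = insert_frame(t, frames[k - 1], parent);
        if (parent == 0) return 0;
    }
    return parent;
}

/* Walk parent links from an entry index; writes innermost frame first. */
size_t read_stack(const stack_table *t, uint32_t entry, uintptr_t *out, size_t max) {
    size_t n = 0;
    while (entry != 0 && n < max) {
        assert(entry < TABLE_SIZE);
        out[n++] = t->slots[entry].address;
        entry = t->slots[entry].parent;
    }
    return n;
}
```

Inserting the three example stacks in order occupies 6, 7 and 9 slots respectively, matching the walkthrough, although a real hash function will not produce the consecutive slot numbers used in the simplified example.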

After this suffix-compressed storage, the average number of new nodes per stack drops from 35 to fewer than 5. Each node takes 64 bits (36 bits for the address, 28 bits for the parent index), and the hash table's space utilization is above 60%, so a stack needs only 66.7 bytes of storage on average, about 42% of the uncompressed size.
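
The figures quoted here and in the previous estimate can be reproduced with a short calculation. It assumes exactly 5 nodes per stack and exactly 60% table utilization, which the text gives only as bounds ("less than 5" and "60%+"), so the results are upper estimates.

```latex
\begin{align*}
\text{uncompressed stack} &= \frac{35 \times 36\ \text{bits}}{8\ \text{bits/byte}} = 157.5\ \text{bytes} \\
\text{compressed stack} &\approx \frac{5 \times 8\ \text{bytes}}{0.60} \approx 66.7\ \text{bytes} \\
\frac{\text{compressed}}{\text{uncompressed}} &\approx \frac{66.7}{157.5} \approx 0.42
\end{align*}
```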

Performance Data

After the above optimizations, the memory monitoring tool uses less than 13% of the CPU when running on an iPhone 6 Plus. This depends on the amount of data, of course; for heavy users (many group chats, frequent messages, etc.) the usage may be slightly higher. The stored data takes up about 20 MB of memory, and the data files are mapped into memory with mmap. Readers can look up the benefits of mmap for themselves.

3. Data reporting

Because memory monitoring stores the allocation information of all live objects, the amount of data is huge. When FOOM occurs, it is impossible to report everything, so data is reported selectively according to certain rules.

First, all objects are grouped by Category, and the number of objects and the allocated size are counted for each Category. This list is very small and can be reported in full. Then, within each Category, objects with the same stack are merged, and the number of objects and memory size of each stack are calculated. For some Categories, such as the TOP N by allocated size or the UI-related ones (UIViewController, UIView, etc.), only the TOP M stacks by allocated size are reported. The report format is similar to this:
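
As a rough sketch of the per-Category aggregation step (not of the report format itself), the grouping and TOP N ordering might look like this in C. The record layouts and function names are hypothetical, and the linear search over Categories is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *category; /* e.g. "UIViewController" or "Malloc 48.00KiB" */
    uint64_t    size;     /* bytes held by this live object */
} live_object;

typedef struct {
    const char *category;
    uint64_t    count;      /* number of live objects in this Category */
    uint64_t    total_size; /* bytes allocated in this Category */
} category_stat;

/* Group objects by Category name; returns the number of distinct Categories.
   Objects whose Category does not fit into `out` are skipped (sketch only). */
size_t summarize(const live_object *objs, size_t n, category_stat *out, size_t max_out) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        assert(objs[i].category != NULL);
        while (j < used && strcmp(out[j].category, objs[i].category) != 0) j++;
        if (j == used) {
            if (used == max_out) continue;
            out[used].category = objs[i].category;
            out[used].count = 0;
            out[used].total_size = 0;
            used++;
        }
        out[j].count++;
        out[j].total_size += objs[i].size;
    }
    return used;
}

/* Comparator for qsort: larger total_size first. */
static int by_size_desc(const void *a, const void *b) {
    const category_stat *x = a, *y = b;
    return (x->total_size < y->total_size) - (x->total_size > y->total_size);
}

/* Sort by allocated size so that the first N entries are the TOP N. */
void sort_by_size(category_stat *stats, size_t n) {
    qsort(stats, n, sizeof *stats, by_size_desc);
}
```

The same two steps, applied to stacks within a single Category instead of Categories, would yield the TOP M stacks mentioned above.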

4. Page presentation

The page design follows Allocations: it shows which Categories exist, the allocated size and number of objects per Category, and, for some Categories, the allocation stacks.

To highlight problems and solve them more efficiently, the back end first identifies the Categories that may have caused the FOOM according to rules (such as the Suspect Categories above). The rules are as follows:

  • Whether the number of UIViewController objects is abnormal
  • Whether the number of UIView objects is abnormal
  • Whether the number of UIImage objects is abnormal
  • Whether the allocated size or the number of objects of any other Category is abnormal

Then we compute a feature value for each suspected Category, which represents the OOM reason. The feature value consists of "Caller1", "Caller2" and "Category, Reason". Caller1 is the point where the memory is requested, and Caller2 is the specific scenario or business; both are extracted from the stack with the largest allocated size under the Category. Caller1 should be as meaningful as possible rather than simply the last address of the allocating function. For example:

After feature values have been computed for all reports, the reports can be categorized. The first-level classification can be by Caller1 or by Category, and the second-level classification aggregates the features related to that Caller1/Category. The results are as follows:

First-level classification
Second-level classification

5. Operational strategy

As mentioned above, memory monitoring incurs some performance overhead. In addition, each report is about 300 KB, so reporting from all users would put pressure on the back end. Therefore, monitoring is enabled by sampling for production users, and at 100% for gray-release users, company-internal users and whitelisted users. At most the three most recent sets of local data are kept.
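
A hedged sketch of such a rollout policy is shown below. The article does not state the sampling rate or how users are selected, so the per-mille parameter, the modulo-based selection and all names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOCAL_RECORDS 3u /* keep at most the three most recent data sets */

typedef enum {
    USER_PRODUCTION,   /* ordinary online user: sampled */
    USER_GRAY_RELEASE, /* gray-release build user: always on */
    USER_INTERNAL,     /* company-internal user: always on */
    USER_WHITELISTED   /* whitelisted user: always on */
} user_kind;

/* Decide whether memory monitoring runs for this user.
   sample_permille is a hypothetical knob: 10 means 1% of production users. */
bool monitoring_enabled(user_kind kind, uint64_t user_id, unsigned sample_permille) {
    assert(sample_permille <= 1000);
    if (kind != USER_PRODUCTION) return true;
    return user_id % 1000 < sample_permille; /* stable per-user decision */
}

/* Slot to overwrite for the n-th run, so that older data is discarded. */
unsigned next_record_slot(unsigned run_counter) {
    return run_counter % MAX_LOCAL_RECORDS;
}
```

Deriving the decision from the user ID rather than from a random number keeps a given user consistently inside or outside the sample across launches, which makes consecutive reports from one device comparable.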

II. Reducing Misjudgement

Let's review how Facebook determines whether the last launch ended in FOOM:

  1. The app was not upgraded
  2. The app did not call exit() or abort()
  3. The app did not crash
  4. The user did not force-quit the app
  5. The system was not upgraded or restarted
  6. The app was not running in the background
  7. Otherwise, the app hit FOOM

Items 1, 2, 4 and 5 are easy to determine. Item 3 relies on the crash callback of our own CrashReport component. Items 6 and 7 rely on ApplicationState and on foreground/background transition notifications. Since WeChat began reporting FOOM data online, many misjudgments have been found, including:

Inaccurate ApplicationState

On some systems the app is briefly woken up in the background with ApplicationState set to Active even though it is not a BackgroundFetch. The app executes didFinishLaunchingWithOptions, may even receive the BecomeActive notification, and then exits soon afterwards; the whole launch lasts 5 to 8 seconds. The solution is to treat a launch as a normal foreground launch only one second after the BecomeActive notification has been received. This only reduces the probability of misjudgment and does not eliminate it completely.

Group-control plug-ins

This kind of plug-in is software that controls iPhones remotely. Usually one computer controls many phones, and the computer screen and the phone screens operate synchronously in real time: opening WeChat, automatically adding friends, posting to Moments, force-quitting WeChat. Force-quitting in this way is easily misjudged as FOOM. Such misjudgments can only be reduced by cracking down on these plug-ins through the security back end.

The CrashReport component crashes without calling back the upper layer

At the end of May 2017, WeChat saw a large burst of GIF crashes caused by out-of-bounds memory access. However, when the crash signal was received and the crash log was being written, the component could not write the log properly because the memory pool was corrupted, and it even crashed a second time. The upper layer received no crash notification, so the event was misjudged as FOOM. We no longer rely on the crash callback: as long as a crash log from the last run (complete or incomplete) exists locally, the restart is considered to have been caused by a crash.

Foreground freeze leading to a system watchdog kill

This is the familiar 0x8badf00d, usually caused by too many foreground threads, deadlocks, or persistently high CPU usage. Such kills cannot be caught by the app. For this reason, we combined FOOM detection with our existing lag monitoring system: if a freeze was captured in the foreground at the last moment of the previous run, we consider that run to have been killed by the watchdog. We also split a new restart reason out of FOOM, "app foreground freeze leading to restart", and made it a key metric to watch.
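
Putting this section together, the elimination logic with the refinements above might be sketched as follows. The structure fields, enum values and function name are hypothetical; how each flag is obtained (crash log on disk, lag monitor state, delayed BecomeActive check) is left to the surrounding app.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Facts recorded during the previous run and read back at the next launch. */
typedef struct {
    bool app_upgraded;
    bool called_exit_or_abort;
    bool crash_log_exists;        /* complete or incomplete log from the last run */
    bool user_force_quit;
    bool os_upgraded_or_rebooted;
    bool was_foreground;          /* confirmed foreground, not a brief background wake */
    bool freeze_at_last_moment;   /* lag monitor captured a freeze right before exit */
} last_run_info;

typedef enum {
    EXIT_APP_UPGRADE, EXIT_NORMAL, EXIT_CRASH, EXIT_USER_QUIT, EXIT_OS_CHANGE,
    EXIT_BACKGROUND, EXIT_FOREGROUND_FREEZE, EXIT_FOOM
} exit_reason;

/* Rule out every other explanation; whatever remains is reported as FOOM. */
exit_reason classify_last_exit(const last_run_info *r) {
    assert(r != NULL);
    if (r->app_upgraded)            return EXIT_APP_UPGRADE;
    if (r->called_exit_or_abort)    return EXIT_NORMAL;
    if (r->crash_log_exists)        return EXIT_CRASH;  /* no crash callback needed */
    if (r->user_force_quit)         return EXIT_USER_QUIT;
    if (r->os_upgraded_or_rebooted) return EXIT_OS_CHANGE;
    if (!r->was_foreground)         return EXIT_BACKGROUND;
    if (r->freeze_at_last_moment)   return EXIT_FOREGROUND_FREEZE; /* watchdog kill */
    return EXIT_FOOM;
}
```

The order of the checks mirrors the list at the start of this section, with the freeze check placed last so that watchdog kills are separated from FOOM instead of being counted as one.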

III. Outcomes

Since WeChat launched online memory monitoring in March 2017, more than 30 memory problems have been solved, involving chat, search, Moments and other businesses. The FOOM rate has dropped from 3% at the beginning of 2017 to 0.67% now, and the foreground freeze rate has dropped from 0.6% to 0.3%.

IV. Common Problems

UIGraphicsEndImageContext

UIGraphicsBeginImageContext and UIGraphicsEndImageContext must appear in pairs; otherwise the context leaks. Xcode's Analyze feature can also detect this kind of problem.

UIWebView

Whether it opens a web page or runs a simple piece of JS code, UIWebView uses a lot of the app's memory. WKWebView not only has excellent rendering performance but also runs in its own separate process, so some page-related memory consumption moves to that process. It is the most suitable replacement for UIWebView.

autoreleasepool

Autoreleased objects are usually released at the end of the current run loop iteration. If a large number of autoreleased objects are created in a loop, peak memory will soar and may even lead to OOM. Adding autorelease pools where appropriate releases memory in time and lowers the peak.

Mutual References

Mutual references (retain cycles) arise easily where self is used inside a block that self itself holds; this can only be avoided through coding conventions. In addition, NSTimer strongly references its target and CAAnimation strongly references its delegate. At present, WeChat avoids such problems with its own MMNoRetainTimer and MMDelegateCenter.

Large Picture Processing

For example, the image scaling interface used to be written as follows:

However, OOM often occurs when processing high-resolution images, because -[UIImage drawInRect:] first decodes the image and then generates a bitmap at the original resolution when drawing, which is very memory-intensive. The solution is to use the lower-level ImageIO interface to avoid generating the intermediate bitmap.

Large View

A large view is a view whose size is too large and which itself contains the content to be rendered. Very long text, often thousands or even tens of thousands of lines, is common in WeChat mass messages. Drawing it in a single view uses a lot of memory and causes severe lag. The best approach is to split the text across multiple views and use the TableView reuse mechanism to reduce unnecessary rendering and memory usage.

Finally, here are a few recommended links on iOS memory:

  • Memory Usage Performance Guidelines
  • No pressure, Mon!


This article has been published by the Cloud+ Community with the author's authorization. Please indicate the source when reprinting.