Notes on monitoring iOS launch crashes

Time: 2020-2-15

Preface

Compared with an ordinary crash, a launch crash causes far greater loss. With a robust build and release pipeline, most problems are found and fixed before a version goes live, but there is still a small probability of an online incident. Launch crashes usually have two characteristics: they do serious damage and they are hard to capture.

Start-up process

A lot happens between the user tapping the app icon and the app becoming responsive. Although we want the crash monitoring tool to start as early as possible, the integrating app usually starts the tool only when the launch event arrives, and any crash before that point is a launch crash. The following lists the stages, up to launch, in which a launch crash can occur:

initialize may run earlier, but it always falls between load and launch. To monitor launch crashes, monitoring therefore has to start in the load phase to get the best coverage.
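As a plain C analogue of starting work before launch (not the Objective-C load method itself), a constructor function also runs before main. The sketch below assumes a Clang or GCC toolchain, and every name in it is hypothetical. On Apple platforms, to my understanding, an image's load methods run before that image's constructor functions, so load remains the earlier hook.

```c
/* Hypothetical flag; a real tool would install its hooks and open its log here. */
static int g_monitor_started = 0;

/* Runs when the image is loaded, before main() is entered. */
__attribute__((constructor))
static void start_launch_monitor(void) {
    g_monitor_started = 1;
}

/* Lets later code (for example the launch handler) check whether
   monitoring was already switched on during loading. */
int launch_monitor_started(void) {
    return g_monitor_started;
}
```

By the time main begins, launch_monitor_started() already reports that monitoring is on, which is the property the load phase is chosen for.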

How to monitor

The simplest approach is to start monitoring directly in the load method, whether or not the integrating app wants launch crash monitoring. However, this exposes the application to the following risks:

  • Online switch schemes such as A/B tests lose the ability to control the monitoring tool
  • If starting crash monitoring itself crashes, the application is completely paralyzed
  • In the load phase not all classes have been loaded, and a crash caused by recursive loading while the tool starts up cannot be monitored

Based on these risk points, a scheme for monitoring launch crashes should meet these conditions:

  • The startup process does not depend on classes, to avoid crashes caused by recursive loading
  • If the process crashes, the log records are still safely persisted

Finally, we arrive at the following monitoring flow chart:

Class independence

Class independence means that the monitoring tool has to implement its functionality through C interfaces. This is more work, but the runtime routes every method call through the objc_msgSend function. If you hook this function and maintain a call stack structure that records every call as it is pushed and popped, tracing method calls is not difficult. fishhook provides the ability to hook functions:


__unused static id (*orig_objc_msgSend)(id, SEL, ...);

__attribute__((__naked__)) static void hook_Objc_msgSend() {
 /// save stack data
 /// push msgSend
 /// resume stack data
 
 /// call origin msgSend
 
 /// save stack data
 /// pop msgSend
 /// resume stack data
}

void observe_Objc_msgSend() {
 struct rebinding msgSend_rebinding = { "objc_msgSend", hook_Objc_msgSend, (void *)&orig_objc_msgSend };
 rebind_symbols((struct rebinding[1]){msgSend_rebinding}, 1);
}
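The rebinding idea can be illustrated without Mach-O details. The sketch below is not fishhook: it models one symbol pointer as an ordinary function pointer slot (send_slot, a hypothetical name) and swaps it while keeping the original, which mirrors the name, replacement, replaced triple passed to rebind_symbols. All functions here are made up for illustration.

```c
#include <stddef.h>

/* Stand-in for the type of one entry in a symbol pointer table (hypothetical). */
typedef int (*send_fn)(int receiver, int selector);

static int real_send(int receiver, int selector) {
    return receiver * 100 + selector;
}

/* Every "call site" goes through this slot, like a call through a symbol stub. */
static send_fn send_slot = real_send;

static send_fn orig_send = NULL;  /* filled in when the slot is rebound */
static int hooked_calls = 0;      /* number of calls the hook has seen */

static int hook_send(int receiver, int selector) {
    hooked_calls++;                        /* record the call ... */
    return orig_send(receiver, selector);  /* ... then forward to the original */
}

/* Point the slot at the replacement and hand back the previous target. */
static void rebind_slot(send_fn *slot, send_fn replacement, send_fn *replaced) {
    if (replaced != NULL) {
        *replaced = *slot;
    }
    *slot = replacement;
}

int send_message(int receiver, int selector) {
    return send_slot(receiver, selector);
}

void observe_send(void) {
    rebind_slot(&send_slot, hook_send, &orig_send);
}

int hooked_call_count(void) {
    return hooked_calls;
}
```

After observe_send() runs, callers of send_message still get the original result, but each call now passes through the hook first.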

Implementing msgSend

A function marked __attribute__((__naked__)) tells the compiler not to use the stack to save parameter information when the function is called (no prologue or epilogue is generated), and the function's return address is held in the LR register. Because the hook stands in for objc_msgSend, which is itself written in assembly, it must save and restore register data around the code that records each call. objc_msgSend receives its parameters in registers; the code below saves x0 to x9 on the stack through the SP register and restores them afterwards:

///Save register parameter information
#define save() \
__asm volatile ( \
 "stp x8, x9, [sp, #-16]!\n" \
 "stp x6, x7, [sp, #-16]!\n" \
 "stp x4, x5, [sp, #-16]!\n" \
 "stp x2, x3, [sp, #-16]!\n" \
 "stp x0, x1, [sp, #-16]!\n");

///Restore register parameter information
#define resume() \
__asm volatile ( \
 "ldp x0, x1, [sp], #16\n" \
 "ldp x2, x3, [sp], #16\n" \
 "ldp x4, x5, [sp], #16\n" \
 "ldp x6, x7, [sp], #16\n" \
 "ldp x8, x9, [sp], #16\n" );
 
///Call a function: b is the branch instruction, value is the function address
#define call(b, value) \
 __asm volatile ("stp x8, x9, [sp, #-16]!\n"); \
 __asm volatile ("mov x12, %0\n" :: "r"(value)); \
 __asm volatile ("ldp x8, x9, [sp], #16\n"); \
 __asm volatile (#b " x12\n");


///The objc_msgSend hook must be written in assembly
__attribute__((__naked__)) static void hook_Objc_msgSend() {

 save()
 __asm volatile ("mov x2, lr\n");
 __asm volatile ("mov x3, x4\n");
 
 call(blr, &push_msgSend)
 resume()
 call(blr, orig_objc_msgSend)
 
 save()
 call(blr, &pop_msgSend)
 
 __asm volatile ("mov lr, x0\n");
 resume()
 __asm volatile ("ret\n");
}
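Calling the original function with blr overwrites LR, so the caller's return address has to be kept somewhere else until the call returns. The mov lr, x0 after pop_msgSend suggests that the pop step hands that address back; this is my reading of the snippet, not something the article states. A minimal C sketch of such a return-address stack follows, with hypothetical names and an assumed depth limit. It is single-threaded; a real hook would need one stack per thread.

```c
#include <stddef.h>
#include <stdint.h>

#define LR_STACK_MAX 512  /* assumed depth limit for this sketch */

static uintptr_t lr_stack[LR_STACK_MAX];
static size_t lr_depth = 0;

/* Before calling the original function: remember where the caller
   wants to return to. Returns 0 on success, -1 if the stack is full. */
int push_lr(uintptr_t lr) {
    if (lr_depth >= LR_STACK_MAX) {
        return -1;
    }
    lr_stack[lr_depth++] = lr;
    return 0;
}

/* After the original function returns: hand back the matching
   return address, or 0 if nothing was pushed. */
uintptr_t pop_lr(void) {
    if (lr_depth == 0) {
        return 0;
    }
    return lr_stack[--lr_depth];
}
```

Nested calls unwind in reverse order, so the most recently pushed address is always the one the returning call needs.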

Log record

Conventional file I/O cannot guarantee that data is safely written when a crash occurs, so mmap is the most suitable solution for this scenario. With mmap, even if the application crashes unrecoverably, the data already written into the mapped memory is still written to the file by the system. In addition, we only need to record the class and selector of each entry in the call stack. As long as there is no deep recursion, a small amount of memory is enough to hold this data:


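time_t ts = time(NULL);
NSString *documents = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES).lastObject;
/// append a path component, and format time_t with a matching specifier
const char *filePath = [documents stringByAppendingPathComponent: [NSString stringWithFormat: @"%ld", (long)ts]].UTF8String;

unsigned char *buffer = NULL;
/// O_CREAT: the log file does not exist yet (MB is assumed to be defined as 1024 * 1024)
int fileDescriptor = open(filePath, O_RDWR | O_CREAT, 0644);
/// grow the file to the mapped length; touching pages past the end of the file raises SIGBUS
ftruncate(fileDescriptor, MB * 4);
buffer = (unsigned char *)mmap(NULL, MB * 4, PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, fileDescriptor, 0);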
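For a mapping like this to be usable, the file has to exist and be at least as long as the mapped region. The plain C sketch below shows one way to set that up with POSIX calls only; log_map_open and log_map_close are hypothetical helper names, and error handling is kept minimal.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reuse) the file at path, grow it to size bytes and map it
   shared. Returns the mapping, or NULL on failure. The open descriptor
   is returned through fd_out. */
unsigned char *log_map_open(const char *path, size_t size, int *fd_out) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        return NULL;
    }
    /* The mapping must be backed by the file: accessing pages beyond
       the end of the file raises SIGBUS. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return (unsigned char *)p;
}

/* Release the mapping and the descriptor. */
void log_map_close(unsigned char *buffer, size_t size, int fd) {
    munmap(buffer, size);
    close(fd);
}
```

Bytes stored through the returned pointer become file content without any write call, which is what lets the log survive a crash of the process.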

buffer is the memory region we write data into. To keep the call stack information accurate, the buffer has to be updated every time a call is pushed or popped. One feasible approach is to prefix each call record with an @ symbol, always keep the index of the @ of the most recent record, and clear all data from that index onward when a call is popped:


static inline void push_msgSend(id _self, Class _cls, SEL _cmd, uintptr_t lr) {
 _lastIdx = _length;
 buffer[_lastIdx] = '@';
 ......
}

static inline void pop_msgSend(id _self, SEL _cmd, uintptr_t lr) {
 ......
 /// cut off the record that is being popped
 buffer[_lastIdx] = '\0';
 _length = _lastIdx;
 /// walk back to the '@' of the previous record; idx is unsigned, so test before decrementing
 size_t idx = _lastIdx;
 while (idx > 0) {
  idx--;
  if (buffer[idx] == '@') {
   _lastIdx = idx;
   break;
  }
 }
}
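The elided parts of these functions are not shown in the article. As a self-contained illustration of the same record layout, the sketch below keeps records of the form "@class selector" in a fixed array that stands in for the mapped buffer. The names log_push, log_pop and log_contents, the space separator and the capacity are all assumptions.

```c
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 4096  /* assumed size for this sketch */

static unsigned char log_buffer[LOG_CAPACITY];  /* stands in for the mapped buffer */
static size_t log_length = 0;    /* bytes in use */
static size_t log_last_idx = 0;  /* index of the '@' of the newest record */

/* Append "@<cls> <sel>" and remember where the record starts.
   Returns 0 on success, -1 if the record does not fit. */
int log_push(const char *cls, const char *sel) {
    size_t cls_len = strlen(cls);
    size_t sel_len = strlen(sel);
    size_t need = 1 + cls_len + 1 + sel_len;
    if (log_length + need + 1 > LOG_CAPACITY) {
        return -1;
    }
    log_last_idx = log_length;
    unsigned char *p = log_buffer + log_length;
    *p++ = '@';
    memcpy(p, cls, cls_len);
    p += cls_len;
    *p++ = ' ';
    memcpy(p, sel, sel_len);
    p += sel_len;
    *p = '\0';
    log_length += need;
    return 0;
}

/* Drop the newest record and step back to the '@' of the previous one. */
void log_pop(void) {
    if (log_length == 0) {
        return;
    }
    log_buffer[log_last_idx] = '\0';
    log_length = log_last_idx;
    size_t idx = log_last_idx;
    while (idx > 0) {
        idx--;
        if (log_buffer[idx] == '@') {
            log_last_idx = idx;
            return;
        }
    }
    log_last_idx = 0;
}

/* The live call stack as one string of concatenated records. */
const char *log_contents(void) {
    return (const char *)log_buffer;
}
```

At any moment the buffer holds exactly the calls that have been entered but not yet returned from, so a crash leaves the in-flight call chain in the file.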

Clear log

Because objc_msgSend is called very frequently, this monitoring scheme is not suitable for running long term, so monitoring has to be turned off at some point. Since starting the regular crash monitoring tool may itself crash, the best choice is to turn the feature off on the UIApplicationDidBecomeActiveNotification notification: by then the launch stage in which the crash monitoring tool starts has passed, so the tool itself is known to be working:


[[NSNotificationCenter defaultCenter] addObserver: self selector: @selector(closeMsgSendObserve) name: UIApplicationDidBecomeActiveNotification object: nil];

- (void)closeMsgSendObserve {
 close(fileDescriptor);
 munmap(buffer, MB * 4);
 [[NSFileManager defaultManager] removeItemAtPath: _logPath error: nil];
}

Rollback

When a rollback is needed, a launch crash has occurred. The handling then depends on the content of the log:

Log file is empty

This is the most dangerous situation. An empty log file means the file was created but no method call was recorded, so the crash probably happened while fishhook was doing its work. In this case the monitoring scheme should be turned off immediately, even if it is not the cause, and a new version should be released quickly.

Log file is not empty

If the log file is not empty, the crash was monitored successfully. The log file should be uploaded synchronously and fed back to the business side promptly so that losses can be stopped in time. At the same time, stop-loss measures should be applied so that the application can keep running. Depending on the situation, the stop-loss rollback options include the following:

  1. If the crash occurs in a functional component that does not interfere with normal business execution, the feature can be turned off through an online A/B switch, provided the component is controlled by such a switch
  2. If the crashing code interferes with normal business execution but the faulty code is short, you can try to fix it dynamically with a patch package issued from the server, taking care that the patch does not introduce other problems
  3. If neither an A/B switch nor a patch package can solve the problem and the project has a reasonable component design, use routing and forwarding to fall back to H5 so that the application keeps working
  4. If no dynamic repair means are available and the crash does not interfere with normal business execution, consider stopping all plug-ins and auxiliary components
  5. If no dynamic repair means are available, including options 1, 2 and 3, consider providing a reverse package through a third-party jailbreak market and prompting users to download and install it
  6. If no dynamic repair means are available, including options 1, 2 and 3, quickly stop the loss caused by the newly released version and use TestFlight to restore service to users in batches
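The choice between the empty and non-empty cases has to be made at the next launch, before monitoring is started again. The sketch below is one possible check in plain C. It assumes the log file is pre-sized for mapping, so "empty" is judged by content (no leading @ record) rather than by file length; the enum and function names are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum {
    LAUNCH_LOG_ABSENT,    /* no leftover log: the last launch got far enough to delete it */
    LAUNCH_LOG_EMPTY,     /* file exists but holds no record: suspect the monitor itself */
    LAUNCH_LOG_HAS_CALLS  /* a call stack was recorded: upload it and roll back */
} launch_log_state;

/* Inspect the log left behind by the previous launch. */
launch_log_state classify_launch_log(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return LAUNCH_LOG_ABSENT;
    }
    unsigned char first = 0;
    ssize_t n = read(fd, &first, 1);
    close(fd);
    /* Records start with '@', so a zero-filled or zero-length file
       counts as empty. */
    if (n == 1 && first == '@') {
        return LAUNCH_LOG_HAS_CALLS;
    }
    return LAUNCH_LOG_EMPTY;
}
```

A caller could switch on the result: turn the monitor off for LAUNCH_LOG_EMPTY, and upload the file and pick one of the stop-loss options for LAUNCH_LOG_HAS_CALLS.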
