An in-depth analysis of the structure of methods in ObjC

Time: 2022-1-11

Blog: Draveness
Follow the repository for timely updates: iOS-Source-Code-Analyze
Follow: Draveness · Github

Because the ObjC runtime can only be compiled on Mac OS, all code in this article runs on Mac OS, that is, on x86_64. Code that runs on arm64 is called out explicitly.

The previous article on isa, "Learn about isa from the initialization of NSObject", mentioned that when an instance method is called, the object's isa pointer is used to find the corresponding class, and the method is then looked up in that class's class_data_bits_t. This article describes how methods are stored in ObjC.

It first analyzes how methods are stored in memory based on the ObjC source code, and then verifies that analysis step by step in the lldb debugger.

Method’s location in memory

Let's first look at the structure of a class in ObjC:

[Figure: structure of objc_class with its isa, super_class, cache, and bits fields]

  • isa is a pointer to the metaclass. If you are not familiar with metaclasses, see Classes and Metaclasses

  • super_class points to the parent class of the current class

  • cache is used to cache pointers and the vtable, which speeds up method calls

  • bits is where the class's methods, properties, protocols, and other information are stored

The class_data_bits_t structure

This section analyzes the class_data_bits_t bits field of the class structure.

The class_data_bits_t structure in ObjC contains only a single 64-bit bits field, which is used to store class-related information.

A comment in the objc_class structure describes class_data_bits_t as a class_rw_t pointer plus the custom rr/alloc flags.

class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

It provides a convenience method that returns the class_rw_t * pointer stored inside it:

class_rw_t* data() {
   return (class_rw_t *)(bits & FAST_DATA_MASK);
}

This ANDs bits with FAST_DATA_MASK, keeps only bits 3 through 46 (the half-open range [3, 47)), and returns the result cast to class_rw_t *.

On the x86_64 architecture, Mac OS uses only 47 of the 64 bits for object addresses. In addition, because these addresses are aligned to at least 8 bytes in memory, the low three bits of the mask are 0.
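
The bit range can be checked directly from the mask value. The following is a minimal sketch, not runtime code: the helper names lowest_set_bit, highest_set_bit, and data_address are hypothetical, and the bits value used to exercise it is the one printed later in the lldb session.

```cpp
#include <cassert>
#include <cstdint>

// Mask quoted in the article (x86_64 build of the runtime).
constexpr uint64_t FAST_DATA_MASK = 0x00007ffffffffff8ULL;

// Index of the lowest set bit of a non-zero value (hypothetical helper).
inline int lowest_set_bit(uint64_t v) {
    assert(v != 0);
    int i = 0;
    while ((v & 1) == 0) { v >>= 1; ++i; }
    return i;
}

// Index of the highest set bit of a non-zero value (hypothetical helper).
inline int highest_set_bit(uint64_t v) {
    assert(v != 0);
    int i = 0;
    while (v >>= 1) { ++i; }
    return i;
}

// What data() computes, expressed on a plain integer instead of a pointer.
inline uint64_t data_address(uint64_t bits) {
    return bits & FAST_DATA_MASK;
}
```

For this mask the lowest set bit is bit 3 and the highest is bit 46, so any flag stored in the three low bits is stripped before the value is used as an address.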

Because the class_rw_t * pointer occupies only bits [3, 47), the low three bits can be used to store other information about the current class:


#define FAST_IS_SWIFT           (1UL<<0)
#define FAST_HAS_DEFAULT_RR     (1UL<<1)
#define FAST_REQUIRES_RAW_ISA   (1UL<<2)
#define FAST_DATA_MASK          0x00007ffffffffff8UL

  • isSwift()

    • FAST_IS_SWIFT is used to identify a Swift class

  • hasDefaultRR()

    • FAST_HAS_DEFAULT_RR means the current class or its parent class has the default retain/release/autorelease/retainCount/_tryRetain/_isDeallocating/retainWeakReference/allowsWeakReference methods

  • requiresRawIsa()

    • FAST_REQUIRES_RAW_ISA means instances of the current class require a raw isa
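
The way one 64-bit word carries both the pointer and the three flags can be sketched as follows. This is an illustration under assumptions, not the runtime's definition: class_data_bits_sketch is a hypothetical stand-in that keeps the address as a plain integer, and only the constants are taken from the article.

```cpp
#include <cassert>
#include <cstdint>

// Constants as quoted in the article.
constexpr uint64_t FAST_IS_SWIFT         = 1ULL << 0;
constexpr uint64_t FAST_HAS_DEFAULT_RR   = 1ULL << 1;
constexpr uint64_t FAST_REQUIRES_RAW_ISA = 1ULL << 2;
constexpr uint64_t FAST_DATA_MASK        = 0x00007ffffffffff8ULL;

// Hypothetical stand-in for class_data_bits_t.
struct class_data_bits_sketch {
    uint64_t bits;

    // Each flag is read by testing its own low bit.
    bool isSwift() const        { return (bits & FAST_IS_SWIFT) != 0; }
    bool hasDefaultRR() const   { return (bits & FAST_HAS_DEFAULT_RR) != 0; }
    bool requiresRawIsa() const { return (bits & FAST_REQUIRES_RAW_ISA) != 0; }

    // The address is read by masking the flag bits (and unused high bits) away.
    uint64_t data() const       { return bits & FAST_DATA_MASK; }
};
```

Setting or clearing a flag never disturbs the address, because the address bits and the flag bits do not overlap.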

Calling data() on the class_data_bits_t structure and calling data() on objc_class return the same class_rw_t * pointer, because the objc_class method is just a wrapper around the corresponding class_data_bits_t method.

// data() method in objc_class
class_data_bits_t bits;

class_rw_t *data() { 
   return bits.data();
}

// data() method in class_data_bits_t
uintptr_t bits;

class_rw_t* data() {
   return (class_rw_t *)(bits & FAST_DATA_MASK);
}

class_rw_t and class_ro_t

The properties, methods, protocols, and other information of an ObjC class are stored in class_rw_t:

struct class_rw_t {
    uint32_t flags;
    uint32_t version;

    const class_ro_t *ro;

    method_array_t methods;
    property_array_t properties;
    protocol_array_t protocols;

    Class firstSubclass;
    Class nextSiblingClass;
};

It also holds a pointer to a constant, ro, which stores the properties, methods, and adopted protocols of the current class that were already determined at compile time.

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
    uint32_t reserved;

    const uint8_t * ivarLayout;
    
    const char * name;
    method_list_t * baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;
};

At compile time, the data pointer inside the class's class_data_bits_t points to a class_ro_t, not to a class_rw_t.

Then, while the ObjC runtime is loading, the realizeClass method:

  1. Calls data() on class_data_bits_t and casts the result from class_rw_t * to class_ro_t *

  2. Allocates and initializes a class_rw_t structure

  3. Sets that structure's ro and flags

  4. Finally, sets the correct data on the class

const class_ro_t *ro = (const class_ro_t *)cls->data();
class_rw_t *rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1);
rw->ro = ro;
rw->flags = RW_REALIZED|RW_REALIZING;
cls->setData(rw);

After realizeClass has run, the memory layout of the class differs from the compile-time layout described above: data now points to a class_rw_t, and the ro field of that class_rw_t points to the original class_ro_t.

However, after this code runs, the method, property, and protocol lists in class_rw_t are all still empty. At this point realizeClass calls methodizeClass, which loads the methods (including those from categories), properties, and protocols implemented by the class into the methods, properties, and protocols lists.
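
The pointer swap performed by the realizeClass snippet can be modelled in a few lines. The following is a simplified sketch under stated assumptions: objc_class_sketch, realize_sketch, and demo_realize are hypothetical names, the structs keep only the fields needed here, the RW_* values are assumed, and SKETCH_DATA_MASK clears only the three flag bits instead of using the runtime's FAST_DATA_MASK.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumed flag values; the article only shows the names.
constexpr uint32_t RW_REALIZED  = 1u << 31;
constexpr uint32_t RW_REALIZING = 1u << 19;

// Sketch mask: clears only the three low flag bits.
constexpr uintptr_t SKETCH_DATA_MASK = ~static_cast<uintptr_t>(7);

struct class_ro_t { const char* name; };   // reduced to one field

struct class_rw_t {                        // reduced to three fields
    uint32_t flags;
    uint32_t version;
    const class_ro_t* ro;
};

struct objc_class_sketch {
    uintptr_t bits;

    class_rw_t* data() const {
        return reinterpret_cast<class_rw_t*>(bits & SKETCH_DATA_MASK);
    }
    void setData(class_rw_t* rw) {
        // Keep the flag bits, replace the pointer bits.
        bits = (bits & ~SKETCH_DATA_MASK) | reinterpret_cast<uintptr_t>(rw);
    }
};

// Mirrors the five lines quoted from realizeClass.
inline class_rw_t* realize_sketch(objc_class_sketch& cls) {
    // Before realization, data() really holds a class_ro_t.
    const class_ro_t* ro = reinterpret_cast<const class_ro_t*>(cls.data());
    class_rw_t* rw = static_cast<class_rw_t*>(std::calloc(1, sizeof(class_rw_t)));
    assert(rw != nullptr);
    rw->ro = ro;
    rw->flags = RW_REALIZED | RW_REALIZING;
    cls.setData(rw);
    return rw;
}

// Runs the swap once and reports whether the layout is as described.
inline bool demo_realize() {
    static const class_ro_t ro = {"XXObject"};
    objc_class_sketch cls = {reinterpret_cast<uintptr_t>(&ro)};
    class_rw_t* rw = realize_sketch(cls);
    bool ok = cls.data() == rw && rw->ro == &ro
           && rw->flags == (RW_REALIZED | RW_REALIZING);
    std::free(rw);
    return ok;
}
```

Before the call, data() yields the compiler-emitted class_ro_t; afterwards it yields the freshly allocated class_rw_t, which reaches the same class_ro_t through ro.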

XXObject

Next, we analyze how a class, XXObject, changes in memory while the runtime initializes. Here are the interface and implementation of XXObject:

// XXObject.h
#import <Foundation/Foundation.h>

@interface XXObject : NSObject

- (void)hello;

@end

// XXObject.m

#import "XXObject.h"

@implementation XXObject

- (void)hello {
    NSLog(@"Hello");
}

@end

This code runs on Mac OS X 10.11.3 (x86_64), not on the iPhone simulator or a real device. Results may differ on the simulator or on a device.

This is the code of the main program:

#import <Foundation/Foundation.h>
#import "XXObject.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Class cls = [XXObject class];
        NSLog(@"%p", cls);
    }
    return 0;
}

Structure of the class in memory after compilation

Because the location of a class in memory is determined at compile time, first run the code once to obtain the address of XXObject in memory.

0x100001168

Next, add a breakpoint before the ObjC runtime is initialized, that is, in the _objc_init method.

Then enter the following commands in lldb:

(lldb) p (objc_class *)0x100001168
(objc_class *) $0 = 0x0000000100001168
(lldb) p (class_data_bits_t *)0x100001188
(class_data_bits_t *) $1 = 0x0000000100001188
(lldb) p $1->data()
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
(class_rw_t *) $2 = 0x00000001000010e8
(lldb) p (class_ro_t *)$2 // cast class_rw_t to class_ro_t
(class_ro_t *) $3 = 0x00000001000010e8
(lldb) p *$3
(class_ro_t) $4 = {
  flags = 128
  instanceStart = 8
  instanceSize = 8
  reserved = 0
  ivarLayout = 0x0000000000000000 <no value available>
  name = 0x0000000100000f7a "XXObject"
  baseMethodList = 0x00000001000010c8
  baseProtocols = 0x0000000000000000
  ivars = 0x0000000000000000
  weakIvarLayout = 0x0000000000000000 <no value available>
  baseProperties = 0x0000000000000000
}
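
The address 0x100001188 used in the session above is the class pointer plus 32 bytes, the offset of bits inside objc_class. A minimal sketch of where that offset comes from, assuming the 64-bit layout in which isa and superclass are 8 bytes each and the cache is one 8-byte pointer plus two 4-byte counters (the struct and field names here are hypothetical stand-ins):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed 64-bit layout of the method cache: 8 + 4 + 4 = 16 bytes.
struct cache_sketch {
    uint64_t buckets;
    uint32_t mask;
    uint32_t occupied;
};

// Assumed 64-bit layout of objc_class up to and including bits.
struct objc_class_layout_sketch {
    uint64_t     isa;         // offset 0
    uint64_t     superclass;  // offset 8
    cache_sketch cache;       // offset 16
    uint64_t     bits;        // offset 32
};

// Address of the bits field for a class located at address cls.
constexpr uint64_t bits_address(uint64_t cls) {
    return cls + offsetof(objc_class_layout_sketch, bits);
}
```

With the class at 0x100001168, this yields 0x100001188, the address passed to lldb.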

We now have the class's read-only data, class_ro_t, as emitted by the compiler:

(class_ro_t) $4 = {
  flags = 128
  instanceStart = 8
  instanceSize = 8
  reserved = 0
  ivarLayout = 0x0000000000000000 <no value available>
  name = 0x0000000100000f7a "XXObject"
  baseMethodList = 0x00000001000010c8
  baseProtocols = 0x0000000000000000
  ivars = 0x0000000000000000
  weakIvarLayout = 0x0000000000000000 <no value available>
  baseProperties = 0x0000000000000000
}

You can see that only baseMethodList and name have values. The others (ivarLayout, baseProtocols, ivars, weakIvarLayout, and baseProperties) are all null pointers, because the class has no instance variables, protocols, or properties. So the structure matches our expectations.

Use the following commands to view the contents of baseMethodList:

(lldb) p $4.baseMethodList
(method_list_t *) $5 = 0x00000001000010c8
(lldb) p $5->get(0)
(method_t) $6 = {
  name = "hello"
  types = 0x0000000100000fa4 "v16@0:8"
  imp = 0x0000000100000e90 (method`-[XXObject hello] at XXObject.m:13)
}
(lldb) p $5->get(1)
Assertion failed: (i < count), function get, file /Users/apple/Desktop/objc-runtime/runtime/objc-runtime-new.h, line 110.
error: Execution was interrupted, reason: signal SIGABRT.
The process has been returned to the state before expression evaluation.
(lldb)

$5->get(0) successfully returns the method_t structure for the -[XXObject hello] method. Trying to get the next method trips the assertion, which shows that the current class has only one method.
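
The bounds check that fired in the session can be illustrated with a much-reduced model of a method list. This is a sketch, not the runtime's method_list_t: the struct below keeps only a count and a pointer to the first element, and hello_imp, kXXObjectMethods, and xxobject_methods are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reduced method_t: plain C strings stand in for SEL, a bare
// function pointer stands in for IMP.
struct method_t {
    const char* name;
    const char* types;
    void (*imp)();
};

// Reduced method list: a count plus the first element.
struct method_list_t {
    uint32_t count;
    const method_t* first;

    const method_t& get(uint32_t i) const {
        assert(i < count);  // same condition as the assertion seen in lldb
        return first[i];
    }
};

// Hypothetical stand-in for the compiled -[XXObject hello].
static void hello_imp() {}

static const method_t kXXObjectMethods[] = {
    {"hello", "v16@0:8", &hello_imp},
};

// The single-entry list XXObject has at compile time.
inline method_list_t xxobject_methods() {
    return method_list_t{1, kXXObjectMethods};
}
```

Calling get(0) returns the hello entry. Calling get(1) fails the i < count check, which is the same condition that aborted the expression in lldb.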

realizeClass

This article does not analyze realizeClass in detail. The main job of this method is to initialize a class the first time it is used, which includes:

  • Allocating the readable and writable data space

  • Returning the real class structure

static Class realizeClass(Class cls)

This is the signature of the method. We need to set a conditional breakpoint inside it that checks whether the current class is XXObject.

The condition compares the two pointers directly instead of using [NSStringFromClass(cls) isEqualToString:@"XXObject"], because those methods cannot be called at this point and do not exist inside the ObjC runtime. The only way to confirm that the current class is XXObject is to check whether the class pointers are equal.

Comparing pointers directly works because the location of a class in memory is determined at compile time. As long as the code does not change, the class stays at the same location in memory (as has been said several times already).

The breakpoint is placed in the else branch of realizeClass: XXObject is a normal class, so it takes the else branch, which allocates the writable class data.

Because the condition is evaluated on every call to check whether the class pointer points to XXObject, it takes a while before the breakpoint is hit.

Printing data in the class structure at this point shows that the layout is unchanged: data still points to the class_ro_t emitted by the compiler.

After the code in that branch has run, let's print the class structure again:

(lldb) p (objc_class *)cls // print the class pointer
(objc_class *) $262 = 0x0000000100001168
(lldb) p (class_data_bits_t *)0x0000000100001188 // class pointer plus an offset of 32 gives the class_data_bits_t pointer
(class_data_bits_t *) $263 = 0x0000000100001188
(lldb) p *$263 // read the contents of the class_data_bits_t pointer
(class_data_bits_t) $264 = (bits = 4302315312)
(lldb) p $264.data() // get the class_rw_t
(class_rw_t *) $265 = 0x0000000100701f30
(lldb) p *$265 // read the contents of the class_rw_t pointer; its ro is now set
(class_rw_t) $266 = {
  flags = 2148007936
  version = 0
  ro = 0x00000001000010e8
  methods = {
    list_array_tt<method_t, method_list_t> = {
       = {
        list = 0x0000000000000000
        arrayAndFlag = 0
      }
    }
  }
  properties = {
    list_array_tt<property_t, property_list_t> = {
       = {
        list = 0x0000000000000000
        arrayAndFlag = 0
      }
    }
  }
  protocols = {
    list_array_tt<unsigned long, protocol_list_t> = {
       = {
        list = 0x0000000000000000
        arrayAndFlag = 0
      }
    }
  }
  firstSubclass = nil
  nextSiblingClass = nil
  demangledName = 0x0000000000000000 <no value available>
}
(lldb) p $266.ro // get the class_ro_t pointer
(const class_ro_t *) $267 = 0x00000001000010e8
(lldb) p *$267 // read the contents of the class_ro_t pointer
(const class_ro_t) $268 = {
  flags = 128
  instanceStart = 8
  instanceSize = 8
  reserved = 0
  ivarLayout = 0x0000000000000000 <no value available>
  name = 0x0000000100000f7a "XXObject"
  baseMethodList = 0x00000001000010c8
  baseProtocols = 0x0000000000000000
  ivars = 0x0000000000000000
  weakIvarLayout = 0x0000000000000000 <no value available>
  baseProperties = 0x0000000000000000
}
(lldb) p $268.baseMethodList // get the base method list
(method_list_t *const) $269 = 0x00000001000010c8
(lldb) p $269->get(0) // access the first method
(method_t) $270 = {
  name = "hello"
  types = 0x0000000100000fa4 "v16@0:8"
  imp = 0x0000000100000e90 (method`-[XXObject hello] at XXObject.m:13)
}
(lldb) p $269->get(1) // try to access the second method: out of bounds
error: Execution was interrupted, reason: signal SIGABRT.
The process has been returned to the state before expression evaluation.
Assertion failed: (i < count), function get, file /Users/apple/Desktop/objc-runtime/runtime/objc-runtime-new.h, line 110.
(lldb)
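
Two of the decimal values printed in this session are easier to read in another form. The sketch below assumes RW_REALIZED is bit 31 and RW_REALIZING is bit 19; the article shows only the flag names, so these values are an assumption, although they reproduce the printed number exactly. has_flag, kPrintedFlags, and kPrintedBits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed flag values; only the names appear in the article.
constexpr uint32_t RW_REALIZED  = 1u << 31;
constexpr uint32_t RW_REALIZING = 1u << 19;

// Values copied from the lldb output.
constexpr uint32_t kPrintedFlags = 2148007936u;   // class_rw_t.flags
constexpr uint64_t kPrintedBits  = 4302315312ULL; // class_data_bits_t.bits

// True when the given flag bit is set.
inline bool has_flag(uint32_t flags, uint32_t flag) {
    return (flags & flag) != 0;
}
```

Under that assumption, flags is exactly RW_REALIZED|RW_REALIZING, the value assigned in realizeClass, and bits is 0x100701f30 in hexadecimal, the class_rw_t address returned by data().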

The final step could not be captured in a screenshot; this is the code that ran:

const class_ro_t *ro = (const class_ro_t *)cls->data();
class_rw_t *rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1);
rw->ro = ro;
rw->flags = RW_REALIZED|RW_REALIZING;
cls->setData(rw);

After the above code runs, both the class's read-only data class_ro_t and its read-write data class_rw_t are set correctly. At this point, however, members of class_rw_t such as the method list are still null pointers. They are set in methodizeClass.

methodizeClass calls the attachLists method of method_array_t to add the methods in baseMethods to the methods array. Only after that does accessing methods return the instance methods of the current class.
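
The effect of this step can be sketched with standard containers. This is an illustration under assumptions, not the runtime's attachLists: the types below are reduced stand-ins, attachList, methodCount, methodize_sketch, and demo_method_count_after_methodize are hypothetical names, and the order in which several lists would be arranged is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct method_t { std::string name; std::string types; };
using method_list_t = std::vector<method_t>;

// Read-only data emitted by the compiler.
struct class_ro_t { const method_list_t* baseMethods; };

// Read-write data created at run time; it references lists, it does not copy them.
struct class_rw_t {
    const class_ro_t* ro;
    std::vector<const method_list_t*> methods;

    void attachList(const method_list_t* list) {
        if (list != nullptr) methods.push_back(list);
    }
    std::size_t methodCount() const {
        std::size_t n = 0;
        for (const method_list_t* l : methods) n += l->size();
        return n;
    }
};

// Makes the compile-time methods reachable through rw.methods.
inline void methodize_sketch(class_rw_t& rw) {
    rw.attachList(rw.ro->baseMethods);
}

// Builds an XXObject-like class and returns its method count after methodizing.
inline std::size_t demo_method_count_after_methodize() {
    static const method_list_t base = method_list_t{ method_t{"hello", "v16@0:8"} };
    static const class_ro_t ro = {&base};
    class_rw_t rw{&ro, {}};
    assert(rw.methodCount() == 0);  // empty right after realization
    methodize_sketch(rw);
    return rw.methodCount();
}
```

The read-only list is only referenced, never modified, which matches the point that class_ro_t stays untouched while class_rw_t grows.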

Structure of method

With all that covered, we can now take a quick look at the structure of a method. Like classes and objects, a method is also a structure in memory.

struct method_t {
    SEL name;
    const char *types;
    IMP imp;
};

It contains the method's name, its type encoding, and its implementation pointer, IMP.

The structure of the -[XXObject hello] method above is as follows:

name = "hello"
types = 0x0000000100000fa4 "v16@0:8"
imp = 0x0000000100000e90 (method`-[XXObject hello] at XXObject.m:13)

The method name needs no explanation. The method's type is an odd-looking string, "v16@0:8", which ObjC calls a type encoding. See the official documentation to learn more about type encodings.

For the method's implementation, lldb shows where the method is located in the source file.
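
The string "v16@0:8" can be read piece by piece: v is the void return type and 16 is the total size of the arguments in bytes, @ is an object (self) at offset 0, and : is a selector (_cmd) at offset 8. The following minimal sketch splits such a string into its pieces. parse_simple_encoding and encoded_slot are hypothetical names, and the parser handles only single-character type codes, not structs, arrays, or qualifiers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One type code followed by its decimal number.
struct encoded_slot {
    char type;
    int  number;
};

// Splits an encoding made of single-character type codes, each followed by digits.
inline std::vector<encoded_slot> parse_simple_encoding(const std::string& enc) {
    std::vector<encoded_slot> out;
    std::size_t i = 0;
    while (i < enc.size()) {
        encoded_slot slot{enc[i++], 0};
        // Accumulate the decimal number that follows the type code.
        while (i < enc.size() && std::isdigit(static_cast<unsigned char>(enc[i]))) {
            slot.number = slot.number * 10 + (enc[i++] - '0');
        }
        out.push_back(slot);
    }
    return out;
}
```

For "v16@0:8" this produces three slots: ('v', 16), ('@', 0), and (':', 8).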

Summary

When analyzing where methods live in memory, I initially kept trying to find the place where baseMethods in the read-only structure class_ro_t is first set, in order to understand how a class's methods are loaded. I traced upward from methodizeClass all the way to _objc_init without finding any code that sets baseMethods in the read-only area.

In addition, after runtime initialization but before realizeClass, the class_rw_t obtained from class_data_bits_t was always wrong. This puzzled me at first, until I found in realizeClass that what is stored there at that point is not a class_rw_t structure but a class_ro_t, which explained the errors.

Only later did it occur to me that some of the class's methods, properties, and protocols are determined at compile time (members such as baseMethods, as well as the location of the class in memory, are fixed at compile time).

  1. The location of a class in memory is determined at compile time; as long as the code does not change, the class stays at the same address.

  2. The class's methods, properties, and protocols are stored in the "wrong" location at compile time. Only after realizeClass has run are they placed in the read-only area class_ro_t that class_rw_t points to, so methods can be added to class_rw_t at run time without affecting the class's read-only structure.

  3. The data in class_ro_t cannot be changed at run time. Adding a method modifies the methods list in class_rw_t, not baseMethods in class_ro_t. How methods are added will be analyzed in a later article.
