How much do you know about generics after all these years of using them?

Time:2020-11-30

No programmer today would dare claim never to have used generics. The type parameter T can be replaced by any type you like, which feels almost magical, and most of us take it for granted. What is really interesting is how the runtime makes T work under the hood. I'll try to share what I found in this article; not everything in it is guaranteed to be right...

1: Before generics

ArrayList still ships in both .NET Core 3.1 and the latest .NET Framework 4.8, but the much-criticized class is rarely seen in new code. It still has to be mentioned, because it is what made the C# team mend their ways, abandon the old approach, and start over. Here is a stripped-down ArrayList as an example.

    public class ArrayList
    {
        private object[] items;

        private int index = 0;

        public ArrayList()
        {
            items = new object[10];
        }

        public void Add(object item)
        {
            items[index++] = item;
        }
    }

To let Add accept values of any type (for example int, double, or a class instance), the only trick available is to declare the parameter as object, the ancestor of all types. This introduces two major problems: boxing and unboxing, and a lack of type safety.

1. Boxing and unboxing

This is easy to understand: because the parameter is the ancestor type object, calling Add with a value type triggers a boxing operation, as in the following code:

            ArrayList arrayList = new ArrayList();
            arrayList.Add(3);
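
As a minimal sketch of what that boxing looks like from the caller's side, the example below uses the real System.Collections.ArrayList rather than the simplified class above. The BoxingDemo class and its Sum method are names made up for this illustration.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Sums the ints stored in an ArrayList. Every element comes back as
    // object, so each one has to be unboxed with an explicit cast.
    public static int Sum(ArrayList list)
    {
        int total = 0;
        foreach (object item in list)
        {
            total += (int)item; // unbox: copy the value out of the heap object
        }
        return total;
    }

    public static void Main()
    {
        ArrayList arrayList = new ArrayList();
        arrayList.Add(3); // the int is boxed into a new heap object
        arrayList.Add(4);

        object boxed = arrayList[0];
        Console.WriteLine(boxed.GetType()); // System.Int32
        Console.WriteLine(Sum(arrayList));  // 7
    }
}
```

The element is still a System.Int32, but it now lives on the heap and has to be cast back before it can be used as a number.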

<1> It takes up more space

Let's look at this problem with WinDbg. You probably know that an int takes 4 bytes. How many bytes does it take once it has been boxed onto the heap? Let's find out.

The original code and IL code are as follows:

        public static void Main(string[] args)
        {
            var num = 10;
            var obj = (object)num;
            Console.Read();
        }

    IL_0000: nop
    IL_0001: ldc.i4.s 10
    IL_0003: stloc.0
    IL_0004: ldloc.0
    IL_0005: box [mscorlib]System.Int32
    IL_000a: stloc.1
    IL_000b: call int32 [mscorlib]System.Console::Read()
    IL_0010: pop
    IL_0011: ret

You can clearly see a box instruction at IL_0005, so boxing really does take place. Next, capture a dump file.

~0s -> !clrstack -l -> !do 0x0000018300002d48


0:000> ~0s
ntdll!ZwReadFile+0x14:
00007ff9`fc7baa64 c3              ret
0:000> !clrstack -l
OS Thread Id: 0xfc (0)
        Child SP               IP Call Site
0000002c397fedf0 00007ff985c808f3 ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 28]
    LOCALS:
        0x0000002c397fee2c = 0x000000000000000a
        0x0000002c397fee20 = 0x0000018300002d48

0000002c397ff038 00007ff9e51b6c93 [GCFrame: 0000002c397ff038] 
0:000> !do 0x0000018300002d48
Name:        System.Int32
MethodTable: 00007ff9e33285a0
EEClass:     00007ff9e34958a8
Size:        24(0x18) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff9e33285a0  40005a0        8         System.Int32  1 instance               10 m_value

The fifth line from the bottom, Size: 24(0x18) bytes, shows that the boxed int takes 24 bytes. Why 24? 8 (sync block pointer) + 8 (method table pointer) + 4 (the int value) = 20. But on x64 memory is aligned to 8 bytes, so sizes are rounded up to a multiple of 8 and the object occupies 8 + 8 + 8 = 24 bytes. A value that originally needed only 4 bytes has ballooned to 24 bytes because of boxing. If you box 10,000 value types, isn't the space overhead terrible?
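
The arithmetic above can be written out as a small sketch. The BoxedSize class and the AlignUp helper are hypothetical names, and the three field sizes are simply the x64 figures quoted in the paragraph above, not values read from the runtime.

```csharp
using System;

public static class BoxedSize
{
    // Rounds size up to the next multiple of alignment.
    // The alignment must be a power of two.
    public static int AlignUp(int size, int alignment)
    {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void Main()
    {
        const int syncBlock = 8;    // sync block slot on x64, as described above
        const int methodTable = 8;  // method table pointer
        const int payload = 4;      // the int value itself

        int raw = syncBlock + methodTable + payload; // 20
        Console.WriteLine(AlignUp(raw, 8));          // 24
    }
}
```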

<2> Moving a value from the stack to the heap is like shipping a parcel: boxing it, transporting it, and finally disposing of it (garbage collection) all cost real machine time.

2. Lack of type safety

This one is simple. Because the parameter is the ancestor type object, nothing stops a programmer from mixing arbitrary types in the same list. It may be unintentional, but the compiler cannot catch it. The code is as follows:


            ArrayList arrayList = new ArrayList();
            arrayList.Add(3);
            arrayList.Add(new Action<int>((num) => { }));
            arrayList.Add(new object());
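
To make the consequence concrete, here is a hedged sketch of what happens when such a mixed list is read back. It uses the real System.Collections.ArrayList; TypeSafetyDemo and TrySumInts are names invented for this example.

```csharp
using System;
using System.Collections;

public static class TypeSafetyDemo
{
    // Tries to read every element back as an int.
    // Returns false if any element turns out to be something else.
    public static bool TrySumInts(ArrayList list, out int total)
    {
        total = 0;
        try
        {
            foreach (object item in list)
            {
                total += (int)item; // throws InvalidCastException for non-int elements
            }
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        ArrayList arrayList = new ArrayList();
        arrayList.Add(3);
        arrayList.Add(new object()); // compiles fine: everything is an object

        Console.WriteLine(TrySumInts(arrayList, out int total)); // False
    }
}
```

The mistake is only discovered at run time, when the cast fails, which is exactly the problem the compiler cannot help with here.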

Faced with these two embarrassing problems, the C# team decided to design a new kind of type that would solve them once and for all, which led to generics.

2: The emergence of generics

1. Savior

First of all, generics were created to solve exactly these two problems: the library provides List<T>, and you use it as List<int>, List<double>, and so on. This article focuses on how that substitution is implemented under the hood.


        public static void Main(string[] args)
        {
            List<double> list1 = new List<double>();
            List<string> list3 = new List<string>();
            ...
        }
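
As a small sketch of how List<T> addresses both problems, the example below stores ints without any cast and shows (in a comment) the call the compiler would reject. GenericListDemo and Sum are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class GenericListDemo
{
    // Sums a List<int>. The elements are already ints,
    // so there is no cast and no unboxing.
    public static int Sum(List<int> list)
    {
        int total = 0;
        foreach (int item in list)
        {
            total += item;
        }
        return total;
    }

    public static void Main()
    {
        List<int> list = new List<int>();
        list.Add(3);
        list.Add(4);
        // list.Add(new object()); // would not compile: object is not an int

        Console.WriteLine(Sum(list)); // 7
    }
}
```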

3: Research on the principle of generics

The real question is at which stage List<T> becomes List<int>. Java implements generics by erasure: under the hood the type parameter is replaced by Object. C# certainly doesn't do this, otherwise there would be no article. To find out at which stage the replacement happens, you need to know at least the main stages of compiling C# code. To make this easier to follow, here is a diagram.

[Figure: the stages of C# code compilation]

As you can see, T has to be replaced either when the MSIL is generated or during JIT compilation. The following test code is used to find out which.


        public static void Main(string[] args)
        {
            List<double> list1 = new List<double>();
            List<int> list2 = new List<int>();
            List<string> list3 = new List<string>();
            List<int[]> list4 = new List<int[]>();

            Console.ReadLine();
        }

1. Exploring the first stage

Because the output of the first stage is MSIL, you can use ILSpy to look at the intermediate code.

        IL_0000: nop
        IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<float64>::.ctor()
        IL_0006: stloc.0
        IL_0007: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
        IL_000c: stloc.1
        IL_000d: newobj instance void class [mscorlib]System.Collections.Generic.List`1<string>::.ctor()
        IL_0012: stloc.2
        IL_0013: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32[]>::.ctor()
        IL_0018: stloc.3
        IL_0019: call string [mscorlib]System.Console::ReadLine()
        IL_001e: pop
        IL_001f: ret

.class public auto ansi serializable beforefieldinit System.Collections.Generic.List`1<T>
    extends System.Object
    implements class System.Collections.Generic.IList`1<!T>,
               class System.Collections.Generic.ICollection`1<!T>,
               class System.Collections.Generic.IEnumerable`1<!T>,
               System.Collections.IEnumerable,
               System.Collections.IList,
               System.Collections.ICollection,
               class System.Collections.Generic.IReadOnlyList`1<!T>,
               class System.Collections.Generic.IReadOnlyCollection`1<!T>

As the IL above shows, the class definition is still System.Collections.Generic.List`1<T>, which means the replacement T -> int has not happened at the intermediate code stage.
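
The same distinction between the generic definition and a constructed type is visible through reflection. This is only a sketch using standard System.Type members; the OpenVsClosed class name is made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class OpenVsClosed
{
    public static void Main()
    {
        Type open = typeof(List<>);      // the generic definition, still parameterized by T
        Type closed = typeof(List<int>); // a constructed type with T bound to int

        Console.WriteLine(open.IsGenericTypeDefinition);              // True
        Console.WriteLine(closed.IsGenericTypeDefinition);            // False
        Console.WriteLine(closed.GetGenericTypeDefinition() == open); // True
        Console.WriteLine(closed.GetGenericArguments()[0]);           // System.Int32
    }
}
```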

2. Exploring the second stage

Looking at the JIT-compiled code is not as hard as it sounds. Every object carries a method table pointer in its header, and that pointer leads to the method table of its type, which lists all the methods finally generated for that type. If that is hard to picture, here is a diagram.

[Figure: an object's method table pointer and the method table it points to]

!dumpheap -stat finds the four List objects on the managed heap.


0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff9e3314320        1           32 Microsoft.Win32.SafeHandles.SafeViewOfFileHandle
00007ff9e339b4b8        1           40 System.Collections.Generic.List`1[[System.Double, mscorlib]]
00007ff9e333a068        1           40 System.Collections.Generic.List`1[[System.Int32, mscorlib]]
00007ff9e3330d58        1           40 System.Collections.Generic.List`1[[System.String, mscorlib]]
00007ff9e3314a58        1           40 System.IO.Stream+NullStream
00007ff9e3314510        1           40 Microsoft.Win32.Win32Native+InputRecord
00007ff9e3314218        1           40 System.Text.InternalEncoderBestFitFallback
00007ff985b442c0        1           40 System.Collections.Generic.List`1[[System.Int32[], mscorlib]]
00007ff9e338fd28        1           48 System.Text.DBCSCodePageEncoding+DBCSDecoder
00007ff9e3325ef0        1           48 System.SharedStatics

As you can see, the four List objects were found on the managed heap. I'll pick the simplest one, System.Collections.Generic.List`1[[System.Int32, mscorlib]]; the value 00007ff9e333a068 in front of it is the address of its method table.

!dumpmt -md 00007ff9e333a068


0:000> !dumpmt -md 00007ff9e333a068
EEClass:         00007ff9e349b008
Module:          00007ff9e3301000
Name:            System.Collections.Generic.List`1[[System.Int32, mscorlib]]
mdToken:         00000000020004af
File:            C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
BaseSize:        0x28
ComponentSize:   0x0
Slots in VTable: 77
Number of IFaces in IFaceMap: 8
--------------------------------------
MethodDesc Table
           Entry       MethodDesc    JIT Name
00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
00007ff9e4202dc0 00007ff9e34dc7f8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Insert(Int32, Int32)

The method table contains far too many methods, so I trimmed the output. You can clearly see that the Add method now takes an Int32 argument, which shows that by the time the code has been JIT-compiled the replacement T -> int has finally been made. Next, dump List<double> and have a look.


0:000> !dumpmt -md 00007ff9e339b4b8
MethodDesc Table
           Entry       MethodDesc    JIT Name
00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
00007ff9e3867a00 00007ff9e34e4280 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Insert(Int32, Double)

Both of those are value types. What happens when T is a reference type?


0:000> !dumpmt -md 00007ff9e3330d58
MethodDesc Table
           Entry       MethodDesc    JIT Name
00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)

0:000> !dumpmt -md 00007ff985b442c0
MethodDesc Table
           Entry       MethodDesc    JIT Name
00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)

You can see that for List<int[]> and List<string> the JIT uses System.__Canon as a stand-in for the type argument (perhaps whoever named it was a photography fan). Why use __Canon instead of the concrete reference type? Because every reference has the same size, the runtime can share one copy of the compiled code among all reference-type instantiations, which saves memory. If you don't believe it, note that their Entry columns hold the same address: 00007ff9e3890060. Disassembling that address gives the following.


0:000> !u 00007ff9e3890060
preJIT generated code
System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
Begin 00007ff9e3890060, size 4a
>>> 00007ff9`e3890060 57              push    rdi
00007ff9`e3890061 56              push    rsi
00007ff9`e3890062 4883ec28        sub     rsp,28h
00007ff9`e3890066 488bf1          mov     rsi,rcx
00007ff9`e3890069 488bfa          mov     rdi,rdx
00007ff9`e389006c 8b4e18          mov     ecx,dword ptr [rsi+18h]
00007ff9`e389006f 488b5608        mov     rdx,qword ptr [rsi+8]
00007ff9`e3890073 3b4a08          cmp     ecx,dword ptr [rdx+8]
00007ff9`e3890076 7422            je      mscorlib_ni+0x59009a (00007ff9`e389009a)
00007ff9`e3890078 488b4e08        mov     rcx,qword ptr [rsi+8]
00007ff9`e389007c 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e389007f 448d4201        lea     r8d,[rdx+1]
00007ff9`e3890083 44894618        mov     dword ptr [rsi+18h],r8d
00007ff9`e3890087 4c8bc7          mov     r8,rdi
00007ff9`e389008a ff152088faff    call    qword ptr [mscorlib_ni+0x5388b0 (00007ff9`e38388b0)] (JitHelp: CORINFO_HELP_ARRADDR_ST)
00007ff9`e3890090 ff461c          inc     dword ptr [rsi+1Ch]
00007ff9`e3890093 4883c428        add     rsp,28h
00007ff9`e3890097 5e              pop     rsi
00007ff9`e3890098 5f              pop     rdi
00007ff9`e3890099 c3              ret
00007ff9`e389009a 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e389009d ffc2            inc     edx
00007ff9`e389009f 488bce          mov     rcx,rsi
00007ff9`e38900a2 90              nop
00007ff9`e38900a3 e8c877feff      call    mscorlib_ni+0x577870 (00007ff9`e3877870) (System.Collections.Generic.List`1[[System.__Canon, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e38900a8 ebce            jmp     mscorlib_ni+0x590078 (00007ff9`e3890078)

Now look back at List<int> and List<double>. Their Entry columns hold different addresses, which means List<int> and List<double> have two completely separate Add methods. If you can read assembly, have a look for yourself...

MethodDesc Table
           Entry       MethodDesc    JIT Name
00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)

0:000> !u 00007ff9e38a3650
preJIT generated code
System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
Begin 00007ff9e38a3650, size 50
>>> 00007ff9`e38a3650 57              push    rdi
00007ff9`e38a3651 56              push    rsi
00007ff9`e38a3652 4883ec28        sub     rsp,28h
00007ff9`e38a3656 488bf1          mov     rsi,rcx
00007ff9`e38a3659 8bfa            mov     edi,edx
00007ff9`e38a365b 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e38a365e 488b4e08        mov     rcx,qword ptr [rsi+8]
00007ff9`e38a3662 3b5108          cmp     edx,dword ptr [rcx+8]
00007ff9`e38a3665 7423            je      mscorlib_ni+0x5a368a (00007ff9`e38a368a)
00007ff9`e38a3667 488b5608        mov     rdx,qword ptr [rsi+8]
00007ff9`e38a366b 8b4e18          mov     ecx,dword ptr [rsi+18h]
00007ff9`e38a366e 8d4101          lea     eax,[rcx+1]
00007ff9`e38a3671 894618          mov     dword ptr [rsi+18h],eax
00007ff9`e38a3674 3b4a08          cmp     ecx,dword ptr [rdx+8]
00007ff9`e38a3677 7321            jae     mscorlib_ni+0x5a369a (00007ff9`e38a369a)
00007ff9`e38a3679 4863c9          movsxd  rcx,ecx
00007ff9`e38a367c 897c8a10        mov     dword ptr [rdx+rcx*4+10h],edi
00007ff9`e38a3680 ff461c          inc     dword ptr [rsi+1Ch]
00007ff9`e38a3683 4883c428        add     rsp,28h
00007ff9`e38a3687 5e              pop     rsi
00007ff9`e38a3688 5f              pop     rdi
00007ff9`e38a3689 c3              ret
00007ff9`e38a368a 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e38a368d ffc2            inc     edx
00007ff9`e38a368f 488bce          mov     rcx,rsi
00007ff9`e38a3692 90              nop
00007ff9`e38a3693 e8a8e60700      call    mscorlib_ni+0x621d40 (00007ff9`e3921d40) (System.Collections.Generic.List`1[[System.Int32, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e38a3698 ebcd            jmp     mscorlib_ni+0x5a3667 (00007ff9`e38a3667)
00007ff9`e38a369a e8bf60f9ff      call    mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
00007ff9`e38a369f cc              int     3


0:000> !u 00007ff9e4428730
preJIT generated code
System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
Begin 00007ff9e4428730, size 5a
>>> 00007ff9`e4428730 56              push    rsi
00007ff9`e4428731 4883ec20        sub     rsp,20h
00007ff9`e4428735 488bf1          mov     rsi,rcx
00007ff9`e4428738 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e442873b 488b4e08        mov     rcx,qword ptr [rsi+8]
00007ff9`e442873f 3b5108          cmp     edx,dword ptr [rcx+8]
00007ff9`e4428742 7424            je      mscorlib_ni+0x1128768 (00007ff9`e4428768)
00007ff9`e4428744 488b5608        mov     rdx,qword ptr [rsi+8]
00007ff9`e4428748 8b4e18          mov     ecx,dword ptr [rsi+18h]
00007ff9`e442874b 8d4101          lea     eax,[rcx+1]
00007ff9`e442874e 894618          mov     dword ptr [rsi+18h],eax
00007ff9`e4428751 3b4a08          cmp     ecx,dword ptr [rdx+8]
00007ff9`e4428754 732e            jae     mscorlib_ni+0x1128784 (00007ff9`e4428784)
00007ff9`e4428756 4863c9          movsxd  rcx,ecx
00007ff9`e4428759 f20f114cca10    movsd   mmword ptr [rdx+rcx*8+10h],xmm1
00007ff9`e442875f ff461c          inc     dword ptr [rsi+1Ch]
00007ff9`e4428762 4883c420        add     rsp,20h
00007ff9`e4428766 5e              pop     rsi
00007ff9`e4428767 c3              ret
00007ff9`e4428768 f20f114c2438    movsd   mmword ptr [rsp+38h],xmm1
00007ff9`e442876e 8b5618          mov     edx,dword ptr [rsi+18h]
00007ff9`e4428771 ffc2            inc     edx
00007ff9`e4428773 488bce          mov     rcx,rsi
00007ff9`e4428776 90              nop
00007ff9`e4428777 e854fbffff      call    mscorlib_ni+0x11282d0 (00007ff9`e44282d0) (System.Collections.Generic.List`1[[System.Double, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e442877c f20f104c2438    movsd   xmm1,mmword ptr [rsp+38h]
00007ff9`e4428782 ebc0            jmp     mscorlib_ni+0x1128744 (00007ff9`e4428744)
00007ff9`e4428784 e8d50f41ff      call    mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
00007ff9`e4428789 cc              int     3

Maybe you are a little confused, so here is a diagram.

[Figure: summary diagram of the four List<T> instantiations and their Add methods]

4: Summary

The real substitution of the generic T happens only at JIT compile time. The four List<T> instantiations produce four distinct types, each with its own method table for its concrete type argument, so value types are stored without any boxing or unboxing, and the compiler in Visual Studio enforces the type constraint for us ahead of time.
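
One way to observe from plain C# that each constructed type is a separate type at run time is to give a generic class a static field. This is only an illustrative sketch: Counter<T> and DistinctTypesDemo are hypothetical types, not part of the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: each closed type Counter<X> gets its own static field.
public static class Counter<T>
{
    public static int Count;
}

public static class DistinctTypesDemo
{
    public static void Main()
    {
        Counter<int>.Count++;
        Counter<string>.Count += 5;

        Console.WriteLine(Counter<int>.Count);    // 1
        Console.WriteLine(Counter<string>.Count); // 5
        Console.WriteLine(Counter<int[]>.Count);  // 0

        // Reference-type instantiations may share compiled code,
        // but they are still different types.
        Console.WriteLine(typeof(List<string>) == typeof(List<int[]>)); // False
    }
}
```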

It’s late at night. Let’s have a rest! I hope this article will help you.