Every modern programmer uses generics; hardly anyone can claim never to have touched them. The type parameter T can stand for any type you like, which feels almost magical, and most of us simply take it for granted. What is more interesting is how the runtime actually implements T underneath. This article is my attempt to share what I found; not everything in it is necessarily right.
1: Before generics
Hardly anyone reaches for the much-criticized ArrayList in .NET Core 3.1 or .NET Framework 4.8 anymore (it still ships in System.Collections, but it is discouraged for new code). It still has to be mentioned, though, because it is what made the C# team mend their ways, drop the old approach, and start over. Here is a simplified ArrayList as a case study.
public class ArrayList
{
    // Every element is stored as object, so value types must be boxed.
    private object[] items;
    private int index = 0;

    public ArrayList()
    {
        items = new object[10];
    }

    public void Add(object item)
    {
        items[index++] = item;
    }
}
To let Add accept any type (int, double, a class, and so on), the trick is to receive the argument as object, the ancestor of all types. This introduces two major problems: boxing/unboxing and the lack of type safety.
1. Boxing and unboxing
This is easy to understand: because the parameter is object, every value type you pass to Add gets boxed, as in the following code:
ArrayList arrayList = new ArrayList();
arrayList.Add(3);
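To make the boxing visible, here is a minimal sketch. The class and method names (BoxingDemo, BoxesAreDistinct, RoundTrip) are hypothetical and exist only for this illustration. It shows that each conversion of an int to object yields a separate heap object, and that unboxing copies the value back out.

```csharp
using System;

public static class BoxingDemo
{
    // Returns true when two boxes made from the same int are distinct heap objects.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;   // box #1: a new heap object holding a copy of value
        object second = value;  // box #2: another new heap object
        return !ReferenceEquals(first, second);
    }

    // Boxes the value and unboxes it again; the cast must name the exact value type.
    public static int RoundTrip(int value)
    {
        object boxed = value;
        return (int)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinct(3)); // True
        Console.WriteLine(RoundTrip(3));        // 3
    }
}
```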
<1> It takes up more space
Let's look at this with WinDbg. You probably know that an int takes 4 bytes. How many bytes does it take once it is boxed onto the heap? Let's find out.
The source code and its IL are as follows:
public static void Main(string[] args)
{
    var num = 10;
    var obj = (object)num;
    Console.Read();
}
IL_0000: nop
IL_0001: ldc.i4.s 10
IL_0003: stloc.0
IL_0004: ldloc.0
IL_0005: box [mscorlib]System.Int32
IL_000a: stloc.1
IL_000b: call int32 [mscorlib]System.Console::Read()
IL_0010: pop
IL_0011: ret
You can clearly see a box instruction at IL_0005, so boxing does happen. Next, grab a dump file and inspect it:
~0s -> !clrstack -l -> !do 0x0000018300002d48
0:000> ~0s
ntdll!ZwReadFile+0x14:
00007ff9`fc7baa64 c3 ret
0:000> !clrstack -l
OS Thread Id: 0xfc (0)
Child SP IP Call Site
0000002c397fedf0 00007ff985c808f3 ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 28]
LOCALS:
0x0000002c397fee2c = 0x000000000000000a
0x0000002c397fee20 = 0x0000018300002d48
0000002c397ff038 00007ff9e51b6c93 [GCFrame: 0000002c397ff038]
0:000> !do 0x0000018300002d48
Name: System.Int32
MethodTable: 00007ff9e33285a0
EEClass: 00007ff9e34958a8
Size: 24(0x18) bytes
File: C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ff9e33285a0 40005a0 8 System.Int32 1 instance 10 m_value
Look at the fifth line from the bottom: Size: 24(0x18) bytes.
Why 24 bytes? 8 (object header holding the sync block index) + 8 (method table pointer) + 4 (the int value itself) = 20.
But on x64 objects are aligned to 8 bytes, so the size is rounded up to a multiple of 8: 8 + 8 + 8 = 24 bytes. A value that originally needed only 4 bytes has swelled to 24 because of boxing. Imagine boxing 10,000 values: the space overhead adds up quickly.
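The arithmetic above can be written out as a small sketch. The names BoxedSize and AlignUp are hypothetical, and the 8-byte header and 8-byte method table pointer are the x64 figures taken from the dump above, not values queried from the runtime.

```csharp
using System;

public static class BoxedSize
{
    // Rounds size up to the next multiple of alignment (alignment must be a power of two).
    public static int AlignUp(int size, int alignment)
    {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void Main()
    {
        const int header = 8;            // object header with the sync block index (x64)
        const int methodTable = 8;       // method table pointer (x64)
        const int payload = sizeof(int); // the int value itself: 4 bytes

        int raw = header + methodTable + payload; // 20
        Console.WriteLine(raw);                   // 20
        Console.WriteLine(AlignUp(raw, 8));       // 24
    }
}
```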
<2> Moving a value from the stack to the heap is expensive: like a parcel, it has to be packed, shipped, and eventually disposed of, which means a heap allocation, a copy, and later work for the garbage collector.
2. No type safety
This one is simple. Because the element type is object, nothing stops a programmer from mixing arbitrary types in one list. It may well be unintentional, but the compiler cannot catch it. For example:
ArrayList arrayList = new ArrayList();
arrayList.Add(3);
arrayList.Add(new Action<int>((num) => { }));
arrayList.Add(new object());
Faced with these two embarrassing problems, the C# team decided to design a new kind of type that would solve them once and for all, and that is how generics were born.
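As a rough illustration of the type-safety problem, the sketch below uses the real System.Collections.ArrayList; the helper TrySum and the class name TypeSafetyDemo are hypothetical. The mixed list compiles fine and only fails when an element is unboxed, whereas the equivalent mistake with List<int> is rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // Sums the list assuming every element is a boxed int.
    // Returns false if any element turns out to be something else.
    public static bool TrySum(ArrayList items, out int sum)
    {
        sum = 0;
        try
        {
            foreach (object item in items)
            {
                sum += (int)item; // unboxing: throws InvalidCastException for non-int elements
            }
            return true;
        }
        catch (InvalidCastException)
        {
            sum = 0;
            return false;
        }
    }

    public static void Main()
    {
        var mixed = new ArrayList { 3, "three", new object() };
        Console.WriteLine(TrySum(mixed, out _)); // False: the error only shows up at run time

        var typed = new List<int> { 3 };
        // typed.Add("three"); // does not compile: a string is not an int
        Console.WriteLine(typed[0]); // 3
    }
}
```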
2: The emergence of generics
1. Savior
First of all, generics were created to solve exactly these two problems. The library defines List<T> once, and you use it as List<int>, List<double>, and so on. This article focuses on how that is implemented underneath.
public static void Main(string[] args)
{
List<double> list1 = new List<double>();
List<string> list3 = new List<string>();
...
}
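For comparison with the object-based ArrayList shown earlier, here is a minimal generic counterpart. SimpleList<T> is a hypothetical name and a simplified sketch, not how the real List<T> is written. The backing array is a T[], so ints are stored without boxing and the compiler checks every Add.

```csharp
using System;

public class SimpleList<T>
{
    private T[] items = new T[4]; // typed storage: no object[] and no boxing
    private int count;

    public int Count => count;

    public void Add(T item)
    {
        // Grow the backing array when it is full.
        if (count == items.Length)
        {
            Array.Resize(ref items, items.Length * 2);
        }
        items[count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }
            return items[index];
        }
    }
}

public static class SimpleListDemo
{
    public static void Main()
    {
        var numbers = new SimpleList<int>();
        for (int i = 0; i < 10; i++)
        {
            numbers.Add(i);
        }
        Console.WriteLine(numbers.Count); // 10
        Console.WriteLine(numbers[9]);    // 9
    }
}
```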
3: Research on the principle of generics
The question really boils down to this: at which stage does List<T> become List<int>? In Java, generics are erased and the type parameter is replaced by Object underneath. C# certainly does not do that, otherwise there would be no article to write. To find out at which stage the substitution happens, you need to know the stages a C# program goes through: the C# compiler turns source code into MSIL, and the JIT compiler turns MSIL into machine code at run time.
So the substitution happens either when MSIL is produced or during JIT compilation. Here is the test program:
public static void Main(string[] args)
{
    List<double> list1 = new List<double>();
    List<int> list2 = new List<int>();
    List<string> list3 = new List<string>();
    List<int[]> list4 = new List<int[]>();
    Console.ReadLine();
}
1. Exploring the first stage
The first stage produces MSIL, so you can use ILSpy to look at the intermediate code.
IL_0000: nop
IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<float64>::.ctor()
IL_0006: stloc.0
IL_0007: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_000c: stloc.1
IL_000d: newobj instance void class [mscorlib]System.Collections.Generic.List`1<string>::.ctor()
IL_0012: stloc.2
IL_0013: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32[]>::.ctor()
IL_0018: stloc.3
IL_0019: call string [mscorlib]System.Console::ReadLine()
IL_001e: pop
IL_001f: ret
.class public auto ansi serializable beforefieldinit System.Collections.Generic.List`1<T>
extends System.Object
implements class System.Collections.Generic.IList`1<!T>,
class System.Collections.Generic.ICollection`1<!T>,
class System.Collections.Generic.IEnumerable`1<!T>,
System.Collections.IEnumerable,
System.Collections.IList,
System.Collections.ICollection,
class System.Collections.Generic.IReadOnlyList`1<!T>,
class System.Collections.Generic.IReadOnlyCollection`1<!T>
As the IL above shows, the class definition is still System.Collections.Generic.List`1<T>, which means the T -> int substitution has not happened at the intermediate code stage.
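The same distinction is visible through reflection. The sketch below (class and method names are hypothetical) shows that the open definition List<> and the closed types List<int> and List<string> are all different runtime types, and that the closed types still know their type argument, which is not the case with Java-style erasure.

```csharp
using System;
using System.Collections.Generic;

public static class ReifiedDemo
{
    // True when both closed types were built from the same open generic definition.
    public static bool ShareDefinition(Type a, Type b)
    {
        return a.GetGenericTypeDefinition() == b.GetGenericTypeDefinition();
    }

    public static void Main()
    {
        Type open = typeof(List<>);
        Type ofInt = typeof(List<int>);
        Type ofString = typeof(List<string>);

        Console.WriteLine(open.IsGenericTypeDefinition);    // True
        Console.WriteLine(ofInt == ofString);               // False: distinct runtime types
        Console.WriteLine(ShareDefinition(ofInt, ofString)); // True: both come from List<>
        Console.WriteLine(ofInt.GetGenericArguments()[0]);  // System.Int32
    }
}
```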
2. Exploring the second stage
Seeing the JIT-compiled code is not hard. Every object starts with a method table pointer, and the method table it points to lists all the methods finally generated for that type.
First, !dumpheap -stat finds the four List objects on the managed heap.
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
00007ff9e3314320 1 32 Microsoft.Win32.SafeHandles.SafeViewOfFileHandle
00007ff9e339b4b8 1 40 System.Collections.Generic.List`1[[System.Double, mscorlib]]
00007ff9e333a068 1 40 System.Collections.Generic.List`1[[System.Int32, mscorlib]]
00007ff9e3330d58 1 40 System.Collections.Generic.List`1[[System.String, mscorlib]]
00007ff9e3314a58 1 40 System.IO.Stream+NullStream
00007ff9e3314510 1 40 Microsoft.Win32.Win32Native+InputRecord
00007ff9e3314218 1 40 System.Text.InternalEncoderBestFitFallback
00007ff985b442c0 1 40 System.Collections.Generic.List`1[[System.Int32[], mscorlib]]
00007ff9e338fd28 1 48 System.Text.DBCSCodePageEncoding+DBCSDecoder
00007ff9e3325ef0 1 48 System.SharedStatics
As you can see, the four List objects are on the managed heap. Let's pick the simplest one, System.Collections.Generic.List`1[[System.Int32, mscorlib]]; the first column, 00007ff9e333a068, is the address of its method table.
!dumpmt -md 00007ff9e333a068
0:000> !dumpmt -md 00007ff9e333a068
EEClass: 00007ff9e349b008
Module: 00007ff9e3301000
Name: System.Collections.Generic.List`1[[System.Int32, mscorlib]]
mdToken: 00000000020004af
File: C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
BaseSize: 0x28
ComponentSize: 0x0
Slots in VTable: 77
Number of IFaces in IFaceMap: 8
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
00007ff9e4202dc0 00007ff9e34dc7f8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Insert(Int32, Int32)
The method table contains too many methods, so I trimmed the output. You can clearly see that Add already takes an Int32 parameter, which shows that the T -> int substitution is in place in the native code. (The JIT column says PreJIT because mscorlib was precompiled into a native image; the specialization is the same kind the JIT performs at run time.) Next, dump the method table of List<double> and have a look.
0:000> !dumpmt -md 00007ff9e339b4b8
MethodDesc Table
Entry MethodDesc JIT Name
00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
00007ff9e3867a00 00007ff9e34e4280 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Insert(Int32, Double)
Both of those are value types. What happens when T is a reference type?
0:000> !dumpmt -md 00007ff9e3330d58
MethodDesc Table
Entry MethodDesc JIT Name
00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
0:000> !dumpmt -md 00007ff985b442c0
MethodDesc Table
Entry MethodDesc JIT Name
00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
You can see that for List<int[]> and List<string> the runtime uses System.__Canon as a placeholder for the type argument (maybe someone on the team was a photography fan). Why use __Canon in place of the reference type? Because every reference is a pointer of the same size, all reference-type instantiations can share one copy of the machine code, which saves memory. If you don't believe it, look at the Entry column: both show the same address, 00007ff9e3890060. Disassembled, it looks like this:
0:000> !u 00007ff9e3890060
preJIT generated code
System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
Begin 00007ff9e3890060, size 4a
>>> 00007ff9`e3890060 57 push rdi
00007ff9`e3890061 56 push rsi
00007ff9`e3890062 4883ec28 sub rsp,28h
00007ff9`e3890066 488bf1 mov rsi,rcx
00007ff9`e3890069 488bfa mov rdi,rdx
00007ff9`e389006c 8b4e18 mov ecx,dword ptr [rsi+18h]
00007ff9`e389006f 488b5608 mov rdx,qword ptr [rsi+8]
00007ff9`e3890073 3b4a08 cmp ecx,dword ptr [rdx+8]
00007ff9`e3890076 7422 je mscorlib_ni+0x59009a (00007ff9`e389009a)
00007ff9`e3890078 488b4e08 mov rcx,qword ptr [rsi+8]
00007ff9`e389007c 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e389007f 448d4201 lea r8d,[rdx+1]
00007ff9`e3890083 44894618 mov dword ptr [rsi+18h],r8d
00007ff9`e3890087 4c8bc7 mov r8,rdi
00007ff9`e389008a ff152088faff call qword ptr [mscorlib_ni+0x5388b0 (00007ff9`e38388b0)] (JitHelp: CORINFO_HELP_ARRADDR_ST)
00007ff9`e3890090 ff461c inc dword ptr [rsi+1Ch]
00007ff9`e3890093 4883c428 add rsp,28h
00007ff9`e3890097 5e pop rsi
00007ff9`e3890098 5f pop rdi
00007ff9`e3890099 c3 ret
00007ff9`e389009a 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e389009d ffc2 inc edx
00007ff9`e389009f 488bce mov rcx,rsi
00007ff9`e38900a2 90 nop
00007ff9`e38900a3 e8c877feff call mscorlib_ni+0x577870 (00007ff9`e3877870) (System.Collections.Generic.List`1[[System.__Canon, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e38900a8 ebce jmp mscorlib_ni+0x590078 (00007ff9`e3890078)
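If you do not have WinDbg at hand, a rough approximation of this check can be attempted with reflection. This is only a sketch under assumptions: CanonDemo and its helpers are hypothetical names, and whether the reported entry points for List<string>.Add and List<int[]>.Add compare equal depends on the runtime version and its stubs, so no result is claimed for those two lines. What is certain is that the two closed types keep separate type handles (their own method tables) even when code is shared.

```csharp
using System;
using System.Collections.Generic;

public static class CanonDemo
{
    // Native entry point that reflection reports for List<T>.Add on a closed type.
    public static IntPtr AddEntryPoint(Type closedList)
    {
        return closedList.GetMethod("Add").MethodHandle.GetFunctionPointer();
    }

    // Each closed type has its own type handle, even when machine code is shared.
    public static bool HaveDistinctTypeHandles(Type a, Type b)
    {
        return a.TypeHandle.Value != b.TypeHandle.Value;
    }

    public static void Main()
    {
        Type ofString = typeof(List<string>);
        Type ofIntArray = typeof(List<int[]>);
        Type ofInt = typeof(List<int>);

        Console.WriteLine(HaveDistinctTypeHandles(ofString, ofIntArray)); // True

        // Runtime-dependent comparisons: print them and compare with the WinDbg output.
        Console.WriteLine(AddEntryPoint(ofString) == AddEntryPoint(ofIntArray));
        Console.WriteLine(AddEntryPoint(ofString) == AddEntryPoint(ofInt));
    }
}
```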
Now look back at List<int> and List<double>. Their Entry columns show different addresses, which means List<int> and List<double> have two completely separate Add methods. If you can read assembly, have a look for yourself:
MethodDesc Table
Entry MethodDesc JIT Name
00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
0:000> !u 00007ff9e38a3650
preJIT generated code
System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
Begin 00007ff9e38a3650, size 50
>>> 00007ff9`e38a3650 57 push rdi
00007ff9`e38a3651 56 push rsi
00007ff9`e38a3652 4883ec28 sub rsp,28h
00007ff9`e38a3656 488bf1 mov rsi,rcx
00007ff9`e38a3659 8bfa mov edi,edx
00007ff9`e38a365b 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e38a365e 488b4e08 mov rcx,qword ptr [rsi+8]
00007ff9`e38a3662 3b5108 cmp edx,dword ptr [rcx+8]
00007ff9`e38a3665 7423 je mscorlib_ni+0x5a368a (00007ff9`e38a368a)
00007ff9`e38a3667 488b5608 mov rdx,qword ptr [rsi+8]
00007ff9`e38a366b 8b4e18 mov ecx,dword ptr [rsi+18h]
00007ff9`e38a366e 8d4101 lea eax,[rcx+1]
00007ff9`e38a3671 894618 mov dword ptr [rsi+18h],eax
00007ff9`e38a3674 3b4a08 cmp ecx,dword ptr [rdx+8]
00007ff9`e38a3677 7321 jae mscorlib_ni+0x5a369a (00007ff9`e38a369a)
00007ff9`e38a3679 4863c9 movsxd rcx,ecx
00007ff9`e38a367c 897c8a10 mov dword ptr [rdx+rcx*4+10h],edi
00007ff9`e38a3680 ff461c inc dword ptr [rsi+1Ch]
00007ff9`e38a3683 4883c428 add rsp,28h
00007ff9`e38a3687 5e pop rsi
00007ff9`e38a3688 5f pop rdi
00007ff9`e38a3689 c3 ret
00007ff9`e38a368a 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e38a368d ffc2 inc edx
00007ff9`e38a368f 488bce mov rcx,rsi
00007ff9`e38a3692 90 nop
00007ff9`e38a3693 e8a8e60700 call mscorlib_ni+0x621d40 (00007ff9`e3921d40) (System.Collections.Generic.List`1[[System.Int32, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e38a3698 ebcd jmp mscorlib_ni+0x5a3667 (00007ff9`e38a3667)
00007ff9`e38a369a e8bf60f9ff call mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
00007ff9`e38a369f cc int 3
0:000> !u 00007ff9e4428730
preJIT generated code
System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
Begin 00007ff9e4428730, size 5a
>>> 00007ff9`e4428730 56 push rsi
00007ff9`e4428731 4883ec20 sub rsp,20h
00007ff9`e4428735 488bf1 mov rsi,rcx
00007ff9`e4428738 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e442873b 488b4e08 mov rcx,qword ptr [rsi+8]
00007ff9`e442873f 3b5108 cmp edx,dword ptr [rcx+8]
00007ff9`e4428742 7424 je mscorlib_ni+0x1128768 (00007ff9`e4428768)
00007ff9`e4428744 488b5608 mov rdx,qword ptr [rsi+8]
00007ff9`e4428748 8b4e18 mov ecx,dword ptr [rsi+18h]
00007ff9`e442874b 8d4101 lea eax,[rcx+1]
00007ff9`e442874e 894618 mov dword ptr [rsi+18h],eax
00007ff9`e4428751 3b4a08 cmp ecx,dword ptr [rdx+8]
00007ff9`e4428754 732e jae mscorlib_ni+0x1128784 (00007ff9`e4428784)
00007ff9`e4428756 4863c9 movsxd rcx,ecx
00007ff9`e4428759 f20f114cca10 movsd mmword ptr [rdx+rcx*8+10h],xmm1
00007ff9`e442875f ff461c inc dword ptr [rsi+1Ch]
00007ff9`e4428762 4883c420 add rsp,20h
00007ff9`e4428766 5e pop rsi
00007ff9`e4428767 c3 ret
00007ff9`e4428768 f20f114c2438 movsd mmword ptr [rsp+38h],xmm1
00007ff9`e442876e 8b5618 mov edx,dword ptr [rsi+18h]
00007ff9`e4428771 ffc2 inc edx
00007ff9`e4428773 488bce mov rcx,rsi
00007ff9`e4428776 90 nop
00007ff9`e4428777 e854fbffff call mscorlib_ni+0x11282d0 (00007ff9`e44282d0) (System.Collections.Generic.List`1[[System.Double, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
00007ff9`e442877c f20f104c2438 movsd xmm1,mmword ptr [rsp+38h]
00007ff9`e4428782 ebc0 jmp mscorlib_ni+0x1128744 (00007ff9`e4428744)
00007ff9`e4428784 e8d50f41ff call mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
00007ff9`e4428789 cc int 3
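Maybe you are a little confused by now, so here is the picture in words: List<int> and List<double> each point to their own Add code, while List<string> and List<int[]> both point to the shared List<__Canon>.Add code.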
4: Summary
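The real substitution of the type parameter T happens only when native code is generated, that is, at JIT compile time. The four List<T> instantiations above produce four distinct closed types, each with its own method table: the value-type instantiations get their own specialized code, while the reference-type instantiations share the __Canon code. Because List<int> stores real int values, there is no boxing or unboxing, and the compiler enforces the element type for us ahead of time.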
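One way to observe that every closed generic type is a type of its own is through static fields, which are kept per closed type. The sketch below uses hypothetical names (PerType<T>, ClosedTypeDemo, Bump); it is an illustration, not part of the WinDbg investigation above.

```csharp
using System;

public static class PerType<T>
{
    // One copy of this field exists per closed type: PerType<int>, PerType<string>, ...
    public static int Count;
}

public static class ClosedTypeDemo
{
    // Increments the counter that belongs to PerType<T> and returns the new value.
    public static int Bump<T>()
    {
        return ++PerType<T>.Count;
    }

    public static void Main()
    {
        Bump<int>();
        Bump<int>();
        Bump<string>();

        Console.WriteLine(PerType<int>.Count);    // 2
        Console.WriteLine(PerType<string>.Count); // 1
        Console.WriteLine(PerType<int[]>.Count);  // 0: never touched, even though it shares code with string
    }
}
```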
It’s late at night. Let’s have a rest! I hope this article will help you.