After years of working in frameworks, do you still understand pointers?

Time:2020-10-18

1: Background

1. Tell a story

If you spend most of your time in high-level languages, you may well have forgotten about pointers and assembly. This article is about pointers. Although pointers are discouraged in C#, can you really say they are unimportant there? The FCL uses them heavily in types such as String, Encoding, and FileStream. For example:


    private unsafe static bool EqualsHelper(string strA, string strB)
    {
        fixed (char* ptr = &strA.m_firstChar)
        {
            fixed (char* ptr3 = &strB.m_firstChar)
            {
                char* ptr2 = ptr;
                char* ptr4 = ptr3;
                while (num >= 12) {...}
                while (num > 0 && *(int*)ptr2 == *(int*)ptr4) {...}
            }
        }
    }

    public unsafe Mutex(bool initiallyOwned, string name, out bool createdNew, MutexSecurity mutexSecurity)
    {
        byte* ptr = stackalloc byte[(int)checked(unchecked((ulong)(uint)securityDescriptorBinaryForm.Length))];
    }
   
    private unsafe int ReadFileNative(SafeFileHandle handle, byte[] bytes, out int hr)
    {
        fixed (byte* ptr = bytes)
        {
            num = ((!_isAsync) ? Win32Native.ReadFile(handle, ptr + offset, count, out numBytesRead, IntPtr.Zero) : Win32Native.ReadFile(handle, ptr + offset, count, IntPtr.Zero, overlapped));
        }
    }

Yes, the comfortable world you live in is carried on someone else's shoulders. At the very least, whether or not you understand pointers determines how far you get when reading the underlying source code. Pointers are fairly abstract and test your spatial imagination. Many working programmers probably still don't understand them well, largely for lack of WYSIWYG tools. I hope this article helps you take fewer detours.

2: WinDbg helps you understand

Although pointers are abstract, viewing the memory layout live in WinDbg makes their behavior much easier to grasp. Let's start with a few simple pointer concepts.

1. &, * operators

The & operator (address-of) obtains the memory address of a variable, and the * operator (dereference) reads the value stored at the address held in a pointer variable. That sounds abstract, so let's look at it in WinDbg.

            unsafe
            {
                int num = 10;
                int* ptr = &num;
                var num2 = *ptr;
                Console.WriteLine(num2);
            }

0:000> !clrstack -l
OS Thread Id: 0x41ec (0)
        Child SP               IP Call Site
0000005b1efff040 00007ffc766208e2 *** WARNING: Unable to verify checksum for ConsoleApp4.exe
ConsoleApp4.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp4\Program.cs @ 25]
    LOCALS:
        0x0000005b1efff084 = 0x000000000000000a
        0x0000005b1efff078 = 0x0000005b1efff084
        0x0000005b1efff074 = 0x000000000000000a

Look carefully at the three key-value pairs under LOCALS.

<1> int* ptr = &num; => 0x0000005b1efff078 = 0x0000005b1efff084

int* ptr is called a pointer variable. Since it is a variable, it has its own stack address, 0x0000005b1efff078, and the value stored at that address is 0x0000005b1efff084, which is the stack address of num.

<2> var num2 = *ptr; => 0x0000005b1efff074 = 0x000000000000000a

*ptr takes the value of ptr, 0x0000005b1efff084, and reads the value stored at that address, so the result is 10.

If that is still unclear, picture it as a chain: ptr lives at 0x0000005b1efff078 and holds 0x0000005b1efff084, which is the address where num (10) is stored.
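To see that *ptr really reaches the same memory as num, here is a minimal sketch that writes through the pointer instead of reading. The class and method names (PointerBasics, WriteThroughPointer) are made up for illustration, and the project must be compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;

public static class PointerBasics
{
    // Writes through a pointer and returns the new value of the pointed-to local.
    public static unsafe int WriteThroughPointer()
    {
        int num = 10;
        int* ptr = &num;   // ptr stores the stack address of num
        *ptr = 20;         // dereference and assign: this changes num itself
        return num;
    }

    public static void Main()
    {
        Console.WriteLine(WriteThroughPointer());   // prints 20
    }
}
```

Nothing assigns to num by name after its declaration, yet it ends up as 20, because ptr and num refer to the same four bytes on the stack.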

2. The ** operator

** declares a second-level pointer (a pointer to a pointer): it stores the address of a first-level pointer variable, which is where things get interesting. In the following program, ptr2 points to ptr.


    unsafe
    {
        int num1 = 10;
        int* ptr = &num1;
        int** ptr2 = &ptr;
        var num2 = **ptr2;
    }


0:000> !clrstack -l
ConsoleApp4.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp4\Program.cs @ 26]
    LOCALS:
        0x000000305f5fef24 = 0x000000000000000a
        0x000000305f5fef18 = 0x000000305f5fef24
        0x000000305f5fef10 = 0x000000305f5fef18
        0x000000305f5fef0c = 0x000000000000000a
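One thing a second-level pointer lets you do is change where the first-level pointer points. The sketch below is only an illustration; DoublePointerDemo, Retarget, and the locals a and b are hypothetical names, and it again needs unsafe code enabled.

```csharp
using System;

public static class DoublePointerDemo
{
    // Uses an int** to redirect a first-level pointer from one local to another.
    public static unsafe int Retarget()
    {
        int a = 10;
        int b = 99;
        int* ptr = &a;       // ptr points to a
        int** ptr2 = &ptr;   // ptr2 stores the stack address of ptr
        *ptr2 = &b;          // one dereference reaches ptr, so ptr now points to b
        return **ptr2;       // two dereferences reach the int that ptr points to
    }

    public static void Main()
    {
        Console.WriteLine(Retarget());   // prints 99
    }
}
```

A single * on ptr2 yields the pointer variable ptr, and a second * yields the int behind it, which matches the chain of addresses in the LOCALS output above.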


3. The ++ and -- operators

This kind of arithmetic is typically used on contiguous types such as arrays and strings, as in the following code:

    fixed (int* ptr = new int[3] { 1, 2, 3 }) { }
    fixed (char* ptr2 = "abcd") { }

At first, ptr points to the start of the array data allocated on the heap, which is the memory address of 1. After one ++ the pointer refers to the next int element, 2, and after another ++ it refers to the next int, 3. The pointer declared by fixed is itself read-only, so the example copies it into cptr and increments that copy. It's very simple; here is an example:

        unsafe
        {
            fixed (int* ptr = new int[3] { 1, 2, 3 })
            {
                int* cptr = ptr;
                Console.WriteLine(((long)cptr++).ToString("x16"));
                Console.WriteLine(((long)cptr++).ToString("x16"));
                Console.WriteLine(((long)cptr++).ToString("x16"));
            }
        }

0:000> !clrstack -l
    LOCALS:
        0x00000070c15fea50 = 0x000001bcaac82da0
        0x00000070c15fea48 = 0x0000000000000000
        0x00000070c15fea40 = 0x000001bcaac82dac
        0x00000070c15fea38 = 0x000001bcaac82da8


The three addresses printed to the console hold the values 1, 2, and 3. Note, however, that C# is a managed language and reference types are allocated on the managed heap, so an object's address on the heap can change: the GC may move objects when it compacts memory. That is why the C# compiler requires you to pin the object with fixed before taking its address, so the GC cannot relocate it while the pointer is in use. In this case the pinned data spans 0x000001bcaac82da0 to 0x000001bcaac82da8 + 4.
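The stepping behavior can also be checked without a debugger. The following is a rough sketch, assuming unsafe code is enabled; PointerArithmeticDemo, Sum, and StrideInBytes are illustrative names, not anything from the FCL.

```csharp
using System;

public static class PointerArithmeticDemo
{
    // Sums an int[] by walking a pointer across the pinned array.
    public static unsafe int Sum(int[] values)
    {
        int total = 0;
        fixed (int* start = values)        // pin the array so the GC cannot move it
        {
            int* cursor = start;           // the fixed pointer is read-only, so copy it
            int* end = start + values.Length;
            while (cursor < end)
            {
                total += *cursor;
                cursor++;                  // moves forward by one int, not one byte
            }
        }
        return total;
    }

    // Returns the distance in bytes between element 0 and element 1.
    public static unsafe long StrideInBytes(int[] values)
    {
        fixed (int* start = values)
        {
            return (long)(start + 1) - (long)start;
        }
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(Sum(data));            // prints 6
        Console.WriteLine(StrideInBytes(data));  // prints 4
    }
}
```

The stride of 4 is sizeof(int), which is why the addresses in the console output end in da0, da4, and da8.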

3: Use two cases to help you understand

As the saying goes, a thousand words are useless if you can't put them into practice. You have to work through examples before you can use pointers flexibly, so here are two.

1. Use pointer to replace characters in string

We all know string has a Replace method that replaces a specified character with the one you want. However, strings in C# are immutable, so any modification produces a new string. With a pointer it's different: you can locate the memory address of the character to replace and assign the new character directly to that address. Here is a piece of code that turns abcgef into abcdef, that is, replaces g with d.

            unsafe
            {
                // Replace 'g' with 'd'
                string s = "abcgef";
                char oldchar = 'g';
                char newchar = 'd';
                Console.WriteLine($"before replacement: {s}");
                var len = s.Length;
                // Pin the string so the GC cannot move it while we write to it
                fixed (char* ptr = s)
                {
                    // Current pointer address
                    char* cptr = ptr;
                    for (int i = 0; i < len; i++)
                    {
                        if (*cptr == oldchar)
                        {
                            // Overwrite the character in place, then stop
                            *cptr = newchar;
                            break;
                        }
                        cptr++;
                    }
                }

                Console.WriteLine($"after replacement: {s}");
            }

----- output ------

before replacement: abcgef
after replacement: abcdef
The execution is over!

The output looks right. Next, use WinDbg to find the addresses that reference string objects on the thread stack; you can capture a dump file at the break.


In the LOCALS output, of the 10 variables listed, the last nine that hold addresses all lie close to the start address of the string, 0x000001ef1ded2d48, which indicates that no new string was created.
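The same conclusion can be checked in code by comparing object references. This is a hedged sketch: StringMutationDemo and ReplaceInPlace are hypothetical names, and it deliberately works on a freshly built string rather than a literal, because string literals are interned and overwriting one changes every use of that literal in the program.

```csharp
using System;

public static class StringMutationDemo
{
    // Overwrites the first occurrence of oldChar in place and returns the same string object.
    public static unsafe string ReplaceInPlace(string s, char oldChar, char newChar)
    {
        fixed (char* ptr = s)
        {
            for (int i = 0; i < s.Length; i++)
            {
                if (ptr[i] == oldChar)
                {
                    ptr[i] = newChar;   // write straight into the string's memory
                    break;
                }
            }
        }
        return s;
    }

    public static void Main()
    {
        // Build a fresh, non-interned string so the literal "abcgef" is left alone.
        string original = new string("abcgef".ToCharArray());

        string viaReplace = original.Replace('g', 'd');
        Console.WriteLine(object.ReferenceEquals(original, viaReplace));  // False: a new string was allocated

        string viaPointer = ReplaceInPlace(original, 'g', 'd');
        Console.WriteLine(object.ReferenceEquals(original, viaPointer));  // True: still the same object
        Console.WriteLine(original);                                       // abcdef
    }
}
```

Writing to a string through a pointer bypasses its immutability guarantee, so treat this as a way to understand the memory layout, not as a pattern for everyday code.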

2. Pointer and index traversal speed competition

We usually traverse an array by index. In a head-to-head test against a pointer, which do you think is faster? If I tell you that indexing is a wrapper around pointer access, you can guess the answer. Let's see how much faster it is.

To make the results more pronounced, I traverse 100 million numbers in the following environment: .NET Framework 4.8, Release mode.

        // Requires: using System; using System.Diagnostics; using System.Linq;
        static void Main(string[] args)
        {
            var nums = Enumerable.Range(0, 100000000).ToArray();

            for (int i = 0; i < 10; i++)
            {
                var watch = Stopwatch.StartNew();
                Run1(nums);
                watch.Stop();
                Console.WriteLine(watch.ElapsedMilliseconds);
            }

            Console.WriteLine("  --------------  ");

            for (int i = 0; i < 10; i++)
            {
                var watch = Stopwatch.StartNew();
                Run2(nums);
                watch.Stop();
                Console.WriteLine(watch.ElapsedMilliseconds);
            }

            Console.WriteLine("The execution is over!");
            Console.ReadLine();
        }

        // Traverse the array with pointers
        public static void Run1(int[] nums)
        {
            unsafe
            {
                //The address of the last element of the array
                fixed (int* ptr1 = &nums[nums.Length - 1])
                {
                    //The address of the first element of the array
                    fixed (int* ptr2 = nums)
                    {
                        int* sptr = ptr2;
                        int* eptr = ptr1;
                        while (sptr <= eptr)
                        {
                            int num = *sptr;
                            sptr++;
                        }
                    }
                }
            }
        }

        public static void Run2(int[] nums)
        {
            for (int i = 0; i < nums.Length; i++)
            {
                int num = nums[i];
            }
        }


The numbers speak for themselves: in this test, traversing with a pointer was nearly twice as fast as indexing the array.
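If you want to repeat the comparison yourself, one possible variation is to have both loops accumulate a sum, so each loop produces a result that is actually used and the two can be checked against each other. The sketch below makes that assumption; TraversalBenchmark, SumWithPointer, and SumWithIndex are illustrative names, the array is smaller than in the test above, and the timings will depend on your runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TraversalBenchmark
{
    // Sums the array by walking a pointer from the first element to one past the last.
    public static unsafe long SumWithPointer(int[] nums)
    {
        long sum = 0;
        fixed (int* start = nums)
        {
            int* cursor = start;
            int* end = start + nums.Length;
            while (cursor < end)
            {
                sum += *cursor;
                cursor++;
            }
        }
        return sum;
    }

    // Sums the array with ordinary index access.
    public static long SumWithIndex(int[] nums)
    {
        long sum = 0;
        for (int i = 0; i < nums.Length; i++)
        {
            sum += nums[i];
        }
        return sum;
    }

    public static void Main()
    {
        int[] nums = Enumerable.Range(0, 10000000).ToArray();

        for (int round = 0; round < 5; round++)
        {
            var watch = Stopwatch.StartNew();
            long viaPointer = SumWithPointer(nums);
            watch.Stop();
            long pointerMs = watch.ElapsedMilliseconds;

            watch.Restart();
            long viaIndex = SumWithIndex(nums);
            watch.Stop();

            Console.WriteLine($"pointer: {pointerMs} ms, index: {watch.ElapsedMilliseconds} ms, equal sums: {viaPointer == viaIndex}");
        }
    }
}
```

Run it in Release mode, as in the test above, and compare the two columns over several rounds instead of trusting a single measurement.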

4: Summary

I hope this article serves as a friendly reminder to those of you who spend your days in frameworks: don't forget pointers. The pointers you rarely use yourself are used extensively in the underlying framework code.