String and []byte conversion in Go

Time:2021-12-2

Compared with C, Go is a type-safe language, but that safety comes at some cost in performance.

Let's take a look at a "secret" that Go would rather we didn't see: the underlying data of a string.

Through the reflect package, we can see that underneath, a string and a slice are each described by a small struct:


type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
type StringHeader struct {
    Data uintptr
    Len  int
}

Here Data holds the address of the actual data, Len is the length of the data, and Cap (slices only) is the capacity of the underlying array.
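As a quick check of this layout, here is a minimal sketch that measures both kinds of value in machine words with unsafe.Sizeof. The helper name headerWords is made up for illustration, and the expected sizes assume the standard gc toolchain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports the size of a string value and of a []byte value,
// measured in machine words (the size of a uintptr).
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	// With the gc compiler this is expected to report 2 words for a string
	// (Data, Len) and 3 words for a slice (Data, Len, Cap).
	fmt.Println("string words:", sw, "slice words:", bw)
}
```

The sizes are independent of the content: only the header is measured, not the bytes it points to.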

So what does Go quietly do for us during string and []byte conversion in order to stay safe?

In the Go language specification, string contents are immutable: assigning to s[0] or taking the address &s[0] of a string does not compile. (For a slice, &b[0] is legal, but it panics when the slice is empty.)
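To make the rule concrete, the following small sketch shows the safe route: convert to []byte, modify the copy, and convert back. The function name modifyCopy is hypothetical; the two commented-out lines are the forms the compiler rejects.

```go
package main

import "fmt"

// modifyCopy changes the first byte of a copy of s and returns both
// the untouched original and the modified copy.
func modifyCopy(s string) (original, modified string) {
	// s[0] = 'x' // does not compile: strings are immutable
	// _ = &s[0]  // does not compile: cannot take the address of s[0]
	if len(s) == 0 {
		return s, s
	}
	b := []byte(s) // copies the bytes of s
	b[0] = 'x'     // modifies only the copy
	return s, string(b)
}

func main() {
	orig, mod := modifyCopy("hello")
	fmt.Println(orig, mod) // hello xello
}
```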

Next, let's look at the "secret" behind Go with the help of some "black magic":


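// StringBytes returns the string's underlying buffer as a slice, without
// copying, which makes the string's bytes modifiable.
// Bytes and String are the author's own types; their definitions are not shown.
// Caution: a string header has no Cap field, so if Bytes is a slice type,
// its capacity is read from whatever memory follows the string header.
func StringBytes(s string) Bytes {
    return *(*Bytes)(unsafe.Pointer(&s))
}

// BytesString converts b to a string without copying.
func BytesString(b []byte) String {
    return *(*String)(unsafe.Pointer(&b))
}

// StringPointer returns the address of the string's data,
// i.e. &s[0], which Go does not allow for strings.
func StringPointer(s string) unsafe.Pointer {
    p := (*reflect.StringHeader)(unsafe.Pointer(&s))
    return unsafe.Pointer(p.Data)
}

// BytesPointer returns the address of the slice's data, like &b[0],
// but without panicking when b is empty.
func BytesPointer(b []byte) unsafe.Pointer {
    p := (*reflect.SliceHeader)(unsafe.Pointer(&b))
    return unsafe.Pointer(p.Data)
}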

The magic of these four functions is that they obtain the address of the data through unsafe.Pointer and reflect.StringHeader / reflect.SliceHeader, and so convert directly between string and []byte without copying (the language itself does not offer these operations).

Let's use these tricks to test what happens under the hood:
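For comparison, here is a sketch of the same zero-copy conversions using the helpers that the unsafe package provides since Go 1.20 (unsafe.String, unsafe.StringData, unsafe.Slice, unsafe.SliceData). The reflect header types are marked deprecated in current Go documentation. The function names bytesToString and stringToBytes are chosen for illustration and are not the author's API.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares memory with b (no copy).
// The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes returns a slice that shares memory with s (no copy).
// The result must be treated as read-only: writing to it may crash
// the program if s lives in read-only memory.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	// unsafe.Slice sets both len and cap to len(s).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("hello")
	s := bytesToString(b)
	view := stringToBytes(s)
	fmt.Println(s, len(view), cap(view))
}
```

Unlike a raw header cast from string to slice, this version always produces a well-defined capacity.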


func TestPointer(t *testing.T) {
    s := []string{
        "",
        "",
        "hello",
        "hello",
        fmt.Sprintf(""),
        fmt.Sprintf(""),
        fmt.Sprintf("hello"),
        fmt.Sprintf("hello"),
    }
    fmt.Println("String to bytes:")
    for i, v := range s {
        b := unsafe.StringBytes(v)
        b2 := []byte(v)
        if b.Writeable() {
            b[0] = 'x'
        }
        fmt.Printf("%d\ts=%5s\tptr(v)=%-12v\tptr(StringBytes(v)=%-12v\tptr([]byte(v)=%-12v\n",
            i, v, unsafe.StringPointer(v), b.Pointer(), unsafe.BytesPointer(b2))
    }
    b := [][]byte{
        []byte{},
        []byte{'h', 'e', 'l', 'l', 'o'},
    }
    fmt.Println("Bytes to string:")
    for i, v := range b {
        s1 := unsafe.BytesString(v)
        s2 := string(v)
        fmt.Printf("%d\ts=%5s\tptr(v)=%-12v\tptr(StringBytes(v)=%-12v\tptr(string(v)=%-12v\n",
            i, s1, unsafe.BytesPointer(v), s1.Pointer(), unsafe.StringPointer(s2))
    }
}
const N = 3000000 // fixed iteration count; the loops below do not use b.N

func Benchmark_Normal(b *testing.B) {
    for i := 1; i < N; i++ {
        s := fmt.Sprintf("12345678901234567890123456789012345678901234567890")
        bb := []byte(s) // copy: string -> []byte
        bb[0] = 'x'
        s = string(bb) // copy: []byte -> string
        _ = s
    }
}

func Benchmark_Direct(b *testing.B) {
    for i := 1; i < N; i++ {
        s := fmt.Sprintf("12345678901234567890123456789012345678901234567890")
        bb := unsafe.StringBytes(s) // no copy: bb shares memory with s
        bb[0] = 'x'
        _ = s
    }
}
//test result
//String to bytes:
//0 s=      ptr(v)=0x51bd70     ptr(StringBytes(v)=0x51bd70     ptr([]byte(v)=0xc042021c58
//1 s=      ptr(v)=0x51bd70     ptr(StringBytes(v)=0x51bd70     ptr([]byte(v)=0xc042021c58
//2 s=hello ptr(v)=0x51c2fa     ptr(StringBytes(v)=0x51c2fa     ptr([]byte(v)=0xc042021c58
//3 s=hello ptr(v)=0x51c2fa     ptr(StringBytes(v)=0x51c2fa     ptr([]byte(v)=0xc042021c58
//4 s=      ptr(v)=<nil>        ptr(StringBytes(v)=<nil>        ptr([]byte(v)=0xc042021c58
//5 s=      ptr(v)=<nil>        ptr(StringBytes(v)=<nil>        ptr([]byte(v)=0xc042021c58
//6 s=xello ptr(v)=0xc0420444b5 ptr(StringBytes(v)=0xc0420444b5 ptr([]byte(v)=0xc042021c58
//7 s=xello ptr(v)=0xc0420444ba ptr(StringBytes(v)=0xc0420444ba ptr([]byte(v)=0xc042021c58
//Bytes to string:
//0 s=      ptr(v)=0x5c38b8     ptr(StringBytes(v)=0x5c38b8     ptr(string(v)=<nil>
//1 s=hello ptr(v)=0xc0420445e0 ptr(StringBytes(v)=0xc0420445e0 ptr(string(v)=0xc042021c38
//Benchmark_Normal-4    1000000000           0.87 ns/op
//Benchmark_Direct-4    2000000000           0.24 ns/op

The conclusions are as follows:

1. String constants are placed in a read-only segment at compile time; the corresponding data address is not writable, and identical string constants are stored only once.

2. A string generated by fmt.Sprintf is allocated on the heap, and the memory at its data address can be modified.

3. A constant empty string has a data address, while a dynamically generated empty string has no data address (nil).

4. The standard string and []byte conversions copy the data into a new buffer (on the heap, or in a small stack buffer when the result does not escape), and the returned value points to the copied data.

5. Dynamically generated strings occupy different memory even when their contents are identical.

6. Only dynamically generated strings can be modified with these tricks.

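7. The standard string and []byte conversions work by copying, and in this measurement they were nearly 4 times slower (0.87 ns/op versus 0.24 ns/op). Because the benchmark loops run a fixed N rather than b.N, the ns/op figures are not true per-iteration costs, so treat the ratio as a rough indication only.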
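For a per-iteration comparison, the loop has to run b.N times. The sketch below is one way to set that up, using testing.Benchmark so it runs as an ordinary program. It is a variant of the author's test rather than a reproduction of it: the names benchCopy, benchDirect, payload and sink are made up here, and both loops start from a fresh []byte so that no constant string is ever written to. No timings are shown because they depend on the machine and the Go version.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

const payload = "12345678901234567890123456789012345678901234567890"

// sink keeps the results alive so the compiler cannot discard the work.
var sink string

// benchCopy converts string -> []byte -> string with the copying conversions.
func benchCopy(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bb := []byte(payload) // first copy
		bb[0] = 'x'
		sink = string(bb) // second copy
	}
}

// benchDirect copies once, then views the bytes as a string without copying.
func benchDirect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bb := []byte(payload) // the only copy
		bb[0] = 'x'
		sink = *(*string)(unsafe.Pointer(&bb)) // zero-copy view of bb
	}
}

func main() {
	fmt.Println("copy:  ", testing.Benchmark(benchCopy))
	fmt.Println("direct:", testing.Benchmark(benchDirect))
}
```

In a real project these would normally live in a _test.go file as BenchmarkXxx functions and be run with go test -bench.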

Supplement: using unsafe.Pointer to optimize []byte and string conversion performance in Go

As we know, converting a []byte to a string is normally done with the type conversion syntax:


res := string(bytes)

This is the approach Go recommends, and its advantage is safety. The conversion does copy memory and costs some performance, but for ordinary business logic that cost is negligible.

However, if frequent copies become a bottleneck and you need to optimize, you can bring in unsafe.Pointer:

func main() {
    var s = []byte("I always like Fujiwara Qianhua. JPG")
    // Reinterpret the slice header as a string header: no bytes are copied.
    res := *(*string)(unsafe.Pointer(&s))
    fmt.Println(res)
}

Forging a string through unsafe.Pointer involves no memory copy, so it is faster than the copying type conversion. The cost is that the underlying data is exposed: the string shares memory with the slice, so a later write to the slice changes the string as well. This method is unsafe.
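The following minimal sketch illustrates that hazard. The function name aliasDemo is hypothetical. It takes a normal copy and a zero-copy view of the same slice, then writes to the slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// aliasDemo builds a zero-copy string from a slice and then modifies the
// slice. It returns a safe copy taken before the write, and the zero-copy
// string, which observes the write.
func aliasDemo() (safeCopy, zeroCopy string) {
	b := []byte("hello")
	safeCopy = string(b)                      // independent copy of the bytes
	zeroCopy = *(*string)(unsafe.Pointer(&b)) // shares memory with b
	b[0] = 'x'                                // also changes zeroCopy
	return safeCopy, zeroCopy
}

func main() {
	safe, zero := aliasDemo()
	fmt.Println(safe, zero) // hello xello
}
```

A string that changes after creation breaks assumptions made elsewhere, for example by map keys, so the zero-copy form is only reasonable when the slice is never written again.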

As for why a slice can be converted to a string in this way, we can look at their underlying structures, SliceHeader and StringHeader:


type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
type StringHeader struct {
    Data uintptr
    Len  int
}

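The two types differ only in the extra Cap (capacity) field. The leading Data and Len fields have the same types and the same offsets, so a pointer to a slice header can be reinterpreted directly as a pointer to a string header; the Cap field is simply ignored.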
