The difference between []rune and []byte in Go

Time:2021-1-11

Reprinted from the original article: The difference between []rune and []byte in Go

While reading about Go strings, I happened to notice that []rune(s) converts a string into Unicode code points. So how does it differ from []byte(s)? Let's test it:

first := "fisrt"
fmt.Println([]rune(first))
fmt.Println([]byte(first))

[102 105 115 114 116] // output of []rune
[102 105 115 114 116] // output of []byte

Judging by the output, there is no difference. But the Go authors would not provide two identical things for no reason, so what is the difference? Let's look at the source code:

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32
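The two declarations above imply different widths for the two types. As a minimal sketch (the helper name sizes is made up for illustration), the widths can be checked with unsafe.Sizeof:

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizes returns the width in bytes of a byte value and of a rune value.
// Hypothetical helper, used here only for illustration.
func sizes() (byteSize, runeSize uintptr) {
	var b byte
	var r rune
	return unsafe.Sizeof(b), unsafe.Sizeof(r)
}

func main() {
	b, r := sizes()
	fmt.Println(b, r) // 1 4
}
```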

So byte is one byte wide (uint8), while rune is four bytes wide (int32), which is large enough to hold any Unicode code point. With that in mind, let's look at another piece of code, this time using a Chinese string:

first := "社区"
fmt.Println([]rune(first))
fmt.Println([]byte(first))

[31038 21306] // output of []rune
[231 164 190 229 140 186] // output of []byte

Here you can see clearly that each Chinese character in this string takes up three bytes, while each rune holds one whole character, so the difference is obvious at a glance.
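To make the byte count versus character count concrete, here is a small sketch using the standard unicode/utf8 package on the ASCII string "fisrt" and the two-character Chinese string 社区 (code points 31038 and 21306). The helper name counts is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the number of bytes and the number of runes in s.
// Hypothetical helper, used here only for illustration.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"fisrt", "社区"} {
		b, r := counts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}

	// utf8.RuneLen reports how many bytes one rune needs in UTF-8.
	fmt.Println(utf8.RuneLen('f'), utf8.RuneLen('社')) // 1 3
}
```

For the ASCII string the two counts agree (5 and 5), which is why the first test showed no difference. For 社区 they are 6 bytes and 2 runes.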
While we are here, let's also look at how Go slices Chinese strings. Taking a substring in Go works the same way as slicing: s[n:m] is a half-open interval (left-closed, right-open). Let's take an example:

s := "golangcaff"
fmt.Println(s[:3])

gol // output: no problem, the first three characters were taken successfully

How about Chinese? Let’s take a look at an example

s := "截取中文"
// Try slicing it the same way?
fmt.Println(s[:2])

? // output: garbled, as expected (the familiar "?" placeholder for a broken character)

So how should it be sliced? Here we need to convert the string into Unicode code points with []rune, slice that, and then convert the result back with string(). Let's take a look at an example.

s := "截取中文"
// Convert to []rune first, then slice
res := []rune(s)
fmt.Println(string(res[:2]))

截取 // output: sliced successfully
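The same idea can be wrapped in a small function. The sketch below is only one possible way to do it, and the name firstRunes is made up for illustration. Instead of building a []rune, it uses range, which yields the byte offset where each character starts, so the final cut always lands on a character boundary. It is shown with a Chinese string such as 截取中文 and an ASCII string.

```go
package main

import "fmt"

// firstRunes returns the first n characters (runes) of s.
// Hypothetical helper for illustration; not from the original article.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte offset at which each rune starts.
	for i := range s {
		if count == n {
			// i is where rune number n begins, so s[:i] holds exactly n runes.
			return s[:i]
		}
		count++
	}
	return s // s contains n runes or fewer
}

func main() {
	fmt.Println(firstRunes("截取中文", 2))       // 截取
	fmt.Println(firstRunes("golangcaff", 3)) // gol
}
```

Compared with string([]rune(s)[:n]), this avoids converting the whole string, at the cost of a little more code. It also returns s unchanged when n is larger than the number of characters, where res[:n] would panic.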

Of course, you can also slice with []byte, but then you need to know how many bytes each of your Chinese characters occupies. This approach is not advisable, because a character's UTF-8 encoding can be anywhere from one to four bytes long, so you cannot assume a fixed width.
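If you do want to work at the byte level, the width of a character can be asked for rather than guessed. A minimal sketch, assuming the standard unicode/utf8 package and a made-up helper name firstChar:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstChar returns the first character of s and its width in bytes.
// Hypothetical helper for illustration; not from the original article.
func firstChar(s string) (string, int) {
	// DecodeRuneInString returns the first rune and the number of bytes it occupies.
	_, size := utf8.DecodeRuneInString(s)
	return s[:size], size
}

func main() {
	c, size := firstChar("截取中文")
	fmt.Println(c, size) // 截 3

	c, size = firstChar("golangcaff")
	fmt.Println(c, size) // g 1
}
```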
Why can't s[:n] be used directly? From my experiments, I guess that when a string is sliced directly, the underlying implementation treats it as []byte rather than []rune. You can try:

s := "截取中文"
// Slice exactly three bytes, the width of one Chinese character
fmt.Println(s[:3])

截 // output

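Of course, this is just my guess; I have not checked the actual implementation in the source code. It is consistent with the Go language specification, though, which defines a string as a sequence of bytes and string indices as byte positions.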
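The guess can also be checked from the outside without reading the runtime source. The sketch below, which uses the standard unicode/utf8 package and a made-up helper validPrefix, looks at what indexing, slicing and range each return for a Chinese string such as 截取中文:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validPrefix reports whether the first n bytes of s form valid UTF-8.
// Hypothetical helper for illustration; n must not exceed len(s).
func validPrefix(s string, n int) bool {
	return utf8.ValidString(s[:n])
}

func main() {
	s := "截取中文"

	// Indexing a string yields a single byte, not a character.
	fmt.Printf("%T %d\n", s[0], s[0]) // uint8 230

	// Slicing counts bytes too: two bytes are only part of the first character.
	fmt.Println(validPrefix(s, 2)) // false
	fmt.Println(validPrefix(s, 3)) // true

	// range decodes UTF-8 and yields each rune with its starting byte offset.
	for i, r := range s {
		fmt.Println(i, string(r))
	}
}
```

The loop prints offsets 0, 3, 6 and 9, one per character, which matches the three bytes per character seen earlier.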

This work is released under a CC license. Reprints must credit the author and link to this article.