What's wrong with the Go language's native JSON package? How can JSON data be handled better?

Time:2021-9-1

Go developers may be puzzled by this title. For JSON, Go's native library encoding/json already provides a comfortable set of JSON tools and is widely praised by Go developers. What could be wrong with it? In practice, however, during business development we have run into many problems that the native json package handles poorly or cannot handle at all, so it does not fully meet our requirements.

So, what is wrong with it? When should a third-party library be used? How do you choose one? How do they perform?

Before getting to specific problems, let's briefly look at some libraries commonly used in Go for handling JSON, along with test data for them. Readers who find the following text too long can jump directly to the conclusion.

Some commonly used Go JSON parsing libraries

Go native encoding/json

This should be the library most familiar to Go programmers. With the json.Unmarshal and json.Marshal functions, you can easily deserialize JSON-formatted binary data into a specified Go struct, and serialize a Go struct into a binary stream. For data whose structure is unknown or uncertain, you can deserialize it into a map[string]interface{} and access the data by key and value.

Here are two additional features that you may not notice:

  • A piece of JSON data parsed by the json package can be an object or an array, but also a string, a number, a boolean, or null. The two functions above support all of these value types. For example, the following code also works:
var s string
err := json.Unmarshal([]byte(`"Hello, world!"`), &s)
// Note that the double quotation marks inside the string are required. A bare Hello, world! is not valid JSON and an error would be returned.
  • When parsing JSON into a struct, key matching is case-insensitive where possible. Even if a key differs from the name defined in the struct, it is still assigned to the field as long as the two match when case is ignored. The following example illustrates this:
cert := struct {
    Username string `json:"username"`
    Password string `json:"password"`
}{}
    
err := json.Unmarshal([]byte(`{"UserName":"root","passWord":"123456"}`), &cert)
if err != nil {
    fmt.Println("err =", err)
} else {
    fmt.Println("username =", cert.Username)
    fmt.Println("password =", cert.Password)
}
//Actual output: 
// username = root
// password = 123456

jsoniter

jsoniter's GitHub home page opens with two keywords: high-performance and compatible. These are also the two biggest selling points of this package.

First, compatibility: the biggest advantage of jsoniter is that it is 100% compatible with the standard library, so code can be migrated easily. If even that is inconvenient, you can use Go Monkey to forcibly replace the relevant function entry points of the json package.

Next, performance: as with other open source libraries that advertise their performance, their own test conclusions should not be accepted uncritically. Here are a few simple conclusions from my own tests:

  • In the single scenario of deserializing into a struct, jsoniter does improve on the standard library; in my own measurements it was about 1.4 times as fast
  • But in the same scenario of deserializing into a struct, jsoniter is far behind easyjson
  • Other scenarios vary, as I will explain later

The main reasons jsoniter is faster than the official library (itself developed by many experts) are that it minimizes unnecessary memory copying and reduces the use of reflect: for objects of the same type, jsoniter calls reflect only once and caches the result after parsing. However, as Go versions iterate, the native json library keeps getting faster, and jsoniter's performance advantage keeps narrowing.

In addition, jsoniter supports a Get function that reads the needed field directly from []byte binary data, which will be described later.

easyjson

This is another JSON parsing package on GitHub. Compared with jsoniter's 9k stars, easyjson has somewhat fewer at 3k, but it is still a very popular open source project.

The main selling point of this package is again speed. Why is easyjson faster than jsoniter? Because easyjson's development model is similar to protobuf: before the program runs, its code generation tool must be used to generate serialization and deserialization code for each struct, so each struct gets its own customized parsing function.

But this development model also makes easyjson more intrusive to business code. On the one hand, code must be generated before go build; on the other hand, the related JSON processing functions are not compatible with the native json library.
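To illustrate why per-struct code can beat reflection, here is a hand-written sketch in the spirit of what a code generator might emit. It is not easyjson's actual output: the point type and the appendJSON method are hypothetical names used only for this illustration, and only the standard library is used.

```go
package main

import (
	"fmt"
	"strconv"
)

// point is a hypothetical struct used only for this illustration.
type point struct {
	X    int
	Name string
}

// appendJSON writes the JSON form of p into dst without using reflection.
// A generator would emit one such function per struct type, so the field
// names and types are fixed at compile time.
func (p point) appendJSON(dst []byte) []byte {
	dst = append(dst, `{"x":`...)
	dst = strconv.AppendInt(dst, int64(p.X), 10)
	dst = append(dst, `,"name":`...)
	// AppendQuote uses Go quoting rules, which coincide with JSON for plain
	// ASCII text; a real generator would use a JSON-specific escaper.
	dst = strconv.AppendQuote(dst, p.Name)
	dst = append(dst, '}')
	return dst
}

func main() {
	fmt.Println(string(point{X: 1, Name: "root"}.appendJSON(nil)))
}
```

Because nothing is discovered at run time, the cost is a handful of appends, which is the general idea behind generated serializers.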

jsonparser

This is a JSON parsing library that I personally like very much. Its 3.9k stars also show how popular it is. The title on its GitHub home page claims up to 10x the performance of the official library.

The same caveat applies: an open source project's own test conclusions should not be accepted uncritically. I have personally measured this 10x figure, but it does not represent all scenarios.

Why is jsonparser so fast? jsonparser itself is only responsible for locating certain key boundary characters in a binary byte sequence, for example:

  • find a ", then find the closing "; in between is a string
  • find a [, then find the matching ]; in between is an array
  • find a {, then find the matching }; in between is an object
  • ……

It then hands the []byte data found in between to the caller for further processing. At that point, the caller is responsible for parsing these bytes and checking their validity.
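The boundary-scanning idea can be sketched with the standard library alone. This is a simplified illustration of the technique, not jsonparser's actual code; stringEnd is a hypothetical helper that only locates the closing quote of a string and leaves validation and unescaping to the caller.

```go
package main

import "fmt"

// stringEnd returns the index of the closing quote of the JSON string that
// starts at data[start] (which must be '"'), or -1 if it is unterminated.
// It only locates boundaries; it does not validate or unescape the content.
func stringEnd(data []byte, start int) int {
	for i := start + 1; i < len(data); i++ {
		switch data[i] {
		case '\\':
			i++ // skip the escaped character so \" is not taken as the end
		case '"':
			return i
		}
	}
	return -1
}

func main() {
	data := []byte(`"say \"hi\"" trailing`)
	end := stringEnd(data, 0)
	// The raw bytes between the quotes are handed over as-is.
	fmt.Println(string(data[1:end]))
}
```

No memory is allocated here: the result is just a sub-slice of the input, which is consistent with the 0 allocs/op figures reported for jsonparser later in this article.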

Why do I like an open source library that looks so troublesome? Because developers can build special logic on top of jsonparser, or even build their own JSON parsing library. My own open source project jsonvalue was also implemented on top of jsonparser in its early days. Although it later dropped jsonparser to optimize performance further, that does not diminish my admiration for it.

jsonvalue

This project is my own JSON parsing library. It was originally designed to replace the native json library's map[string]interface{} for handling unstructured JSON data. I have another article describing this problem: "Still using map[string]interface{} to deal with JSON? Here is a more efficient method: jsonvalue" [2].

At present I have roughly completed the optimization of the library (see the master branch). Its performance is much higher than that of the native json library and slightly better than that of jsoniter. Of course, this depends on the case: each library performs differently in different scenarios, and showing that is one of the purposes of this article.

JSON processing in regular operations

What else is there besides struct and map? Below I list the scenarios I have encountered in actual business development. All the test code is open source; readers can consult it or send me feedback through issues, comments or private messages.

Regular operations: struct parsing

Parsing into a struct is the most common way to handle JSON in Go. Here I define the following struct:

type object struct {
    Int    int       `json:"int"`
    Float  float64   `json:"float"`
    String string    `json:"string"`
    Object *object   `json:"object,omitempty"`
    Array  []*object `json:"array,omitempty"`
}

A little nasty: this struct can nest itself indefinitely.

Then I defined a binary stream. If you format it with json.cn, you can see that it is a JSON object with a 5-level structure.

{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!","object":{"int":123456,"float":123.456789,"string":"Hello, world!"},"array":[{"int":123456,"float":123.456789,"string":"Hello, world!"},{"int":123456,"float":123.456789,"string":"Hello, world!"}]}}},"array":[{"int":123456,"float":123.456789,"string":"Hello, world!"},{"int":123456,"float":123.456789,"string":"Hello, world!"}]}

Using this struct and this data, I tested Marshal and Unmarshal with three packages: the official encoding/json, jsoniter and easyjson. First, the deserialization results:

| Package | Function | Time per iteration | Memory usage | Allocations | Performance rating |
| --- | --- | --- | --- | --- | --- |
| encoding/json | Unmarshal | 8775 ns/op | 1144 B/op | 25 allocs/op | ★★ |
| jsoniter | Unmarshal | 6890 ns/op | 1720 B/op | 56 allocs/op | ★★☆ |
| easyjson | UnmarshalJSON | 4017 ns/op | 784 B/op | 19 allocs/op | ★★★★★ |

Here are the serialization results:

| Package | Function | Time per iteration | Memory usage | Allocations | Performance rating |
| --- | --- | --- | --- | --- | --- |
| encoding/json | Marshal | 6859 ns/op | 1882 B/op | 6 allocs/op | ★★ |
| jsoniter | Marshal | 6843 ns/op | 1882 B/op | 6 allocs/op | ★★ |
| easyjson | MarshalJSON | 2463 ns/op | 1240 B/op | 5 allocs/op | ★★★★★ |

In terms of pure performance, easyjson deserves its reputation: with serialization and deserialization functions customized for each struct, it achieves the highest performance, roughly 2 to 3 times as fast as the other two libraries in the tables above. jsoniter is slightly faster than the official json, but the difference is small.

Regular and unconventional operations: map[string]interface{}

It is called "unconventional" because in this case the program has to handle unstructured JSON data, or handle many differently shaped data structures in a single function, so the struct approach cannot be used. The official json library's solution (for object types) is to store the data in a map[string]interface{}. In this scenario, only the official json and jsoniter are supported.

The test data is as follows. First, deserialization:

| Package | Function | Time per iteration | Memory usage | Allocations | Performance rating |
| --- | --- | --- | --- | --- | --- |
| encoding/json | Unmarshal | 13040 ns/op | 4512 B/op | 128 allocs/op | ★★ |
| jsoniter | Unmarshal | 9442 ns/op | 4521 B/op | 136 allocs/op | ★★ |

The serialization test data are as follows:

| Package | Function | Time per iteration | Memory usage | Allocations | Performance rating |
| --- | --- | --- | --- | --- | --- |
| encoding/json | Marshal | 17140 ns/op | 5865 B/op | 121 allocs/op | ★★ |
| jsoniter | Marshal | 17132 ns/op | 5865 B/op | 121 allocs/op | ★★ |

As you can see, in this case the two are evenly matched and jsoniter has no obvious advantage. Even for parsing large amounts of data, which jsoniter promotes as a selling point, it has little advantage.

With the same amount of data, deserialization with either library takes about 1.5 times as long as with a struct, and serialization takes about 2.5 times as long.

Emmm… friends, it is best to avoid this kind of operation if you can, not to mention that the program still needs all sorts of type assertions when handling interface{}. You can read my article on the subject to get a sense of the pain.
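The type-assertion burden can be seen in a small sketch. The body shape (response.userList[0].name) and the function name firstUserName are assumptions made for illustration; only encoding/json is used.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// firstUserName digs response.userList[0].name out of untyped JSON.
// Every level needs its own type assertion and its own failure branch.
func firstUserName(data []byte) (string, error) {
	var root map[string]interface{}
	if err := json.Unmarshal(data, &root); err != nil {
		return "", err
	}
	resp, ok := root["response"].(map[string]interface{})
	if !ok {
		return "", errors.New("response is not an object")
	}
	list, ok := resp["userList"].([]interface{})
	if !ok || len(list) == 0 {
		return "", errors.New("userList is not a non-empty array")
	}
	user, ok := list[0].(map[string]interface{})
	if !ok {
		return "", errors.New("userList[0] is not an object")
	}
	name, ok := user["name"].(string)
	if !ok {
		return "", errors.New("name is not a string")
	}
	return name, nil
}

func main() {
	data := []byte(`{"response":{"userList":[{"name":"root"}]}}`)
	fmt.Println(firstUserName(data))
}
```

Four assertions are needed to read a single string, which is the ergonomic cost that the libraries discussed in the following sections try to remove.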

Unconventional operations – deserialization

When struct cannot be used, the various open source projects each show their own special skills, like the Eight Immortals crossing the sea. In fact, every library has detailed and powerful additional features that cannot all be covered in this article. Here I list several libraries and their representative approaches; test data for the various situations follows later.

jsoniter

When handling unstructured JSON, if you want to parse a piece of []byte data and obtain a value, jsoniter offers the following approaches.

The first approach is to parse the original text directly and return the required data:

// Read the name field of the first element of the response.userList array in the binary data
username := jsoniter.Get(data, "response", "userList", 0, "name")
fmt.Println("username:", username.ToString())

You can also directly return an object, and you can continue the operation based on the object:

obj := jsoniter.Get(data)
if obj.ValueType() == jsoniter.InvalidType {
    // err handling
}
username := obj.Get("response", "userList", 0, "name")
fmt.Println("username:", username.ToString())

This function has one major feature: on-demand parsing. For example, in the statement obj := jsoniter.Get(data), jsoniter performs only a minimal check of the data, at most determining that the JSON is currently an object, and does not parse the other parts.

Even in the second call, obj.Get("response", "userList", 0, "name"), jsoniter tries to reduce unnecessary parsing and parses only the parts that are needed.

For example, when the request asks for response.userList and jsoniter encounters irrelevant fields such as response.gameList, it skips over them as far as possible without processing them, to minimize wasted CPU time.

Note, however, that judging from its interface functions, the returned obj should be understood as read-only and cannot be re-serialized into a binary sequence.
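A rough standard-library analogue of deferring work is json.RawMessage, sketched below. This is not how jsoniter is implemented, and encoding/json still scans the whole input for validity; the sketch only shows the idea of not building Go values for parts nobody asked for. The envelope and userList types and the firstName function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope decodes only the top level; the response body stays as raw bytes.
type envelope struct {
	Response json.RawMessage `json:"response"`
}

// userList names only the field we care about. Fields that are not listed,
// such as gameList, are skipped by the decoder.
type userList struct {
	UserList []struct {
		Name string `json:"name"`
	} `json:"userList"`
}

// firstName decodes the response body only when it is actually needed.
func firstName(data []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return "", err
	}
	var ul userList
	if err := json.Unmarshal(env.Response, &ul); err != nil {
		return "", err
	}
	if len(ul.UserList) == 0 {
		return "", fmt.Errorf("userList is empty")
	}
	return ul.UserList[0].Name, nil
}

func main() {
	data := []byte(`{"response":{"gameList":[1,2,3],"userList":[{"name":"root"}]}}`)
	fmt.Println(firstName(data))
}
```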

jsonparser

Compared with jsoniter, jsonparser's support for parsing a piece of []byte data and obtaining a value is limited.

For example, if we can know the type of a value, such as the username field above, we can get it as follows:

username, err := jsonparser.GetString(data, "response", "userList", "[0]", "name")
if err != nil {
    // err handling
}
fmt.Println("username:", username)

However, jsonparser's Get-family functions can obtain only basic types other than null, that is, number, boolean and string.

If you want to operate object and array, you should be familiar with the following two functions, which I personally think are the core of jsonparser:

func ArrayEach(data []byte, cb func(value []byte, dataType ValueType, offset int, err error), keys ...string) (offset int, err error)

func ObjectEach(data []byte, callback func(key []byte, value []byte, dataType ValueType, offset int) error, keys ...string) (err error)

Both functions parse the binary data in order and pass each extracted data segment to the caller through the callback function, which then operates on the data. Callers can assemble maps and slices, and even perform operations that are not normally possible (described later).

jsonvalue

This is the open source Go JSON library that I developed myself. The API style of its Get-type operations is similar to the second jsoniter style.

For example, we also want to get the username field mentioned above, so we can get it as follows:

v, err := jsonvalue.Unmarshal(data)
if err != nil {
    // err handling
}
username := v.GetString("response", "userList", 0, "name")
fmt.Println("username:", username)

Performance test comparison

In the "unconventional operations" scenario covered in this section, of the three libraries, jsoniter and jsonparser parse "on demand", while jsonvalue parses the data completely. The test plans therefore differ accordingly.

I'll present the test data first. The evaluation has two parts:

  • Performance evaluation: the performance score in this scenario. It ignores ease of use and considers only CPU execution efficiency
  • Functional evaluation: whether the program can conveniently process the data further after obtaining it in this scenario. It ignores deserialization performance

| Scenario | Package | Function description / main function call | Time per iteration | Memory usage | Allocations | Performance / functional evaluation |
| --- | --- | --- | --- | --- | --- | --- |
| Shallow parsing | jsoniter | any := jsoniter.Get(raw); keys := any.Keys() | 9118 ns/op | 3024 B/op | 139 allocs/op | ★★★ |
| Shallow parsing | jsonvalue | jsonvalue.Unmarshal() | 7684 ns/op | 9072 B/op | 61 allocs/op | ★★★★★ |
| Shallow parsing | jsonparser | jsonparser.ObjectEach(raw, objEach) | 853 ns/op | 0 B/op | 0 allocs/op | ★★★★★ / ★★ |
| Read one deeper-level value | jsoniter | any.Get("object", "object", "object", "array", 1) | 9118 ns/op | 3024 B/op | 139 allocs/op | ★★★★★ |
| Read one deeper-level value | jsonvalue | jsonvalue.Unmarshal(); v.Get("object", "object", "object", "array", 1) | 7928 ns/op | 9072 B/op | 61 allocs/op | ★★★★★ |
| Read one deeper-level value | jsonparser | jsonparser.Get(raw, "object", "object", "object", "array", "[1]") | 917 ns/op | 0 B/op | 0 allocs/op | ★★★★★ / ★★☆ |
| Read only one deeper-level value from a large amount (100x) of data | jsoniter | jsoniter.Get(raw, "10", "object", "object", "object", "array", 1) | 29967 ns/op | 4913 B/op | 469 allocs/op | ★★★★★ |
| Read only one deeper-level value from a large amount (100x) of data | jsonvalue | jsonvalue.Unmarshal(); v.Get("10", "object", "object", "object", "array", 1) | 799450 ns/op | 917030 B/op | 6011 allocs/op | ★★★★★ |
| Read only one deeper-level value from a large amount (100x) of data | jsonparser | jsonparser.Get(raw, "10", "object", "object", "object", "array", "[1]") | 8826 ns/op | 0 B/op | 0 allocs/op | ★★★★★ / ★★☆ |
| Complete traversal | jsoniter | jsoniter.Get(raw), then recursively parse each child | 45237 ns/op | 12659 B/op | 671 allocs/op | ★★ |
| Complete traversal | jsonvalue | jsonvalue.Unmarshal() | 7928 ns/op | 9072 B/op | 61 allocs/op | ★★★ / ★★★★★ |
| Complete traversal | jsonparser | jsonparser.ObjectEach(raw, objEach), then recursively parse each child | 3705 ns/op | 0 B/op | 0 allocs/op | ★★★★★ |
| Complete traversal | encoding/json | Unmarshal | 13040 ns/op | 4512 B/op | 128 allocs/op | |
| Complete traversal | jsoniter | Unmarshal | 9442 ns/op | 4521 B/op | 136 allocs/op | ★☆ |

The test data above divides the deserialization scenario into four types. Below I explain in detail the application scenarios for the four cases and the corresponding selection suggestions.

Shallow parsing

In the test code, shallow parsing means parsing only the list of top-level keys of a deeper structure. This scenario is mostly for reference. jsonparser clearly outperforms the other open source libraries here: it parses the first-level key list the fastest.

However, in terms of ease of use, both jsoniter and jsonparser require developers to process the obtained data further, so their ease of use is slightly lower in this scenario.

Getting a specific piece of data from the body

The scenario is this: only a small part of the JSON body is useful to the current business and needs to be obtained. I distinguish the following situations:

  • Useful data makes up a high proportion of all data (corresponding to "read one deeper-level value"):

    • In this scenario, jsonparser performs as well as ever
    • In terms of ease of use, jsonparser requires the caller to process the data again, so jsoniter and jsonvalue are better
  • Useful data makes up a low proportion of all data (corresponding to "read only one deeper-level value from a large amount (100x) of data"):

    • In this scenario, jsonparser is still far ahead in performance
    • jsonparser is still weak in terms of ease of use
    • Weighing ease of use against performance, the lower the proportion of useful data, the more valuable jsonparser becomes
  • The business needs to parse the data completely. This scenario is the most thorough test of each solution's overall performance

    • In terms of performance, jsonparser is still excellent, but ease of use is a real problem in this scenario: in complex traversal operations, you need to write extra logic to store the data
    • Second in performance is jsonvalue, which is the part I am very confident about

      • jsonvalue performs a full and complete parse and still takes less time than the supposedly high-speed jsoniter
      • Compared with jsonparser, jsonvalue takes roughly twice as long, but the latter only half-processes the data, while the former hands the caller a finished product
    • As for jsoniter, do not use it in this scenario: when the data needs to be fully parsed, its numbers are simply not competitive
    • Finally, the figures for the official json library and jsoniter parsing into a map are included for reference only; they are not recommended in this scenario either

Unconventional operations – serialization

This refers to serializing a piece of data without a structure. This scenario generally occurs in the following situations:

  • The format of the data to be serialized is not fixed and may be generated according to other parameters.
  • The data to be serialized is extensive and fragmented. Defining structs one by one and marshaling them would make the code hard to read.

The first solution for this scenario is the "regular and unconventional operation" mentioned earlier, that is, using a map.

As for unconventional operations, we first exclude jsoniter and jsonparser, because they have no direct way to build a custom JSON structure. easyjson is excluded next because it cannot operate on a map. That leaves only jsonvalue.

For example, suppose we return a user's nickname in this format: {"code": 0, "message": "success", "data": {"nickname": "revitalizing China"}}. The code using a map is as follows:

code := 0
nickname := "revitalizing China"

res := map[string]interface{}{
    "code":    code,
    "message": "success",
    "data": map[string]string{
        "nickname": nickname,
    },
}
b, _ := json.Marshal(&res)

The jsonvalue method is:

res := jsonvalue.NewObject()
res.SetInt(0).At("code")
res.SetString("success").At("message")
res.SetString(nickname).At("data", "nickname")
b := res.MustMarshal()

In terms of ease of use, this is very convenient. We serialized with the official json, jsoniter and jsonvalue respectively; the measured data is as follows:

| Package | Function | Time per iteration | Memory usage | Allocations | Performance rating |
| --- | --- | --- | --- | --- | --- |
| encoding/json | Marshal | 16273 ns/op | 5865 B/op | 121 allocs/op | ★☆ |
| jsoniter | Marshal | 16616 ns/op | 5865 B/op | 121 allocs/op | ★☆ |
| jsonvalue | Marshal | 4521 ns/op | 2224 B/op | 5 allocs/op | ★★★★★ |

The results are very clear. The reason is also easy to understand: when processing a map, the reflect mechanism is needed to handle data types, which greatly reduces performance.

Conclusion and selection suggestions

Structure serialization and deserialization

In this scenario, I personally recommend the official JSON library. Readers may be surprised. Here are my views:

  • Although easyjson outperforms all the other open source projects, it has one major drawback: it needs an additional tool to generate code, and version-controlling that tool adds some operations and maintenance cost. Of course, if the reader's team already handles protobuf well, easyjson can be managed in the same way
  • Before Go 1.8, the performance of the official json library was criticized from many sides. Today (1.16.3), however, its performance is a different matter. In addition, as the most widely used JSON library (bar none), the official library has the fewest bugs and the best compatibility
  • Although jsoniter still outperforms the official library, the gap is not dramatic. If you want the ultimate performance, you should choose easyjson rather than jsoniter
  • jsoniter has been inactive in recent years. I filed an issue some time ago and received no reply. Later I looked at the issue list and found that some issues from 2018 are still open

Serialization and deserialization of unstructured data

In this scenario, we need to distinguish between high and low data utilization. Data utilization is considered high if more than a quarter of the data in the JSON body needs to be examined and processed by the business.

  1. High data utilization: in this case, I recommend jsonvalue
  2. Low data utilization: there are two cases, depending on whether the JSON data needs to be re-serialized

    • No need to re-serialize: simply choose jsonparser, whose performance is truly dazzling
    • Need to re-serialize: there are two options. If the performance requirements are relatively low, you can use jsonvalue. If the performance requirements are high and only one piece of data needs to be inserted into the binary sequence (this is important), you can use jsoniter's Set method. Readers can refer to the godoc

In practice, cases where a large amount of JSON data needs to be re-serialized are rare. This scenario typically arises when a proxy server, gateway or overlay relay service needs to inject additional information into the original data. In other words, jsoniter has limited application scenarios.

The following compares the operating efficiency of the different libraries at data coverage from 1% to 60% (vertical axis unit: μs/op):

[Figure: time per operation versus data coverage from 1% to 60%, in μs/op]

As you can see, once data utilization reaches 25%, jsoniter has no advantage over jsonvalue; for jsonparser the threshold is about 40%. As for jsonvalue, because the data is fully parsed in one pass, accessing the data afterwards takes very little time, so its time cost is very stable across different data coverage levels.

Other evil operations

I have also encountered some strange processing scenarios about JSON in practical application. I also take this opportunity to list them and share my solutions.

Case insensitive JSON

As mentioned earlier, when the official json package parses into a struct, it matches keys case-insensitively where possible: even if a key differs from the definition in the struct, it is still assigned to the field if the two are the same after ignoring case.

However, with a map, jsoniter or jsonparser, this becomes a big problem. We had two services operating on the same field in a MySQL database, but the structs defined by the two Go services differed in the case of one letter. The problem existed for a long time without surfacing, thanks to the above behavior of the official json library when parsing into structs. Then one day we wrote a script to clean the data, read this field through a map, and the bug was exposed.

So I later added a case-insensitive feature to jsonvalue to solve this problem:

raw := `{"user":{"nickName":"pony"}}` // note the uppercase N in the key
v, _ := jsonvalue.UnmarshalString(raw)

fmt.Println("nickname:", v.GetString("user", "nickname"))
fmt.Println("nickname:", v.Caseless().GetString("user", "nickname"))

//Output
// nickname:
// nickname: pony

Sequential JSON objects

In a partner module's interface, the other party pushes a data stream to our business module in the form of a JSON object. Later, a new requirement demanded that the pushed data be ordered. Changing the interface format to an array would have required major changes to the data structures on both sides. Moreover, during a rolling upgrade, old and new modules inevitably coexist, so the interface would have to support both formats at once.

In the end we took a rather unorthodox approach: the data producer pushes the key-value pairs out in order, and we, as the consumer, use jsonparser's ObjectEach function to obtain the key-value byte sequences in order, which gives us ordered data.
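For comparison, the same ordered read can be sketched with the standard library's token stream instead of jsonparser. This is an alternative illustration, not the approach the team used; orderedKeys is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// orderedKeys returns the top-level keys of a JSON object in document order.
func orderedKeys(data []byte) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("not a JSON object")
	}
	var keys []string
	for dec.More() {
		tok, err := dec.Token() // the next key
		if err != nil {
			return nil, err
		}
		key, ok := tok.(string)
		if !ok {
			return nil, fmt.Errorf("unexpected token %v", tok)
		}
		keys = append(keys, key)
		// Consume the value without interpreting it.
		var skip json.RawMessage
		if err := dec.Decode(&skip); err != nil {
			return nil, err
		}
	}
	return keys, nil
}

func main() {
	fmt.Println(orderedKeys([]byte(`{"b":1,"a":{"z":2},"c":[3]}`)))
}
```

A map[string]interface{} would lose this order, which is why a streaming or callback-based reader is needed here.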

Cross language UTF-8 string docking

Go is a very young language. By the time it was born, the mainstream character set on the Internet was already Unicode and the encoding was UTF-8. Older languages may use different encodings for various reasons.

As a result, in cross-language JSON integration, different teams and companies encode Unicode wide characters differently. If you run into this, the solution is to standardize on ASCII encoding. For the official json package, you can refer to this question and answer to escape wide characters.

If jsonvalue is used, ASCII escape is used by default, for example:

v := jsonvalue.NewObject()
v.SetString("中国").At("nation")
fmt.Println(v.MustMarshalString())

//Output
// {"nation":"\u4E2D\u56FD"}


This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Link to this article:https://segmentfault.com/a/1190000039957766

Original author: amc, published on Cloud+ Community and on the author's own blog. Reprints are welcome, but please credit the source.

Original title: what’s wrong with the go language’s native JSON package? How to better handle JSON data

Release date: May 6, 2021

Original link:https://cloud.tencent.com/developer/article/1820473