json_encode with JSON_UNESCAPED_UNICODE

Time:2020-8-7

The question

When PHP's native json_encode function encodes Chinese text without the JSON_UNESCAPED_UNICODE flag, the result is a string of \uXXXX escapes. With the flag, the result is the Chinese text we normally see. What is going on?

Confirming the behavior

//1.php
<?php
echo json_encode('好');

# php 1.php > 1.txt

# ls -l 1.txt
-rw-r--r-- 1 root root 8 Jun 12 15:21 1.txt

# cat 1.txt
"\u597d"

//2.php
<?php
echo json_encode('好', JSON_UNESCAPED_UNICODE);

# php 2.php > 2.txt

# ls -l 2.txt
-rw-r--r-- 1 root root 5 Jun 12 15:23 2.txt

# cat 2.txt
"好"

Let’s start with the conclusion

The JSON we normally use is encoded as UTF-8, but it also recognizes escapes written as UTF-16 code units. That is:

The normally displayed 好 is UTF-8 encoded.

\u597d, taken as six individual characters and transmitted, is also UTF-8 encoded. During parsing, however, the parser recognizes the six characters \u597d together as an escape and interprets them as a UTF-16 code unit.
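The claim that the escapes are UTF-16 code units, not code points, is easiest to see with a character outside the Basic Multilingual Plane. The following is a minimal sketch, assuming PHP 7.0 or later for the "\u{...}" string syntax. The emoji U+1F600 is an example chosen here and does not appear in the original test.

```php
<?php
$bmp    = "\u{597D}";  // 好, fits in a single UTF-16 code unit
$astral = "\u{1F600}"; // an emoji, needs two UTF-16 code units (a surrogate pair)

echo json_encode($bmp), "\n";    // "\u597d"
echo json_encode($astral), "\n"; // "\ud83d\ude00"

// Parsing the surrogate pair gives back the original character.
var_dump(json_decode('"\ud83d\ude00"') === $astral); // bool(true)
```

The second line of output contains two escapes for one character, which is what a UTF-16 encoding of U+1F600 looks like.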

When json_encode is given the JSON_UNESCAPED_UNICODE flag, it does not escape characters as UTF-16 and emits them directly as UTF-8. Because no escape is used, the whole string shrinks from 8 bytes to 5 bytes. The reason is that each character of the escape \u597d takes 1 byte, 6 bytes in total, while the UTF-8 encoding of 好 is only 3 bytes, which is 3 bytes fewer.
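The byte counts can also be checked without writing files. This is a small sketch, assuming PHP 7.0 or later. The variable names are illustrative only.

```php
<?php
// 好, written as a code point escape so the source file encoding does not matter.
$char = "\u{597D}";

$escaped   = json_encode($char);                         // "\u597d"
$unescaped = json_encode($char, JSON_UNESCAPED_UNICODE); // "好"

// strlen() counts bytes, not characters.
printf("escaped:   %d bytes\n", strlen($escaped));   // escaped:   8 bytes
printf("unescaped: %d bytes\n", strlen($unescaped)); // unescaped: 5 bytes

// Both forms decode to the same PHP string.
var_dump(json_decode($escaped) === json_decode($unescaped)); // bool(true)
```

Both outputs are valid JSON for the same value. The flag only changes how the character is written.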

Verification

hexdump displays the raw bytes of a file. With the -c option, a byte that corresponds to a printable ASCII character is shown as that character, and any other byte is shown as an octal number. With the -b option, every byte is shown as a three-digit octal number.

# hexdump -c 1.txt
0000000   "   \   u   5   9   7   d   "       
0000008

//Octal format
# hexdump -b 1.txt
0000000 042 134 165 065 071 067 144 042       
0000008


# hexdump -c 2.txt
0000000   " 345 245 275   "         
0000005

//Octal format
# hexdump -b 2.txt
0000000 042 345 245 275 042          
0000005

Running od -w1 -b 2.txt gives almost the same result as hexdump.
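The same octal view can be reproduced inside PHP, which helps when hexdump or od is not available. This is a sketch only. The helper name octalBytes is hypothetical and is not part of PHP or of the original article.

```php
<?php
// Hypothetical helper: format every byte of a string as a three-digit octal number,
// similar to one row of `hexdump -b` output without the offset column.
function octalBytes(string $bytes): string
{
    $out = [];
    foreach (str_split($bytes) as $byte) {
        $out[] = sprintf('%03o', ord($byte));
    }
    return implode(' ', $out);
}

$char = "\u{597D}"; // 好

echo octalBytes(json_encode($char)), "\n";
// 042 134 165 065 071 067 144 042
echo octalBytes(json_encode($char, JSON_UNESCAPED_UNICODE)), "\n";
// 042 345 245 275 042
```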

It is easy to see from the output above that the bytes of 好, in octal, are 345 245 275.

Converted to binary, they are 11100101 10100101 10111101.

Binary can also be converted to hexadecimal, but there is a byte-order concern: does the sequence start with 275 or with 345?

The breakthrough is that 好 takes 3 bytes, and the first byte of a 3-byte UTF-8 sequence must begin with the bits 1110 (see Ruan Yifeng's "Character encoding notes: ASCII, Unicode and UTF-8"). 345 is 11100101, so it is the first byte, and the bits within each byte are also written most significant first. This lines up with the often-mentioned network byte order, which is big-endian as well. In the JSON standard, UTF-16 and UTF-32 come in big-endian and little-endian variants, but UTF-8 does not: it is defined as a fixed sequence of bytes with the lead byte first, so UTF-8 encoded JSON always reads in this big-endian-like order. (If the order were reversed, the lead byte could not announce how many bytes the character takes.)

Converting the binary to hexadecimal gives E5 A5 BD.

Use an online code-conversion tool to check that the conversion is correct:
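The octal to binary to hexadecimal conversion can be double-checked with a few lines of PHP. This is a sketch that takes the three octal values from the hexdump -b output as its input.

```php
<?php
$octal = ['345', '245', '275']; // the three bytes reported by hexdump -b

foreach ($octal as $o) {
    $value = octdec($o);
    printf("%s (octal) = %08b (binary) = %02X (hex)\n", $o, $value, $value);
}
// 345 (octal) = 11100101 (binary) = E5 (hex)
// 245 (octal) = 10100101 (binary) = A5 (hex)
// 275 (octal) = 10111101 (binary) = BD (hex)

// The same bytes, read directly from the string:
echo strtoupper(bin2hex("\u{597D}")), "\n"; // E5A5BD
```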

Character   UTF-8 (decimal)   UTF-8 (hex)   Unicode (decimal)   Unicode (hex)
好          15050173          E5A5BD        22909               597D

E5A5BD matches.

Another online conversion site returns a result that is actually the Unicode code point corresponding to the UTF-8 encoding, not the UTF-8 encoding you want. This is seriously misleading, and many online encoding sites behave the same way.
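The relationship between the UTF-8 bytes E5 A5 BD and the code point 597D can be worked out by hand. The sketch below strips the marker bits from a 3-byte sequence and joins the remaining payload bits. The function name decodeThreeByteUtf8 is hypothetical. It handles only the 3-byte case and does not reject overlong forms or surrogates, so it is an illustration and not a full decoder.

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence into its Unicode code point.
// Bit layout of a 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
function decodeThreeByteUtf8(string $bytes): int
{
    if (strlen($bytes) !== 3) {
        throw new InvalidArgumentException('expected exactly 3 bytes');
    }
    [$b1, $b2, $b3] = array_map('ord', str_split($bytes));
    if (($b1 & 0xF0) !== 0xE0 || ($b2 & 0xC0) !== 0x80 || ($b3 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a 3-byte UTF-8 sequence');
    }
    // Drop the marker bits and concatenate the 4 + 6 + 6 payload bits.
    return (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);
}

$codePoint = decodeThreeByteUtf8("\xE5\xA5\xBD");
printf("U+%04X (decimal %d)\n", $codePoint, $codePoint); // U+597D (decimal 22909)
```

The two hexadecimal values describe the same character at different levels: 597D is the number Unicode assigns to it, and E5A5BD is how UTF-8 stores that number.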

Some definitions

  1. JSON is a data-interchange format that specifies a text structure for exchanging data. JSON is commonly encoded as UTF-8, but not necessarily.
  2. Unicode is a standard that assigns every character a unique Unicode code point.
  3. UTF-8, UTF-16, and so on are encoding schemes for Unicode code points. For example, a UTF-8 encoding takes 1 to 4 bytes. Because there are fewer short byte sequences, they are assigned to the lower code points, which tend to be the more frequently used characters.
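The 1 to 4 byte range of UTF-8 can be seen by measuring a few characters. This is a sketch, assuming PHP 7.0 or later. The sample characters other than 好 are picked here for illustration and do not come from the original article.

```php
<?php
// One sample character for each UTF-8 length.
$samples = [
    'U+0041'  => "\u{41}",    // A,  1 byte:  41
    'U+00E9'  => "\u{E9}",    // é,  2 bytes: C3A9
    'U+597D'  => "\u{597D}",  // 好, 3 bytes: E5A5BD
    'U+1F600' => "\u{1F600}", // emoji, 4 bytes: F09F9880
];

foreach ($samples as $label => $char) {
    // strlen() counts bytes; bin2hex() shows them.
    printf("%s: %d byte(s), %s\n", $label, strlen($char), strtoupper(bin2hex($char)));
}
```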

References

  • JSON standard http://www.json.org.cn/standa…
  • Character encoding notes: ASCII, Unicode and UTF-8 http://www.ruanyifeng.com/blo…
  • Unicode and JavaScript http://www.ruanyifeng.com/blo…
  • Escape and Unicode encoding in JSON serialization https://segmentfault.com/a/11…
  • Making JSON understand Chinese better (JSON_UNESCAPED_UNICODE) https://www.laruence.com/2011…
  • Online conversion of Chinese to UTF-8 encoding hexadecimal and Unicode encoding hexadecimal http://www.mytju.com/classcod…
  • Online conversion of Chinese to unicode encoding hexadecimal https://utf8.supfree.net/
  • Comparison of ASCII code table http://ascii.911cha.com/
  • Linux od tool, check the binary format of the file https://wangchujiang.com/linu…
  • Linux command learning summary: hexdump https://www.cnblogs.com/kerry…
