MySQL learning notes (8): character set

Time:2021-3-16

This article is updated on June 16, 2019, using MySQL 5.7 and operating system deepin 15.4.

catalog

Common character set

character set Fixed length Code width explain
ASCII or iso-646 yes 1 byte 7 bits English letters, numbers, punctuation marks and 33 control characters
ISO-8859 series yes 1 byte 8 bits Each Western European character set, compatible with ASCII
GB2312 no 1 or 2 bytes Compatible with ASCII, not recommended
GBK no 1 or 2 bytes Compatible with GB2312
GB18030 no 1. 2 or 4 bytes Compatible with GBK
Unicode or ucs-4 or iso-10646 yes 4 bytes Starting from the highest byte, it represents group, plane, row and ceil, including BMP, auxiliary font and special font
UCS-2 yes 2 bytes Basic Multilingual literal (BMP) of ucs-4

Unicode has the following encoding methods:

Coding mode Fixed length Code width explain
UTF-8 no 1 to 6 bytes ASCII compatible
UTF-16 no 2 or 4 bytes BMP is represented by 2 bytes and 4 bytes. It is divided into utf-16be and utf-16le according to byte order
UTF-32 yes 4 bytes Unicode original encoding, divided into utf-32be and utf-32le in different byte order

UTF-8 is encoded as follows:

byte Code number Code space Byte combination
1 0-7 0x0 – 0x7f 0xxxxxxx
2 8-11 0x80 – 0x7ff 110xxxxx,10xxxxxx
3 12-16 0x800 – 0xffff 1110xxxx,10xxxxxx,10xxxxxx
4 17-21 0x10000 – 0x1fffff 11110xxx,10xxxxxx,10xxxxxx,10xxxxxx
5 22-26 0x200000 – 0x3ffffff 111110xx,10xxxxxx,10xxxxxx,10xxxxxx,10xxxxxx
6 27-31 0x4000000 – 0x7fffffff 1111110x,10xxxxxx,10xxxxxx,10xxxxxx,10xxxxxx,10xxxxxx

For each Chinese character, GBK uses 2 bytes and UTF-8 uses 3 bytes.

MySQL character set

MySQL’s UTF-8 character set has a maximum length of 3 bytes and can only represent characters in BMP (0-0xffff). To use characters larger than 3 bytes, such as Emoji, use the utf8mb4 character set.

MySQL contains two concepts: character set and proofreading rules. Proofing rules are used to define the way to compare strings. A character set has at least one proofing rule, but there can be multiple proofing rules. The naming of proofing rules starts with the name of the relevant character set, usually with a language name in the middle, and starts with_ CI (case insensitive)_ CS (case sensitive) or_ Bin (the comparison is based on the value of the character encoding and is language independent), such as GBK_ chinese_ ci。

MySQL has four levels of character set and proofreading rules

  • Server level: it can be set through configuration file or startup option, and the default is Latin1.
  • Database level: available throughCREATE DATABASEorALTER DATABASESetting: if there are records in the database, modifying the character set will not store the existing data according to the new character set.
  • Table level: available throughCREATE TABLEorALTER TABLESetting: if a record already exists in the table, modifying the character set will not store the existing data according to the new character set.
  • Field level: theCREATE TABLEorALTER TABLESetting: if a record already exists in the table, modifying the character set will not store the existing data according to the new character set.

When the client interacts with the server, there are character sets and proofreading rules for the client, the connection, and the returned results. Usually, the three are the same to ensure that the data can be read correctly. Can passSET NAMES charsetModify the three values at the same time. Can pass[_charset] 'str' [COLLATE collation]Force character sets and proofing rules for strings.

To adjust the character set of an existing record, you need to export the data first, and then import it again after appropriate adjustment.

Recommended Today

Analysis of super comprehensive MySQL statement locking (Part 1)

A series of articles: Analysis of super comprehensive MySQL statement locking (Part 1) Analysis of super comprehensive MySQL statement locking (Part 2) Analysis of super comprehensive MySQL statement locking (Part 2) Preparation in advance Build a system to store heroes of the Three KingdomsheroTable: CREATE TABLE hero ( number INT, name VARCHAR(100), country varchar(100), PRIMARY […]