Character type of MySQL
Character type
MySQL
There are several types of character storage available, but in most cases we only usechar
andvarchar
That’s fine.
type | size | purpose |
---|---|---|
CHAR | 0 – 255 bytes | Fixed length string |
VARCHAR | 0 – 65535 bytes | Variable length string |
TINYBLOB | 0 – 255 bytes | A binary string of 255 characters or less |
TINYTEXT | 0 – 255 bytes | Short text string |
BLOB | 0 – 65 535 bytes | Long text data in binary form |
TEXT | 0 – 65 535 bytes | Long text data |
MEDIUMBLOB | 0 – 16 777 215 bytes | Medium length text data in binary form |
MEDIUMTEXT | 0 – 16 777 215 bytes | Medium length text data |
LONGBLOB | 0 – 4 294 967 295 bytes | Maximum text data in binary form |
LONGTEXT | 0 – 4 294 967 295 bytes | Maximum text data |
character set
String is divided into binary and non binary types. Binary is used to store images, sounds and other files, and non binary is used to store text data.
Non binary text is affected by character sets and proofreading rules.
In fact, the character set is the character encoding.
concept
Character set is a set of multiple characters. There are many kinds of character sets. Each character set contains different numbers of characters. The common character sets areGBK
、BIG5
、UTF8
。
UTF8
The character set contains a wider range of text content, such as Korean, Japanese, German, higher compatibility, is also the recommended character set(UTF8MB4
)。
Table does not set character set inheritance database, field does not set character set inheritance table, so generally we can specify character set for database in configuration file.
MySQL > show character set; -- view the character set supported by the server
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| binary | Binary pseudo charset | binary | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.04 sec)
Proofreading rules
A set of rules for character comparison and sorting within a character set to_ci
The end is case insensitive_bin
The ending is case insensitive.
When using case insensitive proofing rulesA
Anda
Is the same, otherwise it is not the same, which will affect sorting and alignment.
Modifying table proofreading rules does not affect the original fields of the table, but only affects the newly added fields.
Show collation; -- View proofing rules supported by the server
Basic definition
For character types, the specified width is the deposit width, in characters.
mysql> create table `test` (
-> ch char(5) not null,
-> vc varchar(5) not null
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Fixed length to length
Distinguishing differences
char
Is a fixed length string. When the number of characters stored is less than the specified width, it will be filled with.
varchar
Is a variable length string. When the number of characters stored is less than the specified width, no characters are used to fill in.
Generally speaking,
char
Is faster thanvarchar
Higher (about 50%), butMore wasted disk space。stay
InnoDB
In the storage engine, there is no such difference.
insert values | Char (4) stored value | Char (4) occupies space | Varchar (4) deposit value | Varchar (4) occupies space |
---|---|---|---|---|
'' |
' ' |
4 bytes | '' |
1 byte |
'ab' |
'ab ' |
4 bytes | 'ab' |
3 bytes |
'abcd' |
'abcd' |
4 bytes | 'abcd' |
5 bytes |
'abcdefgh' |
'abcd' |
4 bytes | 'abcd' |
5 bytes |
becausevarchar
The type is variable length storage, so a header is needed to identify how many characters the real content occupies. This header usually occupies 1 byte.Depositabcd
The header actually takes up 5 bytes. And that’s why, in generalvarchar
Save memory, not say the whole situation.
Comparison of differences
If you want to make a comparison test of the difference between fixed length and variable length, you need to use the following two functions.
Length(): view the number of bytes
char_ Length(): view the number of characters
It is suggested to use it in the same data tablevarchar
orchar
Type, which is more recommended herechar
Type.
Common functions
toggle case
upper()
Lower case to upper case,lower()
Upper case to lower case.
The following shows the case conversion.
mysql> select upper("justthesame"),lower("JUSTTHESAME");
+----------------------+----------------------+
| upper("justthesame") | lower("JUSTTHESAME") |
+----------------------+----------------------+
| JUSTTHESAME | justthesame |
+----------------------+----------------------+
1 row in set (0.00 sec)
Specified measurement
left()
Andright()
Function is used to take the specified number of characters left or right.
Here’s a demonstration of taking three characters from the left.
mysql> select left("qwert",3);
+-----------------+
| left("qwert",3) |
+-----------------+
| qwe |
+-----------------+
1 row in set (0.00 sec)
Middle character
mid()
Take the string from the middle, parameter 2 is the specified starting position (starting from 1), and parameter 3 is the number of characters to be fetched (it can not be specified, and has been taken all the time).
Here’s how to take two from the second character.
mysql> select mid("qwert",2,2);
+------------------+
| mid("qwert",2,2) |
+------------------+
| we |
+------------------+
1 row in set (0.00 sec)
Intercept character
substring()
Take all strings to the right starting from the specified starting position, and specify the ending position. (actually, I feel that this andmid()
It doesn’t seem to make any difference.
The following shows how to intercept the first character to the fourth character.
mysql> select substring("qwertyui",1,4);
+---------------------------+
| substring("qwertyui",1,4) |
+---------------------------+
| qwer |
+---------------------------+
1 row in set (0.00 sec)
Character length
char_length()
Can get the character length, if you want to getchar
The character length of type, including padding space, needs to be modifiedsql
pattern.
modifysql
pattern:SET sql_mode = 'PAD_CHAR_TO_FULL_LENGTH';
mysql> select char_length("qwertyu");
+------------------------+
| char_length("qwertyu") |
+------------------------+
| 7 |
+------------------------+
1 row in set (0.00 sec)
Byte length
length()
Can get byte length.
mysql> select length("qwertyu");
+-------------------+
| length("qwertyu") |
+-------------------+
| 7 |
+-------------------+
1 row in set (0.00 sec)
Character connection
concat()
Multiple characters can be connected.
mysql> select concat("123","abc","456","def");
+---------------------------------+
| concat("123","abc","456","def") |
+---------------------------------+
| 123abc456def |
+---------------------------------+
1 row in set (0.00 sec)