Character type of MySQL

Time:2020-11-29

Character type of MySQL

Character type

MySQLThere are several types of character storage available, but in most cases we only usecharandvarcharThat’s fine.

type size purpose
CHAR 0 – 255 bytes Fixed length string
VARCHAR 0 – 65535 bytes Variable length string
TINYBLOB 0 – 255 bytes A binary string of 255 characters or less
TINYTEXT 0 – 255 bytes Short text string
BLOB 0 – 65 535 bytes Long text data in binary form
TEXT 0 – 65 535 bytes Long text data
MEDIUMBLOB 0 – 16 777 215 bytes Medium length text data in binary form
MEDIUMTEXT 0 – 16 777 215 bytes Medium length text data
LONGBLOB 0 – 4 294 967 295 bytes Maximum text data in binary form
LONGTEXT 0 – 4 294 967 295 bytes Maximum text data

character set

String is divided into binary and non binary types. Binary is used to store images, sounds and other files, and non binary is used to store text data.

Non binary text is affected by character sets and proofreading rules.

In fact, the character set is the character encoding.

concept

Character set is a set of multiple characters. There are many kinds of character sets. Each character set contains different numbers of characters. The common character sets areGBKBIG5UTF8

UTF8The character set contains a wider range of text content, such as Korean, Japanese, German, higher compatibility, is also the recommended character set(UTF8MB4)。

Table does not set character set inheritance database, field does not set character set inheritance table, so generally we can specify character set for database in configuration file.

MySQL > show character set; -- view the character set supported by the server
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.04 sec)

Proofreading rules

A set of rules for character comparison and sorting within a character set to_ciThe end is case insensitive_binThe ending is case insensitive.

When using case insensitive proofing rulesAAndaIs the same, otherwise it is not the same, which will affect sorting and alignment.

Modifying table proofreading rules does not affect the original fields of the table, but only affects the newly added fields.

Show collation; -- View proofing rules supported by the server

Basic definition

For character types, the specified width is the deposit width, in characters.

mysql> create table `test` (
    -> ch char(5) not null,
    -> vc varchar(5) not null
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Fixed length to length

Distinguishing differences

charIs a fixed length string. When the number of characters stored is less than the specified width, it will be filled with.

varcharIs a variable length string. When the number of characters stored is less than the specified width, no characters are used to fill in.

Generally speakingcharIs faster thanvarcharHigher (about 50%), butMore wasted disk space

stayInnoDBIn the storage engine, there is no such difference.

insert values Char (4) stored value Char (4) occupies space Varchar (4) deposit value Varchar (4) occupies space
'' ' ' 4 bytes '' 1 byte
'ab' 'ab ' 4 bytes 'ab' 3 bytes
'abcd' 'abcd' 4 bytes 'abcd' 5 bytes
'abcdefgh' 'abcd' 4 bytes 'abcd' 5 bytes

becausevarcharThe type is variable length storage, so a header is needed to identify how many characters the real content occupies. This header usually occupies 1 byte.DepositabcdThe header actually takes up 5 bytes. And that’s why, in generalvarcharSave memory, not say the whole situation.

Comparison of differences

If you want to make a comparison test of the difference between fixed length and variable length, you need to use the following two functions.

Length(): view the number of bytes

char_ Length(): view the number of characters

It is suggested to use it in the same data tablevarcharorcharType, which is more recommended herecharType.

Common functions

toggle case

upper()Lower case to upper case,lower()Upper case to lower case.

The following shows the case conversion.

mysql> select upper("justthesame"),lower("JUSTTHESAME");
+----------------------+----------------------+
| upper("justthesame") | lower("JUSTTHESAME") |
+----------------------+----------------------+
| JUSTTHESAME          | justthesame          |
+----------------------+----------------------+
1 row in set (0.00 sec)

Specified measurement

left()Andright()Function is used to take the specified number of characters left or right.

Here’s a demonstration of taking three characters from the left.

mysql> select left("qwert",3);
+-----------------+
| left("qwert",3) |
+-----------------+
| qwe             |
+-----------------+
1 row in set (0.00 sec)

Middle character

mid()Take the string from the middle, parameter 2 is the specified starting position (starting from 1), and parameter 3 is the number of characters to be fetched (it can not be specified, and has been taken all the time).

Here’s how to take two from the second character.

mysql> select mid("qwert",2,2);
+------------------+
| mid("qwert",2,2) |
+------------------+
| we               |
+------------------+
1 row in set (0.00 sec)

Intercept character

substring()Take all strings to the right starting from the specified starting position, and specify the ending position. (actually, I feel that this andmid()It doesn’t seem to make any difference.

The following shows how to intercept the first character to the fourth character.

mysql> select substring("qwertyui",1,4);
+---------------------------+
| substring("qwertyui",1,4) |
+---------------------------+
| qwer                      |
+---------------------------+
1 row in set (0.00 sec)

Character length

char_length()Can get the character length, if you want to getcharThe character length of type, including padding space, needs to be modifiedsqlpattern.

modifysqlpattern:SET sql_mode = 'PAD_CHAR_TO_FULL_LENGTH';

mysql> select char_length("qwertyu");
+------------------------+
| char_length("qwertyu") |
+------------------------+
|                      7 |
+------------------------+
1 row in set (0.00 sec)

Byte length

length()Can get byte length.

mysql> select length("qwertyu");
+-------------------+
| length("qwertyu") |
+-------------------+
|                 7 |
+-------------------+
1 row in set (0.00 sec)

Character connection

concat()Multiple characters can be connected.

mysql> select concat("123","abc","456","def");
+---------------------------------+
| concat("123","abc","456","def") |
+---------------------------------+
| 123abc456def                    |
+---------------------------------+
1 row in set (0.00 sec)

Recommended Today

Summary of recent use of gin

Recently, a new project is developed by using gin. Some problems are encountered in the process. To sum up, as a note, I hope it can help you. Cross domain problems Middleware: func Cors() gin.HandlerFunc { return func(c *gin.Context) { //Here you can use * or the domain name you specify c.Header(“Access-Control-Allow-Origin”, “*”) //Allow header […]