Practical tutorial on modifying character set in MySQL



In mysql, the system supports many character sets, and there are slight differences between different character sets. At present, the most commonly used character sets should be utf8 and utf8mb4. Compared with utf8, utf8mb4 supports storing Emoji expressions and has a wider range of applications. This article will introduce how to change utf8 into utf8mb4 character set.

1. Introduction to utf8 and utf8mb4 character sets

Character is the general name of various characters and symbols, including national characters, punctuation marks, graphic symbols, numbers, etc. A character set is a collection of multiple characters.

The character set in MySQL can act on four levels: server level, database level, table level and column level. Server level comparison rules are defined by character_ set_ The server parameter controls that if no explicit character set is specified when creating a database, table or column, the character set of the previous level will be inherited.

The default character set of MySQL version 5.7 and earlier is Latin1, and the default character set of MySQL version 8.0 is utf8mb4. However, using Latin1 is easy to cause garbled code, so utf8 and utf8mb4 are the most widely used. Utf8 is actually an alias of utf8mb3, which only uses 1 ~ 3 bytes to represent characters. Utf8mb4 uses 1-4 bytes to represent characters, which can store more Emoji expressions and any new Unicode characters. Utf8mb4 is compatible with utf8 and can represent more characters than utf8. It is a superset of utf8 character set. So now some new businesses suggest setting the character set of the database to utf8mb4, especially when there is an expression storage requirement.

2. Modify character set method

At present, the demand for Emoji expression storage in Internet business is increasing. For example, nicknames and comments should support Emoji. At this time, if the database field uses utf8 character set, the following error will be reported:

  java.sql.SQLException: Incorrect string value: ‘\xF0\x9F\x92\x95\xF0\x9F…’ for    column…………

For business needs, we need to change the database character set to utf8mb4. Fortunately, utf8mb4 is a superset of utf8. There is no need to make other conversion except changing the code to utf8mb4. Here is a brief description of the modification method.

System parameter modification

First, you should modify the system character set parameters, so that the default character set of the library table created later is utf8mb4. Locate the configuration file and add or modify the following parameters:

vi /etc/my.cnf

character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
#Ignore the character code set by the application connection to keep consistent with the global setting

Modify database character set

For the database that has been created, if it was originally a utf8 character set, you can modify it as follows:

#Set database character set encoding
ALTER DATABASE `dbname` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

mysql> show create database testdb;
| Database | Create Database             |
| testdb | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8 */ |
1 row in set (0.00 sec)

mysql> alter database `testdb` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 1 row affected (0.01 sec)

mysql> show create database testdb;
| Database | Create Database             |
| testdb | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
1 row in set (0.00 sec)

Modify table and field character set

Similarly, for the created table, modifying the character set of the global and database will not affect the character set of the original table and field. The original utf8 table can be modified as follows:

#Modify table character set
alter table `tb_name` default character set utf8mb4;

#Modify a field character set
alter table `tb_name` modify col_name varchar(20) character set utf8mb4;

#Modify table and field character set at the same time
alter table `tb_name` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

#If there are many tables in a library, you can splice the batch modification statements to be executed
 TABLE_SCHEMA = 'testdb';

3. Some recommendations

It seems that the modification method is quite simple, but you should be extra careful about the production environment. In particular, when modifying the field character set, it will lock and prevent write operations. It is also very slow for large tables, which may affect online business.

If your database is relatively small, using the above method should not be a problem. For the online environment, if you want to modify the character set, you must make a good evaluation. It is best to shut down and modify it during the low peak period of business. You must back up before modifying. If there is no downtime, you can consider modifying in the standby database first, and then switching between active and standby, but this will be more troublesome.

If possible, you can also prepare another empty instance. First import the table structure, change it to utf8mb4 character set, and then import the data. This is also a method, but shutdown switching may also be required.

The above is the details of the method of modifying the character set of MySQL. For more information about modifying the character set of MySQL, please pay attention to other relevant articles of developeppaer!