MySQL character set

1. Character set and verification rules

Character set is a set of symbols and codes, and collation is a set of rules for comparing characters in a character set. That is, the collation of the character set. MySQL can use a variety of character sets and inspection rules to organize characters.

MySQL server can support multiple character sets. In the same server, the same database, or even different fields of the same table, different character sets can be specified. Compared with other database management systems such as oracle, In the same database can only use the same character set, MySQL obviously has greater flexibility.

Each character set may have multiple collation rules, and there is a default collation rule, and each collation rule is only for a certain character set, and has nothing to do with other character sets.
In MySQL, the concept of character set and encoding scheme are regarded as synonyms, and a character set is a combination of a conversion table and an encoding scheme.

Unicode (Universal Code) is a character encoding used on computers. Unicode was created to solve the limitations of traditional character encoding schemes. It sets a unified and unique binary encoding for each character in each language to meet the requirements of cross-language and cross-platform text conversion and processing. There are different encoding schemes for Unicode, including Utf-8, Utf-16 and Utf-32. Utf stands for Unicode Transformation Format.

Two, check the MySQL character set method

1, check the character set supported by the mysql server

The following SQL commands can all check the character set supported by the MySQL database< br>show character set;
select * from information_schema.character_sets;
We use the first command line as an example. After executing the command, you can see the following figure The result shown:

blob

2, view Character set collation rules

Query all collation rules supported by MySQL: show collation;
Query MySQL collation rules supported by utf8: show collation like'utf8% ';
Of course, you can also use the select statement to query:
select * from information_schema.collations where collation_name like'utf8%';
We use the first command For example, after executing the command, you can see the result shown in the figure below:

blob

3. View the character set of the current database

Execute show variables like'character%'; to view the character set used by the current database, as shown in the figure below Show:

blob

Explain the variables in the figure above The meaning of the name:
character_set_client: the character set of the client request data
character_set_connection: the character set of the client/server connection
character_ set_database: The character set of the default database, no matter how the default database is changed, it is this character set; if there is no default database, then use the character set specified by character_set_server, this variable is recommended to be managed by the system itself, not artificially defined.
character_set_filesystem: convert the file name on the os into this character set, that is, convert character_set_client to character_set_filesystem, the default binary is not to do any conversion

character_set_results: result set, the character set returned to the client
character_set_server: The default character set of the database server
character_set_system: The system character set. This value is always utf8 and does not need to be set. This character set is used for the names of database objects (such as tables and columns) and also for the names of functions stored in catalog tables.

4. View the collation rules of the current database

Execute show variables like'collation%'; to view the collation rules used by the current database, as shown in the figure below Show:

blob

Explain the variables in the figure above The meaning of the name:
collation_connection: The character set of the current connection.
collation_database: The default collation for the current date. Every time a USE statement is used to “jump” to another database, the value of this variable will change. If there is no current database, the value of this variable is the value of the collation_server variable.
collation_server: The default collation of the server.

The naming rule of the sorting method is: character set name language suffix, and the meaning of each typical suffix is ​​as follows:
1) _ci: case-insensitive sorting method< br>2) _cs: case-sensitive sorting method
3) _bin: binary sorting method, the size comparison will be based on character encoding and does not involve human language, so the _bin sorting method does not include human language

< h2>Three, MySQL character set settings

1, Overview

MySQL character set settings are divided into two categories:
1) Create the default value of the object.
2) Control the configuration of interactive communication between server and client.

1.1 Default values ​​for creating objects

The character set collation rules have 4 levels of default settings:
1) server level;
2) database level;
3) Table level, column level;
4) Connection level.
The lower-level settings will be integrated with the higher-level settings.
There is a general rule: first choose a reasonable character set for the server or database, and then let a column choose its own character set according to different actual conditions.

1.2 Control the configuration of interactive communication between server and client

Most MySQL clients do not have the ability to support multiple character sets at the same time, and can only use one character at a time set.
The character set conversion between client and server is controlled by the following MySQL system variables.
1) character_set_server: MySQL server default character set.
2) character_set_database: The default character set of the database.
3) character_set_client: MySQL server assumes the character set used by the query sent by the client.
4) character_set_connection: After the MySQL server receives the query request issued by the client, it converts it to the character set specified by the character_set_connection variable.
5) character_set_results: The MySQL server converts the result set and error information into the character set specified by character_set_results and sends it to the client.
6) character_set_system: system metadata (field name, etc.) character set
There are also variables starting with collation_ that are the same as above, which are used to describe character collation rules.
Notes:
• The default_character_set setting in my.ini only affects the connection character set when the mysql command connects to the server, and will not have any effect on applications that use the libmysqlclient library!
• The SQL function operations performed on fields are usually performed in the internal operating character set, and are not affected by the connection character set setting.
• The bare string in the SQL statement will be affected by the connection character set or the introducer setting. For operations such as comparisons may produce completely different results, you need to be careful!

1.3 Character set selection rules by default

(1) When compiling MySQL, a default character set is specified, this character set is latin1;
(2) Installation For MySQL, you can specify a default character set in the configuration file (my.ini). If not specified, this value is inherited from the one specified at compile time;
(3) When starting mysqld, you can set it in the command line parameters Specify a default character set. If not specified, this value is inherited from the configuration in the configuration file. At this time, character_set_server is set to the default character set;
(4) When creating a new database, unless Explicitly specify that the character set of this database is set to character_set_server by default;
(5) When a database is selected, character_set_database is set to the default character set of this database;
(6) In this When creating a table in the database, the default character set of the table is set to character_set_database, which is the default character set of the database;
(7) When setting a column in the table, unless explicitly specified, this column The default character set is the default character set of the table;

2, complete the setting of the character set and collation rules through SQL commands

2.1 Specify the character set and collation rules for the database< /h4>

When creating the database, you can specify the character set and collation rules for the database. An example of the SQL command is as follows:
create database dbtest charset=utf8 collate utf8_romanian_ci;
< code>charset=utf8 means to set the database character set to utf8
collate utf8_romanian_ci means to set the database collation rules to utf8_romanian_ci
We can view the SQL statement to create the database through show create database dbtest, the command execution effect is shown in the figure below:

blob

Note: The following statements can be used to create a database allocation character set:
charset=ut f8;
default charset=utf8;
charset utf8;
default charset utf8;
char set=utf8;
default char set=utf8;
char set utf8;
default char set utf8;
character set=utf8;
default character set=utf8;
character set utf8;
default character set utf8;

2.2 Specify a character set for the table And collation rules

We can specify the character set and collation rules for the table when creating a database table, and execute the SQL command as follows:


default charset=utf8 means setting the character set
collate utf8_romanian_ci means setting the collation rules
We can pass show create table table_charset View the SQL statement to create the table, the command execution effect is shown in the figure below:

blob

Note: There are several ways to specify the character set for the table:
default charset=utf8;
charset=utf8;
default character set=utf8;
character set=utf8;
default char set=utf8;
char set=utf8;

2.3 Specify character set and collation rules for table columns

h4>

We can specify the character set and collation rules for the columns when creating the database table, and execute the SQL command as follows:

share picture

< p>

The syntax is basically the same as setting the database table, and then we use the following SQL command to view the character set rules of this table

share picture


The command line execution result is shown in the figure below

blob

2.4 Modify and set the MySQL server-level character set

MySQL server supports many different character sets, such character sets Can be specified at compile time and run time.

1) Specify at compile time
You can specify the default character set and default collation at compile time. If you want to change the default character set and collation at the same time, you must use –with-charset and –with-collation at the same time Options.
The collation rules must be the legal collation rules of the character set, such as the following compilation example
./configure --with-charset=utf8 --with-collation=utf8_romanian_ci
by configure Option -with-extra-charsets=LIST, you can define the additional character set in the server.
LIST refers to any of the following:
a. A series of character set names separated by spaces
b.complex-to include all character sets that cannot be dynamically loaded
c.all-to change All character sets are included in binary
The compilation example is as follows
./configure --with-charset=utf8 --with-collation=utf8_romanian_ci --with-extra-charsets=all
Of course, the compilation specification is generally executed under the Linux operating system, and the installation of MySQL under windows generally does not make the compilation specification.

2) Specify in the parameter file my.ini

share picture

3) Specify before startup parameters

share picture

4) Pass -default- when logging in on the mysql client character-set specified

share picture

5) Temporary designation

a) Designation separately
mysql> SET character_set_client = utf8;
mysql> SET character_set_connection = utf8;
mysql > SET character_set_database = utf8;
mysql> SET character_set_results = utf8;
mysql> SET character_set_server = utf8;

b) mysql client use: set names utf8;
same as< br>set character_set_client=utf8;
set character_set_connection=utf8;
set character_set_results=utf8;

c) set character set utf8;
Same as
set character_set_client=utf8;
set character_set_results=utf8;
set [email protected]@collation_database;

3. Summary

I will introduce the next few MYSQL command:
View all character sets supported by the database
show character set;
show char set;
View the current status which includes of course Character set settings
status;
\s;
View system character set settings, including all character set settings
show variables like'char%';
View the character set settings of the table in the sqlstudy database
show table status from sqlstudy like'%countries%';
View The key to setting the character set of the table columns is that in the same table, each column can be set to a different character set.
show full columns from countries;

Know how to view The character set, let me talk about how to set these character sets
1. Modify the server level
a. Temporarily change, execute the following command line:
SET GLOBAL character_set_server=utf8;
b. Permanent changes:
Modify my.ini file

Share pictures

2. Modify the database level
a. Temporarily change, execute the command line:
SET GLOBAL character_set_database =utf8;
b. Permanent change:
It is enough to change the server level

3. Modify the table level and execute the command line:
ALTER TABLE table_name DEFAULT CHARSET utf8;
Permanent effect after change

4. Modify column level
Modification example:

Share picture

Permanent effect after the change

5. Change the connection character set
a. Temporary change:
set names utf8;
b. Permanent changes:
Modify my.ini file

Share pictures

3. Reasons for garbled characters in MySQL database and solutions

1. Root causes of garbled characters

1) The client does not set the client character set correctly, causing the original SQL statement to be converted to the character set pointed to by the connection. This conversion will lose information. If the client is in utf8 format, then if the conversion is In the gb2312 format, the information will definitely be lost, otherwise it will not be lost. It is necessary to ensure that the character set of the connection is greater than the character set of the client to ensure that the conversion does not lose information.
2) The database font is not set correctly. If the database font is not set correctly, then the connection character set is converted to the database character set and the encoding is still lost. The reason is the same as above.

2, garbled code or data loss

character_set_client: we want to tell the server, what code is the data I send you?
character_set_connection: Tell the character set converter what encoding to convert to?
character_set_results: What code is used for the result of the query?
If the above three are all in the character set N, they can be abbreviated as set names’N’;

2.1 Garbled problem

Insert utf8 into the data table with the default character set of utf8 The connection character set before the encoded data is set to latin1, and the connection character set is set to utf8 when querying.
According to the default settings of the MySQL server when inserting, character_set_client, character_set_connection and character_set_results are all latin1;
The data of the insert operation will go through the character set conversion process of latin1=>latin1=>utf8, and each insert in this process The Chinese characters will be saved from the original 3 bytes to 6 bytes;
The result of the query will go through the character set conversion process of utf8=>utf8, and the saved 6 bytes will be returned intact, resulting in Garbled
Execute the following SQL commands in turn

Share pictures

< p>

We will see the following output

blob

Modify the connection code at this time, and query it

share picture

The output result after the command is executed

 blob

Note: If the encoding of the stored character set is larger than the character set at the time of insertion, the data will be returned intact Garbled codes will appear, but you can modify the query character set to avoid garbled codes, that is, data will not be lost.

3. The ultimate solution to garbled characters

1) First of all, you must clarify what encoding format your client uses. This is the most important (IE6 generally uses utf8, and the command line is generally gbk, the general program is gb2312)
2) Make sure that your database uses utf8 format, it is very simple, all encodings take it all.
3) Be sure to ensure that the connection character set is greater than or equal to the client character set, otherwise information will be lost, such as: latin1 4) If the above three steps are done correctly, then all Chinese are correctly converted into utf8 format and stored in the database. In order to adapt to different browsers and different clients, you can modify character_set_results To display Chinese fonts in different encodings, since utf8 is the general direction, I still tend to display Chinese in utf8 format for web applications.

Prepared by the member community of the Old Nine Academy

Author:naaman

Share Picture

Leave a Comment

Your email address will not be published.