utf8_bin vs. utf8_general_ci: which collation provides faster performance in a MySQL table?

I use the 'id' field, a char(22), as the primary key of my MySQL table.
This field is only used to filter unique IDs when adding new users to the table.

For me, it does not matter whether the collation is utf8_bin or utf8_general_ci, because the case of the letters is not important; I only use English letters in the id.

The only question is:
Which collation will provide faster performance?

The most common queries on my table are:

LOAD DATA INFILE ... IGNORE INTO TABLE mytable(id)
or
INSERT IGNORE INTO mytable(id)...
and
SELECT COUNT(id) FROM mytable

Right now I don't see any difference in performance because the table is not big, but what happens when the number of rows in my table exceeds 2 million?

Will the utf8_general_ci collation provide faster performance for INSERT IGNORE, LOAD DATA ... IGNORE, and SELECT COUNT queries?

Generally, utf8_bin will be at least as fast as utf8_general_ci, because apart from recognizing multibyte characters it does no processing at all: it simply compares the binary data.
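To make the difference between the two collations concrete, here is a minimal sketch that compares the same pair of strings under each one. The `_utf8` introducer is only there so the literals are in the utf8 character set regardless of the connection settings; on recent MySQL versions `utf8` is an alias for `utf8mb3`, so this assumes those collations are still available on your server.

```sql
-- utf8_general_ci folds case before comparing, so these are equal.
SELECT _utf8'abc' = _utf8'ABC' COLLATE utf8_general_ci;  -- returns 1

-- utf8_bin compares the encoded values as they are, so these differ.
SELECT _utf8'abc' = _utf8'ABC' COLLATE utf8_bin;         -- returns 0
```

The case folding in the first query is the extra work that utf8_bin skips.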

That said, the fact that there is an index on the id column, and the fact that you only want to detect duplicates rather than sort, should mean that there is no detectable difference at all. However, this is just an educated guess, so I may be wrong (even if that seems unlikely).
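Since duplicate detection is the only thing the key is used for, it may be worth checking what counts as a duplicate under each collation before choosing. The sketch below is an illustration only: the table names `mytable_bin` and `mytable_ci` are hypothetical, and InnoDB is assumed as the storage engine.

```sql
-- Two otherwise identical tables that differ only in the collation of id.
CREATE TABLE mytable_bin (
  id CHAR(22) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE mytable_ci (
  id CHAR(22) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Under utf8_bin, 'abc' and 'ABC' are distinct keys: both rows are stored.
INSERT IGNORE INTO mytable_bin (id) VALUES ('abc'), ('ABC');

-- Under utf8_general_ci, 'ABC' collides with 'abc' and is silently ignored.
INSERT IGNORE INTO mytable_ci (id) VALUES ('abc'), ('ABC');

SELECT COUNT(id) FROM mytable_bin;  -- returns 2
SELECT COUNT(id) FROM mytable_ci;   -- returns 1
```

If the ids are always written in a consistent case, both tables end up with the same rows; the difference only shows when two ids differ by case alone. Loading the same file into both tables and timing each load is a simple way to test the performance guess on real data.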

