MySQL 区分 e 和 é(e 急性)——UNIQUE 索引

2021-12-21 00:00:00 indexing constraints character-encoding mysql

我有一个表，students，有 3 列:id、name 和 age.我在 name 和 age 列上有一个 UNIQUE 索引 Index_2.

I have a table, students, with 3 columns: id, name, and age. I have a UNIQUE index Index_2 on columns name and age.

CREATE TABLE `bedrock`.`students` ( `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT, `name` VARCHAR(45) NOT NULL, `age` INTEGER UNSIGNED NOT NULL, PRIMARY KEY (`id`), UNIQUE INDEX `Index_2` USING BTREE(`name`, `age`) ) ENGINE = InnoDB;

我试过这个插入选项:

insert into students (id, name, age) values (1, 'Ane', 23);

工作正常.比我试过这个(见 Ané - e 急性):

which works ok. Than I've tried this one (see Ané - e acute):

insert into students (id, name, age) values (2, 'Ané', 23);

并且我收到此错误消息:

and I receive this error message:

"Duplicate entry 'Ané-23' for key 'Index_2'"

MySQL 不知何故没有区分Ane"和Ané".我如何解决这个问题以及为什么会这样?

MySQL somehow does not make any distinction between "Ane" and "Ané". How I can resolve this and why this is happening?

表格学生的字符集是utf8"，排序规则是utf8_general_ci".

Charset for table students is "utf8" and collation is "utf8_general_ci".

ALTER TABLE `students` CHARACTER SET utf8 COLLATE utf8_general_ci;

后期@Crozin:

我已更改为使用整理 utf8_bin:

I've changed to use collation utf8_bin:

ALTER TABLE `students` CHARACTER SET utf8 COLLATE utf8_bin;

但我收到同样的错误.

但是如果我从字符集 utf8 和整理 utf8_bin 开始创建表，就像这样:

But if I create the table from start with charset utf8 and collation utf8_bin, like this:

CREATE TABLE `students2` ( `id` INTEGER UNSIGNED AUTO_INCREMENT, `name` VARCHAR(45), `age` VARCHAR(45), PRIMARY KEY (`id`), UNIQUE INDEX `Index_2` USING BTREE(`name`, `age`) ) ENGINE = InnoDB CHARACTER SET utf8 COLLATE utf8_bin;

以下两个插入命令都可以正常工作:

both below insert commands works ok:

insert into students2 (id, name, age) values (1, 'Ane', 23); // works ok insert into students2 (id, name, age) values (2, 'Ané', 23); // works ok

这似乎很奇怪.

后期编辑 2:

我在这里看到了另一个答案.我不确定用户是删除了还是丢失了.我只是在测试它:

I saw another answer here. I'm not sure if the user deleted or it get lost. I was just testing it:

用户写道，他首先用 3 个不同的字符集创建了 3 个表:

The user wrote that first he created 3 tables with 3 different charsets:

CREATE TABLE `utf8_bin` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(45) COLLATE utf8_bin NOT NULL, `age` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin; CREATE TABLE `utf8_unicode_ci` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(45) COLLATE utf8_unicode_ci NOT NULL, `age` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; CREATE TABLE `utf8_general_ci` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(45) COLLATE utf8_general_ci NOT NULL, `age` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

用户的结果是:

Insert commands: INSERT INTO utf8_bin VALUES (1, 'Ane', 23), (2, 'Ané', 23); Query OK, 2 rows affected (0.02 sec) Records: 2 Duplicates: 0 Warnings: 0 INSERT INTO utf8_unicode_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23); Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0 INSERT INTO utf8_general_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23); Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0

这是我的结果:

INSERT INTO utf8_bin VALUES (1, 'Ane', 23), (2, 'Ané', 23); //works ok INSERT INTO utf8_unicode_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23); // Duplicate entry 'Ané-23' for key 'Index_2' INSERT INTO utf8_general_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23); //Duplicate entry 'Ané-23' for key 'Index_2'

我不知道为什么他的这个 INSERT 命令有效，而对我来说却不起作用.

I'm not sure why in his part this INSERT command worked and for me doesn't work.

他还写道，他在 Linux 上的 Mysql 上对此进行了测试 - 必须对此做些什么?！即使我不这么认为.

He also wrote that he tested this on Mysql on Linux - has to do something with this?! Even I do not think so.

推荐答案

整理是utf8_general_ci".

and collation is "utf8_general_ci".

这就是答案.如果您使用的是 utf8_general_ci(实际上它适用于所有 utf_..._[ci|cs])排序规则，那么变音符号会在 comarison 中被绕过，因此:

And that's the answer. If you're using utf8_general_ci (actually it applies to all utf_..._[ci|cs]) collation then diacritics are bypassed in comarison, thus:

SELECT "e" = "é" AND "O" = "Ó" AND "ä" = "a"

结果为 1.索引也使用排序规则.

Results in 1. Indexes also use collation.

如果你想区分 ą 和 a 然后使用 utf8_bin 排序规则(记住它也区分大写和小写字符).

If you want to distinguish between ą and a then use utf8_bin collation (keep in mind that it also distinguish between uppercase and lowercase characters).

顺便说一下，姓名和年龄不保证任何唯一性.

By the way name and age don't guarantee any uniqueness.

相关文章