MySQL 区分 e 和 é(e 急性)——UNIQUE 索引
我有一个表,students
,有 3 列:id
、name
和 age
.我在 name
和 age
列上有一个 UNIQUE
索引 Index_2
.
I have a table, students
, with 3 columns: id
, name
, and age
.
I have a UNIQUE
index Index_2
on columns name
and age
.
CREATE TABLE `bedrock`.`students` (
`id` INTEGER UNSIGNED NOT NULL
AUTO_INCREMENT, `name` VARCHAR(45)
NOT NULL, `age` INTEGER UNSIGNED NOT
NULL, PRIMARY KEY (`id`), UNIQUE
INDEX `Index_2` USING BTREE(`name`,
`age`) ) ENGINE = InnoDB;
我试过这个插入选项:
insert into students (id, name, age)
values (1, 'Ane', 23);
工作正常.比我试过这个(见 Ané - e 急性):
which works ok. Than I've tried this one (see Ané - e acute):
insert into students (id, name, age)
values (2, 'Ané', 23);
并且我收到此错误消息:
and I receive this error message:
"Duplicate entry 'Ané-23' for key 'Index_2'"
MySQL 不知何故没有区分Ane"和Ané".我如何解决这个问题以及为什么会这样?
MySQL somehow does not make any distinction between "Ane" and "Ané". How I can resolve this and why this is happening?
表格学生的字符集是utf8",排序规则是utf8_general_ci".
Charset for table students is "utf8" and collation is "utf8_general_ci".
ALTER TABLE `students` CHARACTER SET utf8 COLLATE utf8_general_ci;
后期@Crozin:
我已更改为使用整理 utf8_bin:
I've changed to use collation utf8_bin:
ALTER TABLE `students`
CHARACTER SET utf8 COLLATE utf8_bin;
但我收到同样的错误.
但是如果我从字符集 utf8 和整理 utf8_bin 开始创建表,就像这样:
But if I create the table from start with charset utf8 and collation utf8_bin, like this:
CREATE TABLE `students2` (
`id` INTEGER UNSIGNED AUTO_INCREMENT,
`name` VARCHAR(45), `age`
VARCHAR(45), PRIMARY KEY (`id`),
UNIQUE INDEX `Index_2` USING
BTREE(`name`, `age`) ) ENGINE = InnoDB
CHARACTER SET utf8 COLLATE utf8_bin;
以下两个插入命令都可以正常工作:
both below insert commands works ok:
insert into students2 (id, name, age)
values (1, 'Ane', 23); // works ok
insert into students2 (id, name, age)
values (2, 'Ané', 23); // works ok
这似乎很奇怪.
后期编辑 2:
我在这里看到了另一个答案.我不确定用户是删除了还是丢失了.我只是在测试它:
I saw another answer here. I'm not sure if the user deleted or it get lost. I was just testing it:
用户写道,他首先用 3 个不同的字符集创建了 3 个表:
The user wrote that first he created 3 tables with 3 different charsets:
CREATE TABLE `utf8_bin` ( `id`
int(10) unsigned NOT NULL
AUTO_INCREMENT, `name` varchar(45)
COLLATE utf8_bin NOT NULL, `age`
int(10) unsigned NOT NULL, PRIMARY
KEY (`id`), UNIQUE KEY `Index_2`
(`name`,`age`) USING BTREE )
ENGINE=InnoDB DEFAULT CHARSET=utf8
COLLATE=utf8_bin;
CREATE TABLE `utf8_unicode_ci` (
`id` int(10) unsigned NOT NULL
AUTO_INCREMENT, `name` varchar(45)
COLLATE utf8_unicode_ci NOT NULL,
`age` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`), UNIQUE KEY
`Index_2` (`name`,`age`) USING BTREE )
ENGINE=InnoDB DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci;
CREATE TABLE `utf8_general_ci` (
`id` int(10) unsigned NOT NULL
AUTO_INCREMENT, `name` varchar(45)
COLLATE utf8_general_ci NOT NULL,
`age` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`), UNIQUE KEY
`Index_2` (`name`,`age`) USING BTREE )
ENGINE=InnoDB DEFAULT CHARSET=utf8
COLLATE=utf8_general_ci;
用户的结果是:
Insert commands: INSERT INTO utf8_bin
VALUES (1, 'Ane', 23), (2, 'Ané', 23);
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
INSERT INTO utf8_unicode_ci VALUES (1,
'Ane', 23), (2, 'Ané', 23); Query OK,
2 rows affected (0.01 sec) Records: 2
Duplicates: 0 Warnings: 0
INSERT INTO utf8_general_ci VALUES (1,
'Ane', 23), (2, 'Ané', 23); Query OK,
2 rows affected (0.01 sec) Records: 2
Duplicates: 0 Warnings: 0
这是我的结果:
INSERT INTO utf8_bin VALUES (1, 'Ane',
23), (2, 'Ané', 23); //works ok
INSERT INTO utf8_unicode_ci VALUES (1,
'Ane', 23), (2, 'Ané', 23); //
Duplicate entry 'Ané-23' for key
'Index_2'
INSERT INTO utf8_general_ci VALUES (1,
'Ane', 23), (2, 'Ané', 23);
//Duplicate entry 'Ané-23' for key
'Index_2'
我不知道为什么他的这个 INSERT
命令有效,而对我来说却不起作用.
I'm not sure why in his part this INSERT
command worked and for me doesn't work.
他还写道,他在 Linux 上的 Mysql 上对此进行了测试 - 必须对此做些什么?!即使我不这么认为.
He also wrote that he tested this on Mysql on Linux - has to do something with this?! Even I do not think so.
推荐答案
整理是utf8_general_ci".
and collation is "utf8_general_ci".
这就是答案.如果您使用的是 utf8_general_ci
(实际上它适用于所有 utf_..._[ci|cs]
)排序规则,那么变音符号会在 comarison 中被绕过,因此:
And that's the answer. If you're using utf8_general_ci
(actually it applies to all utf_..._[ci|cs]
) collation then diacritics are bypassed in comarison, thus:
SELECT "e" = "é" AND "O" = "Ó" AND "ä" = "a"
结果为 1
.索引也使用排序规则.
Results in 1
. Indexes also use collation.
如果你想区分 ą
和 a
然后使用 utf8_bin
排序规则(记住它也区分大写和小写字符).
If you want to distinguish between ą
and a
then use utf8_bin
collation (keep in mind that it also distinguish between uppercase and lowercase characters).
顺便说一下,姓名和年龄不保证任何唯一性.
By the way name and age don't guarantee any uniqueness.
相关文章