What effects does using a binary collation have?
While answering this question, I became uncertain about something I couldn't find a sufficient answer to.
What are the practical differences between using the binary utf8_bin and the case-insensitive utf8_general_ci collations?
I can see three:
- The two have different sort orders; _bin's sort order is likely to put any umlauts at the end of the alphabet, because byte values are compared (right?)
- Searches in _bin are always case sensitive
_bin
Are there any other differences or side effects to be aware of?
References:
- 9.1.2. Character Sets and Collations in MySQL
- 9.1.7.6. The _bin and binary Collations, in the MySQL manual
- 9.1.7.7. The BINARY Operator
Similar questions that don't address the issue:
- UTF-8: General? Bin? Unicode?
Recommended answer
A binary collation compares your strings exactly as strcmp() in C would: any difference between characters, whether only in case or in diacritics, makes the strings unequal. The downside is that the resulting sort order is not natural.
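To make the equality behaviour concrete, here is a minimal Python sketch. It is not MySQL's implementation: `binary_equal` compares UTF-8 bytes the way a binary collation is described above, while `fold` and `loose_equal` are hypothetical helpers that only approximate a case- and accent-insensitive collation such as utf8_general_ci by stripping combining marks and case.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings byte for byte, like a binary collation."""
    return a.encode("utf-8") == b.encode("utf-8")


def fold(s: str) -> str:
    """Hypothetical helper: drop diacritics and case (a rough *_ci stand-in)."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def loose_equal(a: str, b: str) -> bool:
    """Compare after folding, so case and diacritics are ignored."""
    return fold(a) == fold(b)


print(binary_equal("a", "A"), loose_equal("a", "A"))
print(binary_equal("A", "Ä"), loose_equal("A", "Ä"))
```

Under these assumptions the byte-wise comparison treats "a"/"A" and "A"/"Ä" as different strings, while the folded comparison treats each pair as equal.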
An example of an unnatural sort order (as produced by a binary collation): A, B, a, b
A natural sort order would in this case be, e.g.: A, a, B, b
(lowercase and uppercase variants of the same letter are sorted next to each other)
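The two orderings can be reproduced with a short Python sketch. It assumes that sorting by raw UTF-8 bytes is a fair stand-in for a binary collation and that a case-folded key is a fair stand-in for a "natural" order; neither is MySQL's actual collation code, and the variable names are illustrative only.

```python
words = ["b", "A", "a", "B"]

# Byte-wise order: uppercase ASCII letters (0x41-0x5A) all sort before
# lowercase ones (0x61-0x7A).
binary_order = sorted(words, key=lambda s: s.encode("utf-8"))

# Approximate natural order: compare case-insensitively first, then use the
# original string only to break ties between "A" and "a".
natural_order = sorted(words, key=lambda s: (s.casefold(), s))

# Non-ASCII letters encode to bytes >= 0x80, so byte-wise they sort after
# every ASCII letter.
with_umlaut = sorted(["a", "z", "ä", "Z"], key=lambda s: s.encode("utf-8"))

print(binary_order)   # ['A', 'B', 'a', 'b']
print(natural_order)  # ['A', 'a', 'B', 'b']
print(with_umlaut)    # ['Z', 'a', 'z', 'ä']
```

The last line also illustrates the point raised in the question: in a byte-wise comparison, "ä" lands after "z" rather than next to "a".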
The practical advantage of a binary collation is its speed, since string comparison is very simple and fast. In the general case, indexes on binary-collated columns might not produce the expected sort order, but they can be useful for exact matches.