What effects does using a binary collation have?

2021-11-20 00:00:00 binary database mysql collation string-comparison

While answering this question, I became uncertain about something that I couldn't find a sufficient answer to.

What are the practical differences between using the binary utf8_bin and the case-insensitive utf8_general_ci collations?

I can see three:

Both have a different sort order; _bin's sort order is likely to put any umlauts at the end of the alphabet, because byte values are compared (right?)

Only case-sensitive searches in _bin

_bin

Are there any other differences or side effects to be aware of?

Reference:

9.1.2. Character Sets and Collations in MySQL
9.1.7.6. The _bin and binary Collations in the MySQL manual
9.1.7.7. The BINARY Operator

Similar questions that don't address the issue:

UTF-8: General? Bin? Unicode?

Recommended answer

A binary collation compares your strings exactly as strcmp() in C would: two strings compare as different if their characters differ at all (be it just a difference in case or diacritics). The downside is that the resulting sort order is not natural.
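A rough way to picture this byte-wise comparison outside MySQL is the Python sketch below. It is only an illustration under stated assumptions: `binary_compare` is a hypothetical helper that mimics strcmp() on the UTF-8 encodings, and it neither calls MySQL nor reproduces details such as trailing-space handling.

```python
def binary_compare(left: str, right: str) -> int:
    """Mimic strcmp() on UTF-8 bytes: returns -1, 0, or 1 (hypothetical helper)."""
    a = left.encode("utf-8")
    b = right.encode("utf-8")
    # Python compares bytes objects lexicographically by unsigned byte value.
    return (a > b) - (a < b)


if __name__ == "__main__":
    print(binary_compare("abc", "abc"))     # identical strings
    print(binary_compare("abc", "Abc"))     # differs only in case
    print(binary_compare("a", "\u00e4"))    # differs only in a diacritic (a vs. a-umlaut)
```

Only the first call reports equality; a difference in case or in a diacritic is enough to make the strings unequal.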

An example of an unnatural sort order (as produced by a binary collation): A, B, a, b. A natural sort order in this case would be, for example, A, a, B, b (the lowercase and uppercase variants of the same letter are sorted next to each other).
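The two orderings can be imitated in Python. This is a sketch, not MySQL: sorting by UTF-8 bytes stands in for a _bin collation, and sorting by `str.casefold()` is only a crude stand-in for a case-insensitive collation (it does not treat accented letters the way utf8_general_ci does). The word lists are made up for illustration.

```python
words = ["b", "A", "a", "B"]

# Stand-in for a binary collation: order by raw UTF-8 byte values.
binary_order = sorted(words, key=lambda s: s.encode("utf-8"))

# Crude stand-in for a case-insensitive collation: compare case-folded text,
# breaking ties by the original string so the result is deterministic.
natural_order = sorted(words, key=lambda s: (s.casefold(), s))

# An umlaut encodes to bytes above the ASCII range, so byte order puts it after "z".
umlaut_order = sorted(["z", "\u00e4", "a"], key=lambda s: s.encode("utf-8"))

print(binary_order)   # ['A', 'B', 'a', 'b']
print(natural_order)  # ['A', 'a', 'B', 'b']
print(umlaut_order)   # ['a', 'z', 'ä']
```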

The practical advantage of a binary collation is its speed, as the string comparison is very simple and fast. In the general case, indexes on binary-collated columns might not produce the sort order you expect; for exact matches, however, they can be useful.
