What is the difference between the utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?

2022-01-15 00:00:00 unicode mariadb mysql collation

I logged into MariaDB/MySQL and entered:

SHOW COLLATION;

I see utf8mb4_unicode_ci and utf8mb4_unicode_520_ci among the available collations. What is the difference between these two collations and which should we be using?

Recommended answer

Well, you can read about the differences in the documentation. I can't tell you what you should be using because every project is different.

10.1.3 Collation Naming Conventions

MySQL collation names follow these conventions:

A collation name starts with the name of the character set with which it is associated, followed by one or more suffixes indicating other collation characteristics. For example, utf8_general_ci and latin1_swedish_ci are collations for the utf8 and latin1 character sets, respectively.
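
As a rough illustration of this prefix rule, the query below lists the Unicode collations of one character set together with the character set each belongs to. It is a sketch that assumes a server where the utf8mb4 character set is available. The information_schema.COLLATIONS table is standard in both MySQL and MariaDB, but the rows returned depend on the server version.

```sql
-- List utf8mb4 collations whose name contains "unicode".
-- Every returned name should begin with the character set name (utf8mb4).
SELECT COLLATION_NAME, CHARACTER_SET_NAME
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
  AND COLLATION_NAME LIKE '%unicode%'
ORDER BY COLLATION_NAME;
```

On most servers both utf8mb4_unicode_ci and utf8mb4_unicode_520_ci should appear in the result.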

A language-specific collation includes a language name. For example, utf8_turkish_ci and utf8_hungarian_ci sort characters for the utf8 character set using the rules of Turkish and Hungarian, respectively.

Case sensitivity for sorting is indicated by _ci (case insensitive), _cs (case sensitive), or _bin (binary; character comparisons are based on character binary code values). For example, latin1_general_ci is case insensitive, latin1_general_cs is case sensitive, and latin1_bin uses binary code values.
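
The effect of these suffixes can be checked directly with a comparison. The following is a minimal sketch, assuming the three latin1 collations named above are installed (they normally are). The column aliases are only illustrative.

```sql
-- The _latin1 introducer makes each literal a latin1 string, so the latin1
-- collations below are valid regardless of the connection character set.
SELECT
  _latin1'a' COLLATE latin1_general_ci = _latin1'A' AS ci_equal,   -- case insensitive
  _latin1'a' COLLATE latin1_general_cs = _latin1'A' AS cs_equal,   -- case sensitive
  _latin1'a' COLLATE latin1_bin        = _latin1'A' AS bin_equal;  -- binary code values
```

The first column should come back as 1 and the other two as 0, since only the _ci collation treats 'a' and 'A' as equal.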

For Unicode, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:

utf8_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys (http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).

utf8_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).
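
The same naming applies to the utf8mb4 collations from the question: utf8mb4_unicode_ci uses the UCA 4.0.0 weights and utf8mb4_unicode_520_ci uses the UCA 5.2.0 weights. One way to see whether the two weight tables treat a given pair of characters differently is to compare the characters under each collation. The sketch below uses two supplementary characters (U+1F363 and U+1F37A, written as UTF-8 hex so the statement does not depend on the client encoding). The choice of characters is only an assumption for illustration, and any pair can be substituted.

```sql
-- Compare the same two characters under each collation.
-- X'F09F8DA3' is U+1F363 and X'F09F8DBA' is U+1F37A, both encoded as UTF-8.
SELECT
  _utf8mb4 X'F09F8DA3' COLLATE utf8mb4_unicode_ci
    = _utf8mb4 X'F09F8DBA' AS equal_uca_400,
  _utf8mb4 X'F09F8DA3' COLLATE utf8mb4_unicode_520_ci
    = _utf8mb4 X'F09F8DBA' AS equal_uca_520;

-- Inspect the sort weights that each collation assigns to one character.
SELECT
  HEX(WEIGHT_STRING(_utf8mb4 X'F09F8DA3' COLLATE utf8mb4_unicode_ci))     AS weight_uca_400,
  HEX(WEIGHT_STRING(_utf8mb4 X'F09F8DA3' COLLATE utf8mb4_unicode_520_ci)) AS weight_uca_520;
```

If the two columns of the first query differ on your server, then the choice of collation changes how these characters compare, which affects things such as UNIQUE indexes and ORDER BY.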

For Unicode, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24. For more information, see Section 2.11.3, "Checking Whether Tables or Indexes Must Be Rebuilt", and Section 2.11.4, "Rebuilding or Repairing Tables or Indexes".
