Why is the table CHARSET set to utf8mb4 and the COLLATION to utf8mb4_unicode_520_ci?

I've recently noticed that whenever I start a new WordPress project, my tables' collation automatically changes from utf8_unicode_ci (which I select when I create a new DB from phpMyAdmin) to utf8mb4_unicode_520_ci.

Also, I've noticed in phpMyAdmin under "General Settings" that server connection Collation defaults to utf8mb4_unicode_520_ci.

I'm running MySQL Server 5.7.17 and phpMyAdmin 4.6.6 on Ubuntu 17.04.

My questions are as follows:

  1. Why is this happening?
  2. If possible, how do I prevent this? Because of utf8mb4 I've experienced problems when migrating WP sites to an older MySQL server which does not support it.
  3. Is point 2 advisable? Are there any benefits to using charset utf8mb4 over utf8, and collation utf8mb4_unicode_520_ci over utf8_unicode_ci?

Recommended answer

In the past, there was only utf8; now there is also utf8mb4, and in MySQL 8.0 utf8mb4 becomes the default character set.

In the past, _general_ci was the default collation. Then came _unicode_ci (Unicode 4.0), which was better, and then _unicode_520_ci (Unicode 5.2.0). In the future (MySQL 8.0), the default will be _0900_ai_ci (Unicode 9.0).
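
The collation names themselves encode this history. Below is a rough sketch of a hypothetical Python helper (uca_version is not part of any library). It relies only on the naming convention rather than asking a server, and it covers only the families named above.

```python
from typing import Optional


def uca_version(collation: str) -> Optional[str]:
    """Guess the Unicode Collation Algorithm version from a MySQL collation name.

    Naming-convention sketch only; language-specific collations such as
    utf8mb4_german2_ci are not handled.
    """
    parts = collation.lower().split("_")
    if "0900" in parts:          # e.g. utf8mb4_0900_ai_ci (MySQL 8.0 default)
        return "9.0.0"
    if "unicode" in parts:       # _unicode_520_ci vs plain _unicode_ci
        return "5.2.0" if "520" in parts else "4.0.0"
    return None                  # e.g. _general_ci is not a full UCA collation


for name in ("utf8_general_ci", "utf8_unicode_ci",
             "utf8mb4_unicode_520_ci", "utf8mb4_0900_ai_ci"):
    print(name, uca_version(name))
```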

Meanwhile, the road is full of potholes left by MySQL's past mistakes, and the WP designers are driving a big tank that does not notice them.

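MySQL 5.6 was a big pothole that swallowed many a WP user. InnoDB has a 767-byte limit on index prefixes, and WP puts indexes on overly long VARCHAR(255) columns. Those indexes fit in utf8, which reserves 3 bytes per character (255 × 3 = 765). They stopped fitting once utf8mb4 came into play, because it reserves 4 bytes per character (255 × 4 = 1020). You are well past this pothole with 5.7.17. (Your future move to 8.0 will be less bumpy.)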
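
The arithmetic behind that pothole can be sketched in a few lines of Python. The helper names are made up for illustration, and the sketch assumes InnoDB sizes the key using the maximum bytes per character of the column's charset. It also shows where the familiar 191-character limit comes from.

```python
# Sketch: worst-case index key size for a VARCHAR column under a byte limit.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}
COMPACT_PREFIX_LIMIT = 767  # bytes; the InnoDB limit discussed above


def index_bytes(varchar_len: int, charset: str) -> int:
    """Bytes reserved for an index on VARCHAR(varchar_len) in this charset."""
    return varchar_len * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset: str, limit: int = COMPACT_PREFIX_LIMIT) -> int:
    """Longest VARCHAR that can be fully indexed within the byte limit."""
    return limit // MAX_BYTES_PER_CHAR[charset]


for cs in ("utf8", "utf8mb4"):
    need = index_bytes(255, cs)
    verdict = "fits" if need <= COMPACT_PREFIX_LIMIT else "too long"
    print(f"VARCHAR(255) {cs}: {need} bytes, {verdict}; "
          f"max {max_indexable_chars(cs)} chars")
```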

That is, newly created databases/tables/columns on 5.7.7+ should not experience the 767 problem, but things migrated from older versions (5.5.3+) may have issues, especially if something causes you to change to utf8mb4.
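
Part of why migrated tables behave differently is the ROW_FORMAT they carry with them. Here is a minimal sketch with hypothetical helper names. It assumes the default 16KB page size and large index prefixes enabled (innodb_large_prefix=ON, the default from 5.7.7). Under those assumptions, DYNAMIC and COMPRESSED tables allow 3072-byte index prefixes, while COMPACT and REDUNDANT stay at 767.

```python
# Sketch: index-prefix limit by ROW_FORMAT, assuming 16KB pages and
# innodb_large_prefix=ON (the 5.7.7+ default).
def prefix_limit(row_format: str) -> int:
    fmt = row_format.upper()
    if fmt in ("DYNAMIC", "COMPRESSED"):
        return 3072
    if fmt in ("COMPACT", "REDUNDANT"):
        return 767
    raise ValueError(f"unknown ROW_FORMAT: {row_format}")


def varchar_index_fits(varchar_len: int, bytes_per_char: int, row_format: str) -> bool:
    """True if a full index on VARCHAR(varchar_len) fits the prefix limit."""
    return varchar_len * bytes_per_char <= prefix_limit(row_format)


# A freshly created table vs. one migrated with its old COMPACT format.
print(varchar_index_fits(255, 4, "DYNAMIC"))  # True  (1020 <= 3072)
print(varchar_index_fits(255, 4, "COMPACT"))  # False (1020 > 767)
```

On a real server, the usual remedy for such a table is ALTER TABLE ... ROW_FORMAT=DYNAMIC, though it is worth trying that on a copy first.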

What to do? I would probably run out of space trying to spell out all the options. So please provide the history of the data, the upgrade path (if any), the current settings, the ROW_FORMAT of the tables, the CHARACTER SET and COLLATION of the columns, and the output of SHOW VARIABLES LIKE 'char%';
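
As a sketch of how to read that SHOW VARIABLES output, the hypothetical Python helper below takes the rows as (Variable_name, Value) pairs, for example from any DB-API cursor, and lists the character_set_* settings that are not utf8mb4. It skips the entries that are not meant to be utf8mb4. The sample rows are made up, not from the asker's server. The per-table ROW_FORMAT and TABLE_COLLATION live in INFORMATION_SCHEMA.TABLES, and the per-column CHARACTER_SET_NAME and COLLATION_NAME live in INFORMATION_SCHEMA.COLUMNS.

```python
# Sketch: flag character_set_* variables that are not utf8mb4.
IGNORED = {
    "character_set_filesystem",  # file-name encoding, normally 'binary'
    "character_set_system",      # metadata charset, fixed to utf8 in 5.7
    "character_sets_dir",        # a directory path, not a charset
}


def non_utf8mb4(rows, expected="utf8mb4"):
    """Return the variable names whose value differs from `expected`."""
    return [name for name, value in rows
            if name not in IGNORED and value != expected]


# Hypothetical rows from SHOW VARIABLES LIKE 'char%'; real values will differ.
sample = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "latin1"),
    ("character_set_system", "utf8"),
    ("character_sets_dir", "/usr/share/mysql/charsets/"),
]
print(non_utf8mb4(sample))  # ['character_set_database', 'character_set_server']
```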

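Where should you be? On 5.7.7+, use utf8mb4 with utf8mb4_unicode_520_ci wherever practical. That charset gives you Emoji and all of Chinese. utf8 does not, because it stores at most 3 bytes per character and so covers only the Basic Multilingual Plane (U+0000 to U+FFFF). Emoji and some rarer Chinese characters lie outside that range and need 4 bytes. That collation is the best one available on 5.7, although you might be hard pressed to notice where it matters.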
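
A quick way to see the difference is to check whether a string stays within the Basic Multilingual Plane. Code points above U+FFFF are exactly the ones that take 4 bytes in UTF-8. The helper name below is made up for illustration. Text that fails this check is what a utf8 column typically rejects with an "Incorrect string value" error in strict mode.

```python
# Sketch: can this text be stored in MySQL's 3-byte utf8?
def fits_utf8mb3(text: str) -> bool:
    # Code points above U+FFFF need 4 bytes in UTF-8, i.e. utf8mb4.
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_utf8mb3("中文"))        # True  (common CJK is in the BMP)
print(fits_utf8mb3("😀"))          # False (emoji U+1F600 needs 4 bytes)
print(fits_utf8mb3("\U00020000"))  # False (CJK Extension B character)
```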

Note: the first part of a collation name is the character set it belongs to, and the only one it works with. That is, utf8_unicode_ci does not work with utf8mb4.
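
The rule can be expressed as a naming-convention check. This is a sketch with hypothetical helper names; the authoritative mapping is the CHARACTER_SET_NAME column of INFORMATION_SCHEMA.COLLATIONS. On a real server, a mismatch such as CHARACTER SET utf8mb4 COLLATE utf8_unicode_ci is rejected with "COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'utf8mb4'".

```python
# Sketch: a collation works only with the charset named by its prefix.
def collation_charset(collation: str) -> str:
    """Charset implied by a collation name, e.g. 'utf8mb4' for utf8mb4_unicode_520_ci."""
    return collation.split("_", 1)[0]


def compatible(charset: str, collation: str) -> bool:
    return collation_charset(collation) == charset


print(compatible("utf8mb4", "utf8mb4_unicode_520_ci"))  # True
print(compatible("utf8mb4", "utf8_unicode_ci"))         # False
```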
