MySQL Illegal mix of collations
After viewing my prod logs, I have some errors mentioning:
[2012-08-31 15:56:43] request.CRITICAL: Doctrine\DBAL\DBALException:
An exception occurred while executing 'SELECT t0.username ....... FROM fos_user t0 WHERE t0.username = ?'
with params {"1":"Nrv\u29e7Kasi"}:
SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '='
Although I have UTF-8 as the default under the Doctrine config:
doctrine:
    dbal:
        charset: UTF8
It seems that all my MySQL tables are in latin1_swedish_ci, so my question is:
Can I manually change the collation to utf8_general_ci for all my tables without any complications/precautions?
Recommended answer
It is helpful to understand the following definitions:
A character encoding details how each symbol is represented in binary (and therefore stored in the computer). For example, the symbol é (U+00E9, latin small letter E with acute) is encoded as 0xc3a9 in UTF-8 (which MySQL calls utf8) and 0xe9 in Windows-1252 (which MySQL calls latin1).
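The two byte sequences above can be checked with Python's built-in codecs. This is a minimal sketch in which the `cp1252` codec stands in for MySQL's latin1. The stand-in is only approximate: MySQL's latin1 also assigns the five byte values that Windows-1252 leaves undefined, which Python's codec rejects.

```python
# Encode U+00E9 (é) in the two encodings discussed above.
e_acute = "\u00e9"

utf8_bytes = e_acute.encode("utf-8")     # what MySQL calls utf8
cp1252_bytes = e_acute.encode("cp1252")  # roughly what MySQL calls latin1

print(utf8_bytes.hex())    # c3a9
print(cp1252_bytes.hex())  # e9
```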
A character set is the alphabet of symbols that can be represented using a given character encoding. Confusingly, the term is also used to mean the same as character encoding.
A collation is an ordering on a character set, so that strings can be compared. For example: MySQL's latin1_swedish_ci collation treats most accented variations of a character as equivalent to the base character, whereas its latin1_general_ci collation will order them before the next base character but not equivalent (there are other, more significant, differences too, such as the order of characters like å, ä, ö and ß).
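A collation is essentially a rule for turning strings into comparison keys. The Python sketch below is a toy, accent- and case-insensitive key built with `unicodedata`, set against plain code-point (binary) comparison. It is NOT MySQL's actual collation table. Real collations such as latin1_swedish_ci have language-specific exceptions (for instance, Swedish treats å, ä and ö as separate letters), which this toy ignores.

```python
import unicodedata

def accent_insensitive_key(s: str) -> str:
    """Toy case- and accent-insensitive key (not MySQL's real tables)."""
    # Decompose characters (é -> e + combining acute), then drop the marks.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() gives the case-insensitive ("_ci") behaviour.
    return base.casefold()

# Under the toy collation, é and E compare equal; by code point they do not.
print(accent_insensitive_key("é") == accent_insensitive_key("E"))  # True
print("é" == "e")                                                  # False

# Ordering differs too: by code point, é (U+00E9) sorts after every ASCII letter.
print(sorted(["f", "é", "e"]))                              # ['e', 'f', 'é']
print(sorted(["f", "é", "e"], key=accent_insensitive_key))  # ['é', 'e', 'f']
```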
MySQL will decide which collation should be applied to a given expression as documented under Collation of Expressions: in particular, the collation of a column takes precedence over that of a string literal.
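The core of that precedence rule can be sketched in Python. The sketch uses the coercibility values that MySQL's COERCIBILITY() function reports: 0 for an explicit COLLATE clause, 2 for a column, and 4 for a literal. The `Operand` type and `resolve_collation` function are hypothetical names for illustration. The sketch deliberately omits MySQL's extra tie-breaking rules for operands of equal coercibility (such as preferring the Unicode side), so it is a simplification rather than the server's algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operand:
    charset: str
    collation: str
    coercibility: int  # 0 = explicit COLLATE, 2 = column, 4 = literal

def resolve_collation(a: Operand, b: Operand) -> str:
    """Simplified: the operand with the LOWER coercibility value wins."""
    if a.collation == b.collation:
        return a.collation
    if a.coercibility < b.coercibility:
        return a.collation
    if b.coercibility < a.coercibility:
        return b.collation
    # Equal coercibility: MySQL applies further rules, omitted in this sketch.
    raise ValueError(f"Illegal mix of collations ({a.collation}) and ({b.collation})")

column = Operand("latin1", "latin1_swedish_ci", 2)
literal = Operand("utf8", "utf8_general_ci", 4)
print(resolve_collation(column, literal))  # latin1_swedish_ci
```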
The WHERE clause of your query compares the following strings:
a value in fos_user.username, encoded in the column's character set (Windows-1252) and expressing a preference for its collation latin1_swedish_ci (with a coercibility value of 2); with
the string literal 'Nrv⧧Kasi', encoded in the connection's character set (UTF-8, as configured by Doctrine) and expressing a preference for the connection's collation utf8_general_ci (with a coercibility value of 4).
Since the first of these strings has a lower coercibility value than the second, MySQL attempts to perform the comparison using that string's collation: latin1_swedish_ci. To do so, MySQL attempts to convert the second string to latin1, but since the ⧧ character does not exist in that character set, the comparison fails.
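The failed conversion can be reproduced in Python by encoding the logged parameter with the `cp1252` codec (an approximate stand-in for MySQL's latin1). The helper name `representable` is hypothetical. It simply reports whether every character of a string exists in the target encoding.

```python
def representable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

param = "Nrv\u29e7Kasi"  # the parameter from the log; U+29E7 is '⧧'
print(representable(param, "cp1252"))  # False: no latin1 equivalent for '⧧'
print(representable(param, "utf-8"))   # True
```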
One should pause for a moment to consider how the column is currently encoded: you are attempting to filter for records where fos_user.username is equal to a string that contains a character which cannot exist in that column!
If you believe that the column does contain such characters, then you probably wrote to the column whilst the connection character encoding was set to something (e.g. latin1) that caused MySQL to interpret the received byte sequence as characters which are all in the Windows-1252 character set.
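The following minimal Python sketch shows how such data typically ends up in a latin1 column: the client sends UTF-8 bytes, but the connection claims latin1. The function name `stored_as_latin1` is hypothetical, and `cp1252` again approximates MySQL's latin1. Python's codec raises an error for the five byte values Windows-1252 leaves undefined, whereas MySQL would still store a character for them, so the sketch can diverge for some inputs.

```python
def stored_as_latin1(sent_text: str) -> str:
    """Characters the column would hold if UTF-8 bytes arrive on a
    connection that MySQL believes is latin1 (approximated by cp1252)."""
    wire_bytes = sent_text.encode("utf-8")  # what the client actually transmits
    return wire_bytes.decode("cp1252")      # how MySQL interprets those bytes

# '⧧' is E2 A7 A7 in UTF-8, which reads as 'â§§' in Windows-1252.
print(stored_as_latin1("Nrv\u29e7Kasi"))  # Nrvâ§§Kasi
```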
If this is the case, before continuing any further you should fix your data!
1. Convert such columns to the character encoding that was used on data insertion, if different to the incumbent encoding:
ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET foo;
2. Drop the encoding information associated with such columns by converting them to the binary character set:
ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET binary;
3. Associate with such columns the encoding in which data was actually transmitted by converting them to the relevant character set:
ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET bar;
Note that, if converting from a multi-byte encoding, you may need to increase the size of the column (or even change its type) in order to accommodate the maximum possible length of the converted string.
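The three ALTER TABLE steps amount to re-reading the stored bytes under the right encoding. The sketch below mirrors that in Python under two assumptions: foo is latin1 (approximated by `cp1252`) and bar is UTF-8. Python strings carry no stored encoding, so step 1 corresponds to choosing the codec passed to `encode()`. The function and parameter names are hypothetical.

```python
def repair(column_text: str, inserted_as: str = "cp1252",
           transmitted_as: str = "utf-8") -> str:
    # Steps 1 and 2: recover the raw bytes as they were stored under the
    # insertion charset (foo); CHARACTER SET binary keeps only these bytes.
    raw_bytes = column_text.encode(inserted_as)
    # Step 3: reinterpret those bytes in the encoding actually sent (bar).
    return raw_bytes.decode(transmitted_as)

print(repair("Nrv\u00e2\u00a7\u00a7Kasi"))  # Nrv⧧Kasi
```

Run against data that was never mis-decoded, the same transformation can corrupt it or raise an error. This is why it only applies to columns you have confirmed were written with the wrong connection encoding.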
Once one is certain that the columns are correctly encoded, one could force the comparison to be conducted using a Unicode collation by either:
explicitly converting the value fos_user.username to a Unicode character set:
WHERE CONVERT(fos_user.username USING utf8) = ?
forcing the string literal to have a lower coercibility value than the column (which will cause an implicit conversion of the column's value to UTF-8):
WHERE fos_user.username = ? COLLATE utf8_general_ci
Or one could, as you say, permanently convert the column(s) to a Unicode encoding and set its collation appropriately.
Can I manually change the collation to utf8_general_ci for all my tables without any complications/precautions?
The principal consideration is that Unicode encodings can take up more space than single-byte character sets, so:
more storage may be required;
comparisons may be slower; and
index prefix lengths may need to be adjusted (note that the maximum is in bytes, so it may represent fewer characters than previously).
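The index-prefix point is simple arithmetic: divide the byte limit by the worst-case bytes per character. This Python sketch assumes a 767-byte limit, which InnoDB applies to REDUNDANT and COMPACT row formats. Other row formats allow longer prefixes, so check the limit for your own tables. MySQL's utf8 uses at most 3 bytes per character and utf8mb4 at most 4.

```python
# Worst-case bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def max_prefix_chars(limit_bytes: int, charset: str) -> int:
    """How many characters fit in an index key limited to limit_bytes."""
    return limit_bytes // MAX_BYTES_PER_CHAR[charset]

LIMIT = 767  # assumed: InnoDB REDUNDANT/COMPACT row-format limit
for cs in ("latin1", "utf8", "utf8mb4"):
    print(cs, max_prefix_chars(LIMIT, cs))  # 767, 255, 191
```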
Also, be aware that, as documented under ALTER TABLE Syntax:
To change the table default character set and all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.
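The type promotion described above follows from the capacity of each TEXT type's length prefix. This Python sketch reproduces the arithmetic. It illustrates the reasoning, not MySQL's own code, and the function name `smallest_text_type` is hypothetical.

```python
# (type name, largest byte length its length prefix can record)
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),     # 1 length byte
    ("TEXT", 2**16 - 1),        # 2 length bytes
    ("MEDIUMTEXT", 2**24 - 1),  # 3 length bytes
    ("LONGTEXT", 2**32 - 1),    # 4 length bytes
]

def smallest_text_type(max_chars: int, max_bytes_per_char: int) -> str:
    """Smallest TEXT type whose length bytes can hold the worst-case value."""
    needed = max_chars * max_bytes_per_char
    for name, capacity in TEXT_TYPES:
        if needed <= capacity:
            return name
    raise ValueError(f"{needed} bytes exceeds LONGTEXT")

print(smallest_text_type(65_535, 1))  # TEXT (latin1)
print(smallest_text_type(65_535, 3))  # MEDIUMTEXT (utf8: 196,605 bytes)
```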
To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns.