What exactly do character set and collation mean?

2021-11-20 00:00:00 database mysql database-design character-set

I can read the MySQL documentation and it's pretty clear. But how does one decide which character set to use? On what data does collation have an effect?

I'm asking for an explanation of the two and how to choose them.

Recommended answer

From the MySQL documentation:

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: 'A', 'B', 'a', 'b'. We give each letter a number: 'A' = 0, 'B' = 1, 'a' = 2, 'b' = 3. The letter 'A' is a symbol, the number 0 is the encoding for 'A', and the combination of all four letters and their encodings is a character set.
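To make the imaginary character set concrete, here is a minimal Python sketch of it. The names CHARSET, encode, and decode are made up for illustration and are not part of MySQL; the only facts taken from the text are the four symbols and their numbers.

```python
# The imaginary four-letter character set: each symbol paired with its encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def encode(text):
    """Turn a string over the imaginary alphabet into its list of encodings."""
    return [CHARSET[symbol] for symbol in text]


def decode(codes):
    """Turn a list of encodings back into a string."""
    symbols = {code: symbol for symbol, code in CHARSET.items()}
    return "".join(symbols[code] for code in codes)
```

Note that nothing here says how two strings compare; the character set only fixes which symbols exist and how they are stored.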

Now, suppose that we want to compare two string values, 'A' and 'B'. The simplest way to do this is to look at the encodings: 0 for 'A' and 1 for 'B'. Because 0 is less than 1, we say 'A' is less than 'B'. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): "compare the encodings." We call this simplest of all possible collations a binary collation.
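The single rule "compare the encodings" can be sketched in Python as follows. This is an illustration over the imaginary alphabet only; binary_key and binary_compare are hypothetical helper names, not MySQL functions.

```python
# Same imaginary character set as in the text.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_key(text):
    """Sort key for the binary collation: just the raw encodings."""
    return [CHARSET[symbol] for symbol in text]


def binary_compare(left, right):
    """Return -1, 0, or 1 depending on how the encodings compare."""
    left_key, right_key = binary_key(left), binary_key(right)
    return (left_key > right_key) - (left_key < right_key)
```

Under this collation, sorting with key=binary_key puts every uppercase letter before every lowercase one, because 'A' and 'B' have the smaller encodings.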

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters 'a' and 'b' as equivalent to 'A' and 'B'; (2) then compare the encodings. We call this a case-insensitive collation. It's a little more complex than a binary collation.
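The two rules of the case-insensitive collation can be sketched the same way. Again, FOLD and case_insensitive_key are illustrative names chosen here, and the sketch covers only the imaginary four-letter alphabet.

```python
# Same imaginary character set as in the text.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat the lowercase letters as equivalent to the uppercase ones.
FOLD = {"a": "A", "b": "B"}


def case_insensitive_key(text):
    """Apply rule 1 (fold case), then rule 2 (compare the encodings)."""
    return [CHARSET[FOLD.get(symbol, symbol)] for symbol in text]


def equal_ignoring_case(left, right):
    """Two strings are equal under this collation if their keys match."""
    return case_insensitive_key(left) == case_insensitive_key(right)
```

The stored encodings never change; only the comparison does. That is why the same character set can be used with several different collations.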

In real life, most character sets have many characters: not just 'A' and 'B' but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules: not just case insensitivity but also accent insensitivity (an "accent" is a mark attached to a character as in German 'ö') and multiple-character mappings (such as the rule that 'ö' = 'OE' in one of the two German collations).
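As a rough illustration of those two extra kinds of rule, here is a Python sketch using the standard unicodedata module. It is not how MySQL implements its collations. The function names are hypothetical, and the EXPANSIONS table contains only the 'ö' rule mentioned above; a real collation has many more entries.

```python
import unicodedata


def accent_insensitive_key(text):
    """Drop combining marks (accents) and fold case before comparing."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Multiple-character mapping: one symbol compares as a sequence of symbols.
# Only the rule from the text is listed; this table is an assumption otherwise.
EXPANSIONS = {"ö": "oe"}


def expansion_key(text):
    """Fold case, then expand mapped characters into their replacements."""
    lowered = unicodedata.normalize("NFC", text).lower()
    return "".join(EXPANSIONS.get(ch, ch) for ch in lowered)
```

With these keys, 'Köln' compares equal to 'Koln' under the accent-insensitive rule, and equal to 'Koeln' under the expansion rule, which is the kind of difference that separates one German collation from another.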
