Reference: Why are my "special" Unicode characters encoded weird using json_encode?
When using "special" Unicode characters, they come out as weird garbage when encoded to JSON:
php > echo json_encode(['foo' => '馬']);
{"foo":"\u99ac"}
Why? Have I done something wrong with my encodings?
(This is a reference question to clarify the topic once and for all, since this comes up again and again.)
Recommended answer
First of all: there's nothing wrong here. This is how characters can be encoded in JSON, and it is part of the official standard. It is based on how string literals can be formed in JavaScript/ECMAScript (section 7.8.4, "String Literals") and is described as such:
Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. [...] So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
In short: Any character can be encoded as \u...., where .... is the Unicode code point of the character (or the code point of half of a UTF-16 surrogate pair, for characters outside the BMP).
"馬"
"\u99ac"
These two string literals represent the exact same character; they're absolutely equivalent. When these string literals are parsed by a compliant JSON parser, they will both result in the string "馬". They don't look the same, but they mean the same thing in the JSON data encoding format.
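The equivalence can be checked directly with json_decode. This is only a minimal sketch: the variable names are made up for illustration, the source file is assumed to be saved as UTF-8, and the "\u{...}" string syntax used for comparison assumes PHP 7.0 or later. It also covers a character outside the BMP (U+1D11E), which is written as two \u escapes forming a surrogate pair.

```php
<?php
// Two JSON documents: one holds the literal character, the other the escape.
// Single-quoted PHP strings leave the backslash untouched, so the JSON
// parser really receives the six characters \u99ac.
$literalJson = '"馬"';
$escapedJson = '"\u99ac"';

$fromLiteral = json_decode($literalJson);
$fromEscaped = json_decode($escapedJson);

// Both decode to the same PHP string.
var_dump($fromLiteral === $fromEscaped); // bool(true)

// A character outside the BMP (U+1D11E, musical G clef) is escaped
// as the two halves of its UTF-16 surrogate pair.
$clef = json_decode('"\ud834\udd1e"');
var_dump($clef === "\u{1D11E}"); // bool(true)
```

Either spelling can therefore be sent to a compliant consumer without changing the data it ends up with.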
PHP's json_encode by default encodes non-ASCII characters using \u.... escape sequences. Technically it doesn't have to, but it does, and the result is perfectly valid. If you prefer to have literal characters in your JSON instead of escape sequences, you can set the JSON_UNESCAPED_UNICODE flag in PHP 5.4 or higher:
php > echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE);
{"foo":"馬"}
To emphasise: this is just a preference; it is not necessary in any way for transporting "Unicode characters" in JSON.
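To illustrate that the flag changes only the textual form, here is a small round-trip sketch. The variable names are hypothetical and the file is assumed to be saved as UTF-8; it encodes the same array both ways and decodes each result again.

```php
<?php
$data = ['foo' => '馬'];

// Default behaviour: non-ASCII characters become \u.... escapes.
$escaped = json_encode($data);
// With the flag: the character is written literally as UTF-8.
$literal = json_encode($data, JSON_UNESCAPED_UNICODE);

// The two JSON texts differ byte for byte...
var_dump($escaped === $literal); // bool(false)

// ...but decoding either one gives back the original array.
var_dump(json_decode($escaped, true) === $data); // bool(true)
var_dump(json_decode($literal, true) === $data); // bool(true)
```

The literal form is shorter and easier to read in logs, while the escaped form stays pure ASCII; which one to emit is a matter of taste as long as the consumer is a compliant JSON parser.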