Is wchar_t required for Unicode support?
Is the wchar_t type required for Unicode support? If not, what is the point of this multibyte type? Why would you use wchar_t when you could accomplish the same thing with char?
Recommended answer
No.
Technically, no. Unicode is a standard that defines code points, and it does not require a particular encoding.
So you could use Unicode with the UTF-8 encoding, and then every character would fit in one char object or a short sequence of them, and the string would even still be null-terminated.
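As a minimal sketch of that point, the snippet below stores three code points in an ordinary null-terminated char array. The names kUtf8Sample and count_code_points are made up for illustration, and the counter assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <cstring>

// "n", U+00E9 and U+20AC spelled out as UTF-8 bytes, so the example does
// not depend on the encoding of the source file:
//   'n' = 6E, U+00E9 = C3 A9, U+20AC = E2 82 AC
const char kUtf8Sample[] = "n\xC3\xA9\xE2\x82\xAC";

// Count code points in a null-terminated UTF-8 string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8; nothing is validated here.
std::size_t count_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Here std::strlen(kUtf8Sample) reports 6 bytes, while count_code_points(kUtf8Sample) reports 3 code points. The usual char-based string functions keep working because no byte inside a multi-byte UTF-8 sequence is zero.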
The problem with UTF-8 and UTF-16 is that s[i] is not necessarily a character any more; it might be just a piece of one. With sufficiently wide characters you can preserve the abstraction that s[i] is a single character, though that does not make strings fixed-length under various transformations.
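To make the s[i] issue concrete, here is a small sketch that stores the same code point, U+1F600, as UTF-16 and as UTF-32. The names kUtf16, kUtf32 and is_surrogate are illustrative only.

```cpp
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair (two 16-bit code units) to represent it, while UTF-32
// needs a single 32-bit unit.
const std::u16string kUtf16 = u"\U0001F600";
const std::u32string kUtf32 = U"\U0001F600";

// True if a UTF-16 code unit is one half of a surrogate pair, meaning
// that s[i] on its own is only a piece of a code point.
bool is_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDFFF;
}
```

With these definitions kUtf16.size() is 2 and both kUtf16[0] and kUtf16[1] are surrogates, whereas kUtf32.size() is 1 and kUtf32[0] is the whole code point 0x1F600.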
32-bit integers are at least wide enough to solve the code point problem, but they still don't handle corner cases; for example, upcasing something can change the number of characters.
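One well-known instance of that corner case is the German sharp s, U+00DF, whose full uppercase mapping in Unicode is the two-letter sequence "SS". The sketch below is a hypothetical, deliberately tiny routine (upcase_sketch is not a library function) that handles only ASCII letters and that one special case; real code would rely on a Unicode library instead of a hand-written rule.

```cpp
#include <string>

// Hypothetical upper-casing sketch over UTF-32 text. It covers ASCII
// letters plus U+00DF only, which is enough to show the output growing.
std::u32string upcase_sketch(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            out += U"SS";  // one code point becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;      // everything else is passed through unchanged
        }
    }
    return out;
}
```

Calling upcase_sketch(U"stra\u00DFe") turns a 6-element string into the 7-element string U"STRASSE", so even with one code point per element the length is not preserved.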
So it turns out that the s[i] problem is not completely solved even by char32_t, and those wider encodings make poor file formats.
Your implied point, then, is quite valid: wchar_t is a failure, partly because Windows made it only 16 bits, and partly because it didn't solve every problem and was horribly incompatible with the byte stream abstraction.
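The width issue can be checked at compile time. The following sketch (the constant names are made up for illustration) reports how wide wchar_t is on the current platform and whether a single wchar_t can hold the largest code point, U+10FFFF.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of wchar_t in bits on this platform. It is implementation-defined:
// 16 on Windows, and commonly 32 on Linux and other Unix-like systems.
constexpr std::size_t kWcharBits = sizeof(wchar_t) * CHAR_BIT;

// Whether one wchar_t can represent every Unicode code point. The largest
// code point is U+10FFFF, so a 16-bit wchar_t cannot.
constexpr bool kWcharHoldsAnyCodePoint = WCHAR_MAX >= 0x10FFFF;
```

Where kWcharHoldsAnyCodePoint is false, wide strings behave like UTF-16, and s[i] may again be only half of a surrogate pair.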