Is wchar_t required for Unicode support?

2021-12-26 00:00:00 unicode c c++

Is the wchar_t type required for Unicode support? If not, what is the point of this multibyte type? Why would you use wchar_t when you could accomplish the same thing with char?



Technically, no. Unicode is a standard that defines code points, and it does not require a particular encoding.

So, you could use Unicode with the UTF-8 encoding, and then every character would fit in one char or a short sequence of char objects, and the string would even still be null-terminated.
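As a minimal sketch of what that looks like in practice, the snippet below stores UTF-8 in an ordinary char array. The names count_code_points and kHello are made up for illustration, and the counter assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts Unicode code points in a null-terminated UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return n;
}

// "héllo", with é (U+00E9) spelled out as its two UTF-8 bytes 0xC3 0xA9.
const char* const kHello = "h\xC3\xA9llo";
```

Here std::strlen(kHello) reports 6 bytes while count_code_points(kHello) reports 5 code points: the usual C string functions keep working, but they count bytes, not characters.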

The problem with UTF-8 and UTF-16 is that s[i] is no longer necessarily a character; it might be just a piece of one. With sufficiently wide characters you can preserve the abstraction that s[i] is a single character, though that does not make strings fixed-length under various transformations.
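To make the s[i] point concrete, here is a small sketch using the standard char16_t and char32_t string types. The helper decode_surrogate_pair and the constants kUtf16 and kUtf32 are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>

// Combines a UTF-16 high/low surrogate pair into the code point it encodes.
char32_t decode_surrogate_pair(char16_t hi, char16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF);  // high (leading) surrogate
    assert(lo >= 0xDC00 && lo <= 0xDFFF);  // low (trailing) surrogate
    return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
}

// The same code point, U+1F600, stored in UTF-16 and in UTF-32.
const std::u16string kUtf16 = u"\U0001F600";
const std::u32string kUtf32 = U"\U0001F600";
```

kUtf16 has size 2, and kUtf16[0] is only the high surrogate 0xD83D, not a character. kUtf32 has size 1, and kUtf32[0] is the whole code point, which is also what decode_surrogate_pair(kUtf16[0], kUtf16[1]) returns.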

32-bit integers are at least wide enough to solve the code point problem, but they still do not handle corner cases; for example, upcasing a string can change the number of characters.
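One well-known instance of that corner case is the German sharp s (U+00DF), whose full uppercase mapping in Unicode is the two letters "SS". The function below is a deliberately tiny, hypothetical sketch (to_upper_sketch is not a library function) that handles only ASCII letters and this one special case; real code would rely on a Unicode library.

```cpp
#include <cassert>
#include <string>

// Hypothetical upper-casing routine for illustration only: handles ASCII a-z
// and the single special case U+00DF -> "SS". Everything else is copied as-is.
std::u32string to_upper_sketch(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == 0x00DF) {
            out += U"SS";  // one code point becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;
        }
    }
    assert(out.size() >= in.size());  // this sketch never shrinks the string
    return out;
}
```

Calling to_upper_sketch(U"stra\u00DFe") turns a string of 6 code points into "STRASSE" with 7, even though every element is a full 32-bit code point.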


So it turns out that the s[i] problem is not completely solved even by char32_t, and those other encodings make poor file formats.

Your implied point, then, is quite valid: wchar_t is a failure, partly because Windows made it only 16 bits wide, and partly because it did not solve every problem and was horribly incompatible with the byte-stream abstraction.
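The width issue can be checked directly. The sketch below uses two hypothetical helpers, wchar_bits and fits_in_single_wchar, built only on sizeof and std::numeric_limits; the result depends on the platform, since wchar_t is 16 bits on Windows and commonly 32 bits on Linux and macOS.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Width of wchar_t on the current platform, in bits.
int wchar_bits() {
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// True if the code point can be stored in one wchar_t without splitting it.
bool fits_in_single_wchar(char32_t cp) {
    assert(cp <= 0x10FFFF);  // highest valid Unicode code point
    return static_cast<unsigned long long>(cp) <=
           static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max());
}
```

With a 16-bit wchar_t, fits_in_single_wchar(0x1F600) is false, so such code points need a surrogate pair and s[i] stops being a character again; with a 32-bit wchar_t it is true.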
