Is wchar_t required for Unicode support?

2021-12-26 00:00:00 unicode c c++

Is the wchar_t type required for Unicode support? If not, then what's the point of this multibyte type? Why would you use wchar_t when you could accomplish the same thing with char?

Recommended answer

No.

Technically, no. Unicode is a standard that defines code points, and it does not require a particular encoding.

So you could use Unicode with the UTF-8 encoding, and then every character would fit in one char object or a short sequence of them, and the string would even still be null-terminated.
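
As a minimal sketch of that point, the snippet below stores the UTF-8 bytes of "café" in an ordinary char array. The names kCafe and byte_length are made up for illustration, and the bytes are written as escapes so the result does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// "café" as UTF-8: the 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9.
// kCafe is a hypothetical name used only for this illustration.
const char kCafe[] = "caf\xC3\xA9";

// Returns the number of bytes before the terminating '\0', exactly as it
// would for an ASCII string. UTF-8 produces a zero byte only for U+0000,
// so the usual C string convention still works.
std::size_t byte_length(const char* s) {
    return std::strlen(s);
}
```

Here byte_length(kCafe) is 5 even though the text has four characters: the length is measured in bytes, not characters.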

The problem with UTF-8 and UTF-16 is that s[i] is no longer necessarily a character; it might be just a piece of one. With sufficiently wide characters you can preserve the abstraction that s[i] is a single character, though that does not keep strings the same length under various transformations.
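
To make the s[i] issue concrete, here is a rough sketch that counts code points in a UTF-8 string by skipping continuation bytes. It assumes the input is already valid UTF-8 and does no validation; is_continuation and count_code_points are illustrative names, not standard library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx),
// that is, a piece of a character rather than the start of one.
bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Counts code points in a string that is assumed to hold valid UTF-8.
// Every byte that is not a continuation byte starts a new code point.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (!is_continuation(c)) {
            ++n;
        }
    }
    return n;
}
```

For the string "caf\xC3\xA9" (the UTF-8 bytes of "café"), size() reports 5 while count_code_points reports 4, and s[4] is only the second half of the 'é'.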

32-bit integers are at least wide enough to solve the code point problem, but they still don't handle corner cases; for example, uppercasing a string can change the number of characters in it.
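
One well-known instance of that corner case is the German sharp s (U+00DF), whose uppercase form in Unicode's full case mapping is the two letters "SS". The sketch below is a deliberately tiny, hypothetical to_upper_sketch that handles only ASCII letters and that one character; it is not a real case-mapping implementation, just enough to show the length change with char32_t.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only: uppercases ASCII letters and
// expands U+00DF (sharp s) to "SS". All other code points pass through
// unchanged. Real case mapping needs the full Unicode data tables.
std::u32string to_upper_sketch(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            out += U"SS";  // one code point becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;
        }
    }
    return out;
}
```

With the input U"stra\u00DFe" (6 code points), the result is U"STRASSE" (7 code points), so an index that pointed at a given character before the transformation may point at a different one afterwards, even though every element is a full code point.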

So it turns out that the s[i] problem is not completely solved even by char32_t, and the encodings other than UTF-8, such as UTF-16 and UTF-32, make poor file formats.

Your implied point, then, is quite valid: wchar_t is a failure, partly because Windows made it only 16 bits, and partly because it didn't solve every problem and was horribly incompatible with the byte stream abstraction.
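
The 16-bit limitation can be illustrated without relying on any particular platform by using char16_t, which always holds UTF-16 code units. This is only a sketch; the constant and function names are invented for the example, and the comment about typical wchar_t widths describes common platforms rather than a guarantee of the language.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so it does not fit
// in a single 16-bit unit. The names below are illustrative only.
const std::u16string kUtf16 = u"\U0001F600";  // stored as a surrogate pair
const std::u32string kUtf32 = U"\U0001F600";  // stored as one code point

// Range checks for the two halves of a UTF-16 surrogate pair.
bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Width of wchar_t on the current platform. It is typically 16 bits on
// Windows and 32 bits on Linux and macOS, which is why code that assumes
// "one wchar_t is one character" is not portable.
std::size_t wchar_t_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

kUtf16.size() is 2 while kUtf32.size() is 1, so with 16-bit units s[i] can again be just a piece of a character, which is the same problem the answer describes for UTF-8.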
