How to read/store Unicode with STL strings and streams

2022-01-07 00:00:00 unicode string stream c++ stl

I need to modify my program to accept Unicode input, which may be encoded as UTF-8 or any of the various UTF-16 and UTF-32 encodings. I don't really know much about Unicode (though I've read Joel Spolsky's article and the Wikipedia page).

Right now I'm using an std::istream and reading my input char by char, and then storing (when necessary) in an std::string. I'd like to

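  • modify it (with minimal effort) to support the encodings above,
  • figure out how to test those encodings (being American, I don't even know how to make a sample text file in another encoding), and ideally
  • do all of this in a cross-platform way.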
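One low-effort approach to the first and third points, sketched here under assumptions that are not in the question: read the whole stream as raw bytes, use the byte-order mark (BOM) to tell UTF-16 and UTF-32 apart, assume UTF-8 when there is no BOM, and convert everything to UTF-8 in a std::string. The helper names `read_as_utf8` and `append_utf8` are hypothetical. Because the code works on bytes rather than on wchar_t or locales, it should behave the same on every platform. Note the limits: BOM-less UTF-16/UTF-32 input would be misread as UTF-8, and validation is minimal (truncated input and unpaired high surrogates are rejected, out-of-range UTF-32 values are not).

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for feeding test bytes
#include <stdexcept>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: read the whole stream, detect a BOM, and return the
// text re-encoded as UTF-8. Input without a BOM is assumed to be UTF-8.
std::string read_as_utf8(std::istream& in) {
    std::string raw((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    const std::size_t n = raw.size();
    auto b = [&](std::size_t i) -> char32_t { return static_cast<unsigned char>(raw[i]); };

    int unit = 0;          // code-unit size in bytes (2 or 4)
    bool little = false;   // byte order of each code unit
    std::size_t pos = 0;   // current read position in `raw`

    // Check UTF-32 before UTF-16: the UTF-32LE BOM (FF FE 00 00) starts with
    // the UTF-16LE BOM (FF FE), so UTF-16LE text beginning with U+0000 is ambiguous.
    if (n >= 4 && b(0) == 0xFF && b(1) == 0xFE && b(2) == 0 && b(3) == 0) {
        unit = 4; little = true; pos = 4;
    } else if (n >= 4 && b(0) == 0 && b(1) == 0 && b(2) == 0xFE && b(3) == 0xFF) {
        unit = 4; pos = 4;
    } else if (n >= 2 && b(0) == 0xFF && b(1) == 0xFE) {
        unit = 2; little = true; pos = 2;
    } else if (n >= 2 && b(0) == 0xFE && b(1) == 0xFF) {
        unit = 2; pos = 2;
    } else if (n >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) {
        return raw.substr(3);  // UTF-8 with BOM: just drop the BOM
    } else {
        return raw;            // no BOM: assume UTF-8 already
    }

    // Read one code unit of `unit` bytes at `pos` in the detected byte order.
    auto read_unit = [&]() -> char32_t {
        if (pos + unit > n) throw std::runtime_error("truncated input");
        char32_t v = 0;
        for (int k = 0; k < unit; ++k) {
            int shift = little ? 8 * k : 8 * (unit - 1 - k);
            v |= b(pos + k) << shift;
        }
        pos += unit;
        return v;
    };

    std::string out;
    while (pos < n) {
        char32_t cp = read_unit();
        if (unit == 2 && cp >= 0xD800 && cp <= 0xDBFF) {  // UTF-16 high surrogate
            char32_t lo = read_unit();
            if (lo < 0xDC00 || lo > 0xDFFF) throw std::runtime_error("unpaired surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For the testing point, sample inputs don't have to be files: they can be written as byte strings and fed through std::istringstream. For example, `std::string("\xFF\xFE" "A\x00", 4)` is the letter A in UTF-16LE with a BOM. After conversion, the existing char-by-char loop can run unchanged over the UTF-8 result.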

Also, if possible, I'd like to conserve space as much as possible (so a character that fits in one byte should take only one byte). From what I understand, this means storing text as UTF-8, which is fine, but I don't know of a standard string type that does this (as I understand it, wchar_t has an implementation-defined size and encoding).
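On the space question: std::string is just a sequence of bytes, so it can already hold UTF-8, where ASCII characters take one byte and every other code point takes two to four. (C++20 also adds char8_t and std::u8string for UTF-8 text, but std::string works under any standard.) The catch is that size() counts bytes, not characters. A minimal sketch of counting code points instead, assuming the string holds valid UTF-8; `utf8_length` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by counting every byte that is NOT a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; invalid sequences are not detected.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For example, "h\xC3\xA9llo" ("héllo") has size() 6 but utf8_length 5: only the é needs a second byte.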

Recommended answer

See the question "Switching from std::string to std::wstring for embedded applications?"

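As Pukku said: you may run into headaches because the C++ standard requires wide file streams (std::wofstream, std::wifstream) to convert their wide characters (wchar_t, which is 2 bytes on Windows but typically 4 on Linux) to a sequence of narrow bytes when writing to a file, and how that conversion is done is implementation-defined: it depends on the std::codecvt facet of the locale imbued in the stream.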
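One way to sidestep that conversion, sketched here as an assumption rather than as the answer's own method, is to keep the text as UTF-8 in a std::string and use narrow streams opened in binary mode. With the default locale, the narrow stream's std::codecvt&lt;char, char, std::mbstate_t&gt; facet performs no conversion, and binary mode turns off newline translation, so the bytes reach the file unchanged. The helper names `write_utf8_file` and `read_utf8_file` are hypothetical.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Write a UTF-8 encoded std::string to `path` byte-for-byte.
// Binary mode prevents "\n" -> "\r\n" translation on Windows.
bool write_utf8_file(const std::string& path, const std::string& utf8) {
    std::ofstream out(path, std::ios::binary);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return static_cast<bool>(out);
}

// Read a file back as raw bytes; the caller treats them as UTF-8.
std::string read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
}
```

Writing a std::wstring through std::wofstream instead would first require imbuing a locale whose codecvt facet produces UTF-8, and locale names differ between platforms, which is exactly the portability problem described above.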
