Writing UTF-16 to a file in binary mode

2021-12-26 00:00:00 unicode c++ utf-16

I'm trying to write a wstring to file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:

ofstream outFile("test.txt", std::ios::out | std::ios::binary);
wstring hello = L"hello";
outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t));
outFile.close();

When I open test.txt in, for example, Firefox with the encoding set to UTF-16, it shows:

嘿嘿嘿

Could anyone tell me why this happens?

Opening the file in a hex editor I get:

FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00 

It looks like I get two extra zero bytes after every character for some reason?

Recommended answer

I suspect that sizeof(wchar_t) is 4 in your environment - i.e. it's writing out UTF-32/UCS-4 instead of UTF-16. That's certainly what the hex dump looks like.

That's easy enough to test (just print out sizeof(wchar_t)) but I'm pretty sure it's what's going on.
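A minimal sketch of that test, which also dumps the raw bytes of L"hello" so they can be compared with the hex editor output. The helper names `hex_bytes` and `report_wchar_layout` are hypothetical, chosen for illustration only.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the bytes of a wide string's code units as space-separated hex,
// in the machine's native byte order (the same bytes that
// ofstream::write((char*)s.c_str(), ...) would emit).
std::string hex_bytes(const std::wstring& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating NUL
    for (std::size_t i = 0; i < s.size() * sizeof(wchar_t); ++i) {
        std::snprintf(buf, sizeof buf, "%02X ", p[i]);
        out += buf;
    }
    return out;
}

// Prints the width of wchar_t and the raw bytes of L"hello".
void report_wchar_layout() {
    std::printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    std::printf("%s\n", hex_bytes(L"hello").c_str());
}
```

Called from `main` on a typical x86-64 Linux build with GCC or Clang, this should report a 4-byte wchar_t and the same `68 00 00 00 65 00 00 00 ...` pattern as the dump in the question. On Windows, where wchar_t is 2 bytes, the bytes come out as `68 00 65 00 ...`, which is already UTF-16LE.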

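To go from a UTF-32 wstring to UTF-16 you need a proper encoding step rather than just writing or truncating each wchar_t, because code points above U+FFFF do not fit in one 16-bit unit and must be encoded as surrogate pairs.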
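A minimal sketch of that encoding step, assuming wchar_t holds UTF-32 code points (as suspected above) and that the goal is UTF-16LE with a byte-order mark, which is what the leading FF FE in the question's dump suggests. The names `utf32_to_utf16`, `wstring_to_utf16` and `write_utf16le` are hypothetical helpers, not a library API.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Encodes Unicode code points (UTF-32) as UTF-16 code units.
// Code points above U+FFFF become a high/low surrogate pair.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // 20 bits remain, split 10/10
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}

// Assumes wchar_t holds UTF-32 (e.g. Linux/macOS with GCC or Clang).
// Not for Windows, where a wstring is already UTF-16.
std::u16string wstring_to_utf16(const std::wstring& ws) {
    return utf32_to_utf16(std::u32string(ws.begin(), ws.end()));
}

// Writes a BOM followed by the text as UTF-16LE, one byte at a time,
// so the file layout does not depend on the host's endianness.
bool write_utf16le(const std::string& path, const std::u16string& text) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return false;
    auto put = [&out](char16_t unit) {
        out.put(static_cast<char>(unit & 0xFF));         // low byte first
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    };
    put(0xFEFF);  // byte-order mark, written as FF FE
    for (char16_t u : text) put(u);
    return static_cast<bool>(out);
}
```

For example, `write_utf16le("test.txt", wstring_to_utf16(L"hello"))` should produce `FF FE 68 00 65 00 6C 00 6C 00 6F 00`, which Firefox will read as "hello" in UTF-16. Writing each byte explicitly, instead of casting the buffer to `char*`, is a deliberate choice: it fixes the byte order to little-endian instead of inheriting whatever the host uses.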
