Writing UTF-16 to a file in binary mode
I'm trying to write a wstring to a file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:
ofstream outFile("test.txt", std::ios::out | std::ios::binary);
wstring hello = L"hello";
outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t));
outFile.close();
Opening test.txt in, for example, Firefox with the encoding set to UTF-16 shows:
嘿嘿嘿
Could anyone tell me why this happens?
Opening the file in a hex editor, I get:
FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00
It looks like I get two extra bytes between every character for some reason?
Recommended answer
I suspect that sizeof(wchar_t) is 4 in your environment, i.e. it's writing out UTF-32/UCS-4 instead of UTF-16. That's certainly what the hex dump looks like: each character takes four bytes (68 00 00 00 for 'h') rather than two.
That's easy enough to test (just print out sizeof(wchar_t)), but I'm pretty sure that's what's going on.
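As a minimal sketch of that check, the helper below turns sizeof(wchar_t) into a readable guess. The function name wchar_encoding_guess is hypothetical, and the platform notes in the strings describe common defaults, not guarantees.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess which encoding a wstring holds on this platform,
// judged only by the width of wchar_t.
std::string wchar_encoding_guess() {
    constexpr std::size_t width = sizeof(wchar_t);
    if (width == 2) return "UTF-16 (2-byte wchar_t, common on Windows)";
    if (width == 4) return "UTF-32 (4-byte wchar_t, common on Linux and macOS)";
    return "unknown (" + std::to_string(width) + "-byte wchar_t)";
}
```

Printing the result with std::cout (or simply printing sizeof(wchar_t) itself) should settle whether the extra zero bytes come from 4-byte characters.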
To go from a UTF-32 wstring to UTF-16 you'll need to apply a proper encoding rather than just truncating each element, as surrogate pairs come into play for code points above U+FFFF.
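One way that conversion could look is sketched below. It assumes that with a 4-byte wchar_t each element holds one Unicode code point. The names to_utf16 and write_utf16le are hypothetical helpers, not standard library functions, and replacing invalid values with U+FFFD is a choice made for this sketch.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: encode a wstring as UTF-16 code units.
std::u16string to_utf16(const std::wstring& in) {
    // With a 2-byte wchar_t the string is assumed to be UTF-16 already.
    if (sizeof(wchar_t) == 2) return std::u16string(in.begin(), in.end());

    std::u16string out;
    for (wchar_t wc : in) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        // Lone surrogates and out-of-range values are not valid code points;
        // substituting U+FFFD is an assumption of this sketch.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: write a BOM, then each code unit low byte first (UTF-16LE).
bool write_utf16le(const std::string& path, const std::u16string& text) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return false;
    out.put('\xFF').put('\xFE');  // byte order mark for little-endian UTF-16
    for (char16_t unit : text) {
        out.put(static_cast<char>(unit & 0xFF));
        out.put(static_cast<char>(unit >> 8));
    }
    return static_cast<bool>(out);
}
```

Calling write_utf16le("test.txt", to_utf16(hello)) with the string from the question should produce the bytes FF FE 68 00 65 00 6C 00 6C 00 6F 00. Writing the bytes one at a time, instead of dumping the buffer with write(), keeps the output little-endian regardless of the host's byte order.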