UTF-8 output on Windows console

The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):

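#include <cstdio>
#include <iostream>
#include <windows.h>

int main() {
    // Switch the console output code page to UTF-8.
    SetConsoleOutputCP( CP_UTF8 );
    // "\xc3\xbc" is the UTF-8 encoding of the character ü.
    std::cout << "\xc3\xbc";
    // Print '1' if the stream failed on the write above, '0' otherwise.
    int fail = std::cout.fail() ? '1': '0';
    fputc( fail, stdout );
    fputs( "\xc3\xbc", stdout );
}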

I simply compiled with cl /EHsc test.cpp.

Windows XP: Output in a console window is ??0?? (translated to code page 1252; it originally shows some line-drawing characters in the default code page, perhaps 437). When I change the settings of the console window to use the "Lucida Console" character set and run my test.exe again, the output changes to 1ü, which means:

  • the character ü can be written using fputs and its UTF-8 encoding C3 BC
  • std::cout does not work for whatever reason
  • the stream's failbit is set after trying to write the character

Windows 7: Output using Consolas is ??0ü. Even more interesting: the correct bytes are probably written (at least when the output is redirected to a file) and the stream state is OK, but the two bytes are written as separate characters.

I tried to raise this issue on "Microsoft Connect" (see here), but MS has not been very helpful. You might as well look here as something similar has been asked before.

Can you reproduce this problem?

What am I doing wrong? Shouldn't std::cout and fputs have the same effect?

SOLVED: (sort of) Following mike.dld's idea I implemented a std::stringbuf doing the conversion from UTF-8 to Windows-1252 in sync() and replaced the streambuf of std::cout with this converter (see my comment on mike.dld's answer).
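
For readers who want to try the same idea, here is a minimal sketch of such a converting stream buffer. It is not my original code: the names Utf8ToSingleByteBuf and convert_utf8 are made up for illustration, and it only handles ASCII plus the range U+00A0 to U+00FF, where Windows-1252 and Latin-1 agree. Everything else becomes '?', and continuation bytes are not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical converter: buffers UTF-8 text and, on each flush, forwards it
// to `sink` as single-byte characters.
class Utf8ToSingleByteBuf : public std::stringbuf {
public:
    explicit Utf8ToSingleByteBuf(std::streambuf* sink) : sink_(sink) {
        assert(sink_ != nullptr);
    }

protected:
    int sync() override {
        // Prepend the bytes of a sequence that was incomplete at the last flush.
        const std::string in = pending_ + str();
        str(std::string());
        std::string out;
        std::size_t i = 0;
        while (i < in.size()) {
            const unsigned char lead = static_cast<unsigned char>(in[i]);
            const std::size_t len = sequence_length(lead);
            if (len == 0) {                  // stray continuation or invalid lead byte
                out += '?';
                ++i;
                continue;
            }
            if (i + len > in.size()) break;  // incomplete tail, keep it for later
            // Decode the code point (continuation bytes are not validated here).
            unsigned long cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
            for (std::size_t k = 1; k < len; ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3Fu);
            // ASCII and U+00A0..U+00FF map to the byte with the same value.
            if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
                out += static_cast<char>(static_cast<unsigned char>(cp));
            else
                out += '?';
            i += len;
        }
        pending_ = in.substr(i);
        sink_->sputn(out.data(), static_cast<std::streamsize>(out.size()));
        return sink_->pubsync();
    }

private:
    // Length of the UTF-8 sequence introduced by `lead`, or 0 if it is invalid.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if (lead >= 0xC2 && lead <= 0xDF) return 2;
        if (lead >= 0xE0 && lead <= 0xEF) return 3;
        if (lead >= 0xF0 && lead <= 0xF4) return 4;
        return 0;
    }

    std::streambuf* sink_;
    std::string pending_;
};

// Example use: route a stream through the converter and collect the result.
std::string convert_utf8(const std::string& utf8) {
    std::ostringstream sink;                 // stands in for the console stream
    Utf8ToSingleByteBuf buf(sink.rdbuf());
    std::ostream out(&buf);
    out << utf8 << std::flush;               // flush triggers sync()
    return sink.str();
}
```

To use it with the console, construct the buffer around the original std::cout.rdbuf() and install it with std::cout.rdbuf(&buf); the original buffer should be restored before buf goes out of scope. The console code page then has to be 1252 rather than UTF-8.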

Recommended answer

It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.

My current understanding is that the Windows XP console in the UTF-8 code page does not work with C++ iostreams.

Windows XP is getting out of fashion now, and so is VS 2008. I'd be interested to hear if the problem still exists on newer Windows systems.

On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to "Properly print utf8 characters in windows console", UTF-8 output also fails with C stdio when the bytes are printed one after another, as in putc('\xc3'); putc('\xbc');. Perhaps this is what C++ streams do here.
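
If byte-by-byte output is indeed the cause, one way around it is to hand each encoded character to stdio in a single call, as the fputs call in the question does. The following is only a sketch under that assumption; split_utf8 and put_utf8 are made-up helper names, and whether the C runtime passes each piece on to the console unsplit depends on its buffering.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: splits UTF-8 text into one string per encoded character.
std::vector<std::string> split_utf8(const std::string& text) {
    std::vector<std::string> parts;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t end = i + 1;
        // Continuation bytes look like 10xxxxxx; attach them to the lead byte.
        while (end < text.size() &&
               (static_cast<unsigned char>(text[end]) & 0xC0) == 0x80)
            ++end;
        parts.push_back(text.substr(i, end - i));
        i = end;
    }
    return parts;
}

// Writes each encoded character with one stdio call instead of byte by byte.
void put_utf8(const std::string& text, std::FILE* stream) {
    for (const std::string& part : split_utf8(text))
        std::fwrite(part.data(), 1, part.size(), stream);
}
```

Called as put_utf8("\xc3\xbc", stdout), this issues one fwrite for both bytes of ü rather than two separate putc calls.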
