Shift-JIS decoding fails with wifstream in Visual C++ 2013
I am trying to read a text file encoded in Shift-JIS (code page 932) using std::wifstream and std::getline. The following code works in VS2010 but fails in VS2013:
std::wifstream in;
in.open("data932.txt");
const std::locale locale(".932");
in.imbue(locale);
std::wstring line1, line2;
std::getline(in, line1);
std::getline(in, line2);
const bool good = in.good();
The file contains several lines: the first contains only ASCII characters, and the second contains Japanese text. When this snippet runs, line1 should therefore contain the ASCII line, line2 the Japanese text, and good should be true.
When compiled in VS2010, the result is as expected. When compiled in VS2013, line1 contains the ASCII line, but line2 is empty and good is false.
I debugged into the CRT (the source ships with Visual Studio) and found that an internal function called _Mbrtowc (in xmbtowc.c) was modified between the two versions. The way it detects the lead byte of a double-byte character changed, and the VS2013 version fails to detect the lead byte, so it fails to decode the byte stream.
Further debugging led to the point where the _Isleadbyte array of a _Cvtvec object is initialized (in the function _Getcvt(), in xwctomb.c), and that initialization produces a wrong result. It appears to always use code page 1252, the default code page on my system, rather than 932, which is set for the stream in use. However, I could not determine whether this is by design and I missed a step required to get a correct result, or whether it is indeed a bug in the VS2013 CRT.
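To illustrate why the lead-byte table matters, here is a minimal sketch. It assumes the commonly documented lead-byte ranges for code page 932 (0x81-0x9F and 0xE0-0xFC). All function names are hypothetical helpers written for this illustration; this is not the CRT's implementation of _Mbrtowc or _Getcvt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lead-byte test for code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
inline bool isCp932LeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Code page 1252 is a single-byte code page, so no byte is a lead byte.
inline bool isCp1252LeadByte(unsigned char) { return false; }

// Combines a lead byte and its trail byte into one 16-bit double-byte code.
inline unsigned doubleByteCode(unsigned char lead, unsigned char trail) {
    assert(isCp932LeadByte(lead)); // precondition: caller checked the lead byte
    return (static_cast<unsigned>(lead) << 8) | trail;
}

// Counts the characters in a byte string using the supplied lead-byte test.
// Returns std::string::npos if a lead byte is not followed by a trail byte.
template <typename LeadByteTest>
std::size_t countCharacters(const std::string& bytes, LeadByteTest isLead) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (isLead(static_cast<unsigned char>(bytes[i]))) {
            if (i + 1 >= bytes.size()) return std::string::npos;
            ++i; // skip the trail byte of this double-byte character
        }
        ++count;
    }
    return count;
}
```

With the Shift-JIS bytes 93 FA 96 7B (the two characters 日本), the 932 test yields two characters, while the 1252 test treats the same input as four unrelated single bytes. That is the kind of mismatch a converter built from the wrong code page would run into.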
Unfortunately I don't have VS2012 installed, so I could not test on that version.
Any insights on this topic are welcome!
Recommended answer
I have found a workaround: if I explicitly change the global multibyte (MBC) code page while the locale is being created, the locale is initialized correctly, and the lines are read and decoded as expected.
const int oldMbcp = _getmbcp();
_setmbcp(932);
const std::locale locale("Japanese_Japan.932");
_setmbcp(oldMbcp);
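One caveat with the snippet above: if the std::locale constructor throws (it throws std::runtime_error for a name it does not recognize), the old code page is never restored. A possible refinement, sketched here under assumptions, wraps the switch in a small RAII guard. ScopedMbcp, getMbcp, setMbcp and makeJapaneseLocale are hypothetical names introduced for this illustration; on non-Microsoft compilers the wrappers fall back to a stand-in variable so the sketch still compiles, which is not a real code page switch.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

#ifdef _MSC_VER
#include <mbctype.h> // _getmbcp, _setmbcp
inline int getMbcp() { return _getmbcp(); }
inline int setMbcp(int codePage) { return _setmbcp(codePage); }
#else
// Stand-ins for other platforms (assumption for illustration only).
// 0 mirrors _getmbcp's "single-byte code page in use" return value.
static int g_standInMbcp = 0;
inline int getMbcp() { return g_standInMbcp; }
inline int setMbcp(int codePage) { g_standInMbcp = codePage; return 0; }
#endif

// Switches the global multibyte code page and restores it on scope exit,
// including when an exception propagates out of the scope.
class ScopedMbcp {
public:
    explicit ScopedMbcp(int codePage) : old_(getMbcp()) {
        if (setMbcp(codePage) != 0)
            throw std::runtime_error("could not set multibyte code page");
    }
    ~ScopedMbcp() {
        const int rc = setMbcp(old_);
        assert(rc == 0); // restoring a previously active code page should succeed
        (void)rc;
    }
    ScopedMbcp(const ScopedMbcp&) = delete;
    ScopedMbcp& operator=(const ScopedMbcp&) = delete;
private:
    int old_;
};

// Builds the locale while code page 932 is the active multibyte code page.
inline std::locale makeJapaneseLocale(const char* name = "Japanese_Japan.932") {
    ScopedMbcp guard(932);
    return std::locale(name); // may throw; the guard still restores the code page
}
```

The stream would then be imbued with the result, for example in.imbue(makeJapaneseLocale()). Whether the guard is worth the extra code depends on how likely the locale name is to be unavailable on the target machine.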