utfcpp and the Win32 wide API
Is it good/safe/possible to use the tiny utfcpp library for converting everything I get back from the wide Windows API (FindFirstFileW and such) to a valid UTF8 representation using utf16to8?
I would like to use UTF8 internally, but am having trouble getting the correct output (via wcout after another conversion or plain cout). Normal ASCII characters work of course, but ?? gets messed up.
Or is there an easier alternative?
Thanks!
UPDATE: Thanks to Hans (below), I now have an easy UTF8<->UTF16 conversion through the Windows API. Conversion works in both directions, but the UTF8 string produced from a UTF16 string has some extra characters that might cause me trouble later on. I'll share the code here out of pure friendliness :)
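#include <windows.h>
#include <stdexcept>
#include <string>

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    // First call: ask for the required buffer size in bytes. The input length
    // is passed explicitly, so no terminating NUL is counted.
    int length = WideCharToMultiByte( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0,
                                      NULL, NULL );
    if( length <= 0 )
        return std::string();   // empty input (or the size could not be computed)
    else
    {
        std::string result;
        result.resize( length );
        // Second call: convert into the buffer that was just sized.
        if( WideCharToMultiByte( CP_UTF8, 0,
                                 input.c_str(), static_cast<int>( input.size() ),
                                 &result[0], static_cast<int>( result.size() ),
                                 NULL, NULL ) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );
    }
}

// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    // First call: ask for the required buffer size in wchar_t units.
    int length = MultiByteToWideChar( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0 );
    if( length <= 0 )
        return std::wstring();  // empty input (or the size could not be computed)
    else
    {
        std::wstring result;
        result.resize( length );
        // Second call: convert into the buffer that was just sized.
        if( MultiByteToWideChar( CP_UTF8, 0,
                                 input.c_str(), static_cast<int>( input.size() ),
                                 &result[0], static_cast<int>( result.size() ) ) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );
    }
}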
Recommended answer
The Win32 API already has a function to do this: WideCharToMultiByte() with CodePage = CP_UTF8. That saves you from having to rely on another library.
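For comparison, the transformation that WideCharToMultiByte() with CP_UTF8 (and utfcpp's utf16to8) performs can be sketched in portable C++. This is a hand-written illustration, not the Windows or utfcpp implementation, and the function name utf16ToUtf8Sketch is hypothetical. It takes a std::u16string rather than a std::wstring, because wchar_t holds UTF-16 only on Windows. Throwing on an unpaired surrogate is a choice made for this sketch, not necessarily what either library does.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative UTF-16 -> UTF-8 encoder (hypothetical helper, not utfcpp's code).
// Surrogate pairs are combined into one code point; unpaired surrogates throw.
std::string utf16ToUtf8Sketch( const std::u16string &in )
{
    std::string out;
    out.reserve( in.size() * 3 );  // upper bound: 3 bytes per UTF-16 unit
    for( std::size_t i = 0; i < in.size(); ++i )
    {
        char32_t cp = in[i];
        if( cp >= 0xD800 && cp <= 0xDBFF )       // high (lead) surrogate
        {
            if( i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF )
                throw std::invalid_argument( "unpaired high surrogate" );
            cp = 0x10000 + ( ( cp - 0xD800 ) << 10 ) + ( in[i + 1] - 0xDC00 );
            ++i;                                 // consumed the low surrogate too
        }
        else if( cp >= 0xDC00 && cp <= 0xDFFF )  // low surrogate without a lead
            throw std::invalid_argument( "unpaired low surrogate" );

        if( cp < 0x80 )                          // 1 byte:  0xxxxxxx
            out += static_cast<char>( cp );
        else if( cp < 0x800 )                    // 2 bytes: 110xxxxx 10xxxxxx
        {
            out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else if( cp < 0x10000 )                  // 3 bytes: rest of the BMP
        {
            out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else                                     // 4 bytes: supplementary planes
        {
            out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
    }
    return out;
}
```

For example, u"\u00FC" (ü) becomes the two bytes C3 BC, and the surrogate pair for U+1F600 becomes the four bytes F0 9F 98 80. On Windows, a std::wstring returned by FindFirstFileW could be copied element by element into a std::u16string before calling it, since both hold UTF-16 code units there.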
You cannot normally use the result with wcout. Its output goes to the console, which uses an 8-bit OEM code page for legacy reasons. You can change the console's output code page with SetConsoleOutputCP() (SetConsoleCP() changes the input code page); 65001 is the code page for UTF-8 (CP_UTF8).
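A minimal sketch of that setup, assuming the program keeps its text as UTF-8 std::string data (for example, produced by toUTF8) and writes it with std::cout rather than std::wcout. The helper name enableUtf8ConsoleOutput is hypothetical, and the #ifdef only exists so the snippet still compiles on non-Windows systems.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console's *output* code page to UTF-8
// (65001, CP_UTF8) so UTF-8 bytes written via std::cout are decoded as UTF-8.
// SetConsoleCP() would only change the *input* code page.
// Returns true on success; on non-Windows builds it is a no-op.
bool enableUtf8ConsoleOutput()
{
#ifdef _WIN32
    return SetConsoleOutputCP( CP_UTF8 ) != 0;  // nonzero BOOL means success
#else
    return true;
#endif
}
```

Call it once at startup, before writing any output. Even with the code page set, characters that the console font lacks will still show up as boxes.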
Your next stumbling block would be the font used by the console. You'll have to change it, but finding a font that is fixed-pitch and has a full set of glyphs covering Unicode is going to be difficult. Square rectangles in the output mean you have a font problem; question marks mean you have an encoding problem.