Convert wstring to string encoded in UTF-8

2021-12-28 00:00:00 string utf-8 c++ wstring

I need to convert between wstring and string. I figured that using a codecvt facet should do the trick, but it doesn't seem to work for a UTF-8 locale.

My understanding is that when I read a UTF-8 encoded file into chars, a single non-ASCII character can occupy two (or more) chars, because UTF-8 is a variable-length encoding. I'd like to create such a UTF-8 string from the wstring representation, for a library I use in my code.

Does anybody know how to do this?

This is what I have tried so far:

  locale mylocale("cs_CZ.utf-8");
  mbstate_t mystate;

  wstring mywstring = L"???yáí";

  const codecvt<wchar_t,char,mbstate_t>& myfacet =
    use_facet<codecvt<wchar_t,char,mbstate_t> >(mylocale);

  codecvt<wchar_t,char,mbstate_t>::result myresult;  

  size_t length = mywstring.length();
  char* pstr= new char [length+1];

  const wchar_t* pwc;
  char* pc;

  // translate characters:
  myresult = myfacet.out (mystate,
      mywstring.c_str(), mywstring.c_str()+length+1, pwc,
      pstr, pstr+length+1, pc);

  if ( myresult == codecvt<wchar_t,char,mbstate_t>::ok )
   cout << "Translation successful: " << pstr << endl;
  else cout << "failed" << endl;
  return 0;

This prints "failed" for the cs_CZ.utf-8 locale, but works correctly for the cs_CZ.iso8859-2 locale.
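A possible explanation, offered as an assumption rather than a confirmed diagnosis: the output buffer holds only length+1 bytes, which is enough for a single-byte encoding such as ISO 8859-2 but not for UTF-8, where one wide character can need several bytes, so out() may stop early and report partial instead of ok. The conversion state mystate is also never initialised. The sketch below sizes the buffer with the facet's max_length() and zero-initialises the state. The helper name narrow_with_locale is hypothetical and not part of the original post.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): converts a wide string to
// the narrow encoding of `loc` through that locale's codecvt facet.
std::string narrow_with_locale(const std::wstring& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    // The conversion state must start out zero-initialised.
    std::mbstate_t state = std::mbstate_t();

    // Worst case: every wide character expands to max_length() bytes.
    std::vector<char> buf(in.size() * facet.max_length() + 1);

    const wchar_t* from_next = 0;
    char* to_next = 0;
    facet_type::result r = facet.out(state,
        in.data(), in.data() + in.size(), from_next,
        buf.data(), buf.data() + buf.size(), to_next);

    if (r != facet_type::ok)
        throw std::runtime_error("codecvt conversion failed");

    // Only the bytes actually written belong to the result.
    return std::string(buf.data(), to_next);
}
```

A call such as narrow_with_locale(mywstring, std::locale("cs_CZ.utf-8")) would then be expected to yield the UTF-8 bytes, assuming that locale is installed on the system; std::locale throws std::runtime_error if it is not.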

Recommended answer

Standard C++ has very little built-in knowledge of Unicode. Use an external library such as ICU (UnicodeString class) or Qt (QString class); both support Unicode, including UTF-8.
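If pulling in ICU or Qt is not an option, the encoding step itself is small enough to write by hand. The following is a minimal sketch, not the method recommended above and not a replacement for a full Unicode library. It assumes wchar_t holds either UTF-32 code points (typical on Linux) or UTF-16 code units (typical on Windows). The function name to_utf8 is hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 without using locales.
// Assumption: wchar_t is UTF-32 (4 bytes) or UTF-16 (2 bytes).
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // With 16-bit wchar_t, merge a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Lone surrogates and values above U+10FFFF are not valid code points.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point in wide string");

        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because it does not depend on any installed locale, to_utf8(mywstring) behaves the same regardless of the system's locale configuration. It performs no normalisation or other text processing, which is where a library such as ICU or Qt still has the advantage.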
