How do I open a std::fstream (ofstream or ifstream) with a Unicode filename?

2021-12-05 00:00:00 unicode windows c++

You wouldn't imagine that something as basic as opening a file with the C++ standard library in a Windows application would be tricky, but it appears to be. By Unicode I mean UTF-8 here, but I can convert to UTF-16 or whatever is needed; the point is to get an ofstream instance from a Unicode filename. Before I hack up my own solution, is there a preferred route here? Especially a cross-platform one?

Recommended answer

The C++ standard library is not Unicode-aware: neither char nor wchar_t is required to hold a Unicode encoding.

On Windows, wchar_t holds UTF-16, but the standard library has no direct support for UTF-8 filenames, because the char type is not treated as Unicode on Windows.
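Since the question mentions converting UTF-8 to UTF-16, here is a minimal sketch of that conversion in portable C++. The helper name utf8_to_utf16 is hypothetical and not part of any library. It assumes well-formed input and does not reject overlong encodings, so treat it as an illustration rather than a hardened decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard function): decodes UTF-8 and
// re-encodes the code points as UTF-16 code units.
// Throws std::invalid_argument on truncated or malformed sequences.
// Note: overlong encodings are NOT rejected in this sketch.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point is not a Unicode scalar value");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring before being handed to a wide-character API. On platforms where wchar_t is wider, that copy would not yield a correctly encoded wide string.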

With MSVC (and thus the Microsoft STL), the file stream classes provide a constructor that takes a const wchar_t* filename, so you can create the stream like this:

#include <fstream>

wchar_t const name[] = L"filename.txt";
std::fstream file(name);  // wchar_t overload: a Microsoft STL extension, not standard C++11

However, this overload is not specified by the C++11 standard, which only guarantees the char-based versions. It is also absent from alternative standard library implementations such as GCC's libstdc++ for MinGW(-w64), as of g++ 4.8.x.

Note that just as char on Windows is not UTF-8, wchar_t on other operating systems may not be UTF-16. Overall, then, this is unlikely to be portable: opening a stream from a wchar_t filename is not defined by the standard, and specifying the filename in chars can be difficult because the encoding used by char varies between operating systems.
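One route that postdates the C++11 discussion above: since C++17, the file stream classes accept a std::filesystem::path, and a path can be built from UTF-8 text. The sketch below assumes a C++17 (or later) compiler and standard library. The helper name open_utf8_for_write is hypothetical. How well this works on a given Windows toolchain depends on that toolchain's std::filesystem support, so it is worth testing on your own toolchain before relying on it.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: treats `utf8_name` as UTF-8, converts it to the
// platform's native path encoding, and opens the file for writing.
inline std::ofstream open_utf8_for_write(const std::string& utf8_name) {
#if defined(__cpp_lib_char8_t)
    // C++20: std::filesystem::path interprets char8_t input as UTF-8.
    const std::u8string u8(utf8_name.begin(), utf8_name.end());
    const std::filesystem::path p(u8);
#else
    // C++17: u8path performs the same UTF-8 interpretation
    // (it is deprecated from C++20 onward, hence the branch above).
    const std::filesystem::path p = std::filesystem::u8path(utf8_name);
#endif
    // The std::filesystem::path constructor of ofstream was added in C++17.
    return std::ofstream(p, std::ios::binary);
}
```

A caller would write something like `std::ofstream out = open_utf8_for_write(name);` and check `out.is_open()` before use. The same pattern applies to std::ifstream and std::fstream.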
