Specifying the source character set encoding in MSVC++, like gcc's "-finput-charset=CharSet"

2021-12-22 00:00:00 unicode character-encoding command-line-arguments visual-c++ c++

I want to create some sample programs that deal with encodings. Specifically, I want to use wide strings like:

wstring a=L"grüßen"; wstring b=L"???? ????!"; wstring c=L"中文";

Because these are sample programs.

This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. However, straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.

Is there any option that I can specify as a command-line switch for "cl" to make this work? Is there a command-line switch like gcc's -finput-charset?

If not, how would you suggest keeping the text natural for the user?

Note: adding a BOM to the UTF-8 file is not an option, because the file then becomes non-compilable by other compilers.

Note 2: I need it to work in MSVC version >= 9 (VS 2008).

Real answer: there is no solution.

Recommended answer

For those who subscribe to the motto "better late than never", Visual Studio 2015 (version 19 of the compiler) now supports this.

The new /source-charset command line switch allows you to specify the character set encoding used to interpret source files. It takes a single parameter, which can be either the IANA or ISO character set name:

/source-charset:utf-8

or the decimal identifier of a particular code page (preceded by a dot):

/source-charset:.65001

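The official documentation describes this switch, and there is also a detailed article on the Visual C++ Team Blog describing these new options.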

There is also a complementary /execution-charset switch that works in exactly the same way but controls how narrow character and string literals are generated in the executable. Finally, there is a shortcut switch, /utf-8, that sets both /source-charset:utf-8 and /execution-charset:utf-8.

These command-line options are incompatible with the old #pragma setlocale and #pragma execution_character_set directives, and they apply globally to all source files.

For users stuck on older versions of the compiler, the best option is still to save your source files as UTF-8 with a BOM (as other answers have suggested, the IDE can do this when saving). The compiler will automatically detect this and behave appropriately. So, too, will GCC, which also accepts a BOM at the start of source files without choking to death, making this approach functionally portable.
