How to embed Unicode string constants in a source file?

2022-01-23 00:00:00 unicode string unit-testing constants c++

I'm writing some unit tests to verify our handling of various resources that use character sets other than the normal Latin alphabet: Cyrillic, Hebrew, etc.

The problem I have is that I cannot find a way to embed the expectations in the test source file: here's an example of what I'm trying to do...

///
/// Protected: TestGetHebrewConfigString
///  
void CPrIniFileReaderTest::TestGetHebrewConfigString()
{
    prwstring strHebrewTestFilePath = GetTestFilePath( strHebrewTestFileName );
    CPrIniFileReader prIniListReader( strHebrewTestFilePath.c_str() );
    prIniListReader.SetCurrentSection( strHebrewSubSection );   

    CPPUNIT_ASSERT( prIniListReader.GetConfigString( L"?????????" ) == L"????????" );
}

This quite simply doesn't work. Previously I worked around this with a macro that calls a routine to convert a narrow string to a wide string (we use towstring all over our applications, so it's existing code):

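#include <sstream>
#include <string>
using namespace std;

#define UNICODE_CONSTANT( CONSTANT ) towstring( CONSTANT )

// LPCSTR is the Windows typedef for const char*.
// Streaming a narrow string into a wostringstream widens each byte
// separately; multibyte (e.g. UTF-8) sequences are not decoded.
wstring towstring( LPCSTR lpszValue )
{
    wostringstream os;
    os << lpszValue;
    return os.str();
}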
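Part of why this is fragile is that the wostringstream inserter for a const char* widens each byte on its own. A Hebrew letter stored as two UTF-8 bytes in the narrow literal therefore comes out as two wchar_t values rather than one code point. Below is a minimal sketch of a decoding alternative. It assumes four things: the test sources are saved as UTF-8, the compiler copies those bytes into narrow literals unchanged (GCC's default execution character set is UTF-8), the input is valid UTF-8, and wchar_t is 32 bits as on Linux and OS X. The name utf8_to_wstring is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical replacement for towstring(): decodes UTF-8 into code points
// instead of widening each byte. Assumes valid UTF-8 and a 32-bit wchar_t.
std::wstring utf8_to_wstring(const char* utf8) {
    assert(utf8 != nullptr);
    std::wstring out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p) {
        unsigned char c = *p++;
        // Number of continuation bytes implied by the lead byte.
        int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : (c >= 0xC0) ? 1 : 0;
        // Keep only the payload bits of the lead byte (all of it for ASCII).
        unsigned long cp = (extra == 0) ? c : (c & (0x3F >> extra));
        // Fold in the 6 payload bits of each continuation byte (10xxxxxx).
        while (extra-- > 0 && *p)
            cp = (cp << 6) | (*p++ & 0x3F);
        out += static_cast<wchar_t>(cp);
    }
    return out;
}
```

With that in place the macro could become #define UNICODE_CONSTANT( CONSTANT ) utf8_to_wstring( CONSTANT ). The result still depends on how the file was saved and compiled, and that dependency is exactly what numeric escapes remove.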

The assertion in the test above then became:

CPPUNIT_ASSERT( prIniListReader.GetConfigString( UNICODE_CONSTANT( "?????????" ) ) == UNICODE_CONSTANT( "????????" ) );

This worked on OS X, but now that I'm porting to Linux the tests are all failing, and the whole approach feels rather hackish anyway. Can anyone suggest a nicer solution to this problem?

Recommended answer

A tedious but portable way is to build your strings using numeric escape codes. For example:

const wchar_t *string = L"דונדארףמע";

becomes:

const wchar_t *string = L"\x05d3\x05d5\x05e0\x05d3\x05d0\x05e8\x05df\x05de\x05e2";

You have to convert all your Unicode characters to numeric escapes. That way your source code becomes encoding-independent.
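The sketch below checks that the escaped form holds what you expect. It compares the \x string with the same text spelled using C++ universal character names (\uXXXX) and assumes a C++11 compiler for constexpr and static_assert. The constant names are hypothetical. One caveat: a \x escape consumes every hex digit that follows it. An escaped letter followed directly by a literal 0-9 or a-f would therefore merge into one wrong character, whereas \u always takes exactly four digits.

```cpp
// The Hebrew expectation written only with numeric escapes.
// Each \x escape yields one wchar_t whose value is that code point.
constexpr wchar_t kHebrewHexEscapes[] =
    L"\x05d3\x05d5\x05e0\x05d3\x05d0\x05e8\x05df\x05de\x05e2";

// The same text using universal character names; \u takes exactly four
// hex digits, so it cannot swallow a character that follows it.
constexpr wchar_t kHebrewUcn[] =
    L"\u05d3\u05d5\u05e0\u05d3\u05d0\u05e8\u05df\u05de\u05e2";

// Nine characters plus the terminating null, in both spellings.
static_assert(sizeof(kHebrewHexEscapes) / sizeof(wchar_t) == 10, "nine characters");
static_assert(sizeof(kHebrewUcn) == sizeof(kHebrewHexEscapes), "same length");
```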

You can use online tools for the conversion, such as this one. It outputs the JavaScript escape format \uXXXX, so just search and replace \u with \x to get the C format.
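If an online converter isn't handy, a small helper can produce the escapes directly. This sketch assumes valid UTF-8 input, such as text pasted into a UTF-8 terminal. It emits \x followed by at least four hex digits, matching the format above; code points beyond U+FFFF only fit in a 32-bit wchar_t. Because a \x escape swallows any hex digits that follow it, the helper also escapes an ASCII hex digit that comes right after an escaped character. The function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Decodes the UTF-8 sequence starting at s[i] and advances i past it.
// Assumes well-formed UTF-8; no validation is done.
static unsigned long next_code_point(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;                              // ASCII
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;   // continuation bytes
    unsigned long cp = c & (0x3F >> extra);              // lead-byte payload
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Hypothetical helper: turns UTF-8 text into the body of a wide C/C++
// literal, writing every non-ASCII character as \x plus 4+ hex digits.
std::string to_c_escapes(const std::string& utf8) {
    std::string out;
    bool after_hex_escape = false;
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned long cp = next_code_point(utf8, i);
        bool ascii_hex_digit = cp < 0x80 && std::isxdigit(static_cast<int>(cp));
        if (cp >= 0x80 || (after_hex_escape && ascii_hex_digit)) {
            // Escape non-ASCII, and any hex digit that would extend the previous escape.
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\x%04lx", cp);
            out += buf;
            after_hex_escape = true;
        } else {
            if (cp == '"' || cp == '\\') out += '\\';   // keep the literal well-formed
            out += static_cast<char>(cp);
            after_hex_escape = false;
        }
    }
    return out;
}
```

For example, the two UTF-8 bytes D7 90 (the letter alef, U+05D0) come back as \x05d0, ready to paste inside an L"..." literal.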