Best practices for localized texts in cross-platform C++ applications?

2022-01-11 00:00:00 localization c++

In the current C++ standard (C++03), there are too few specifications about text localization, which makes the C++ developer's life harder than usual when working with localized texts (the C++0x standard should certainly help here later).

Assume an application with the following requirements:

  1. responsive (real-time) application: the application has to reduce non-responsive times to "not noticeable", so speed of execution is important.
  2. localized texts: displayed texts are localized in more than two languages, potentially many more; don't expect a fixed number of languages, and the system should be easily extensible.
  3. language defined at runtime: the texts should not be compiled into the application (nor should there be one application per language); you get the chosen language information at application launch, which implies some kind of text loading.
  4. cross-platform: the application is coded with cross-platform in mind (Windows, Linux/Ubuntu, Mac/OSX), so the localized text system has to be cross-platform too.
  5. stand-alone application: the application provides everything necessary to run it; it won't use any environment library or require the user to install anything other than the OS (like most games, for example).

What are the best practices to manage localized texts in C++ in this kind of application?

I looked into this last year, and the only thing I'm sure of is that you should use std::wstring or std::basic_string<ABigEnoughType> to manipulate the texts in the application. I stopped my research because I was working more on the "text display" problem (in the case of real-time 3D), but I guess there are some best practices for managing localized texts in raw C++ beyond just that and "use Unicode".

So all best practices, suggestions, and information are welcome (cross-platform makes it hard, I think)!

Recommended answer

At a small video game company, Black Lantern Studios, I was the lead developer for a game called Lionel Trains DS. We localized into English, Spanish, French, and German. We knew all the languages up front, so including them at compile time was the only option. (They are burned to a ROM, you see.)

I can give you information on some of the things we did. Our strings were loaded into an array at startup based on the language selection of the player. Each individual language went into a separate file with all the strings in the same order. String 1 was always the title of the game, string 2 always the first menu option, and so on. We keyed the arrays off of an enum, as integer indexing is very fast, and in games, speed is everything. (The solution linked in one of the other answers uses string lookups, which I would tend to avoid.) When displaying the strings, we used a printf()-type function to replace markers with values, as in "Train 3 is departing city 1."
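
To make the enum-keyed array concrete, here is a minimal sketch, not the code we shipped. The names StringId, StringTable, and the STR_ enumerators are hypothetical, and it assumes a plain text file with one string per line in enum order (our real files were custom binary).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical string IDs; the order must match the order of lines
// in every language file.
enum StringId {
    STR_GAME_TITLE = 0,
    STR_MENU_START,
    STR_TRAIN_DEPARTING,
    STR_COUNT  // number of strings, not a real ID
};

// Holds the strings of the single language chosen at startup.
class StringTable {
public:
    // Reads one string per line, in StringId order. Returns false and
    // keeps the previous contents if the stream does not contain
    // exactly STR_COUNT lines.
    bool load(std::istream& in) {
        std::vector<std::string> loaded;
        std::string line;
        while (std::getline(in, line)) {
            loaded.push_back(line);
        }
        if (loaded.size() != static_cast<std::size_t>(STR_COUNT)) {
            return false;
        }
        strings_.swap(loaded);
        return true;
    }

    // Plain array indexing: no hashing or string comparison per lookup.
    // Precondition: load() has succeeded.
    const std::string& get(StringId id) const {
        assert(static_cast<std::size_t>(id) < strings_.size());
        return strings_[static_cast<std::size_t>(id)];
    }

private:
    std::vector<std::string> strings_;
};
```

At startup you would open the file for the chosen language (for example with std::ifstream), call load() once, and then use get(STR_MENU_START) wherever the menu text is drawn. The sketch uses std::string and assumes the bytes are in whatever encoding your renderer expects.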

Now for some of the pitfalls.

1) Between languages, phrase order is completely different. "Train 3 is departing city 1." translated to German and back ends up being "From City 1, Train 3 is departing". If you are using something like printf() and your string is "Train %d is departing city %d.", the arguments are still filled in left to right, so the reordered German will end up saying "From City 3, Train 1 is departing.", which is completely wrong. We solved this by forcing the translation to retain the same word order, but we ended up with some pretty broken German. Were I to do it again, I would write a function that takes the string and a zero-based array of the values to put in it. Then I would use markers like %0 and %1, basically embedding the array index into the string.

Update: @Jonathan Leffler pointed out that a POSIX-compliant printf() supports %2$s-type markers, where the 2$ portion instructs printf() to fill that marker with the second additional parameter. That would be quite handy, so long as it is fast enough. A custom solution may still be faster, so you'll want to test both.
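
Here is a rough sketch of the %0/%1 idea, which I never actually built for that game. The function name formatPositional and the rules for "%%" and out-of-range markers are my own assumptions, and it only handles single-digit indexes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces %0..%9 in `pattern` with args[0]..args[9]. "%%" yields a
// literal '%'. A marker with no matching argument is left untouched.
std::string formatPositional(const std::string& pattern,
                             const std::vector<std::string>& args) {
    std::string out;
    out.reserve(pattern.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char next = pattern[i + 1];
            if (next == '%') {
                out += '%';
                ++i;  // skip the second '%'
                continue;
            }
            if (next >= '0' && next <= '9') {
                const std::size_t index =
                    static_cast<std::size_t>(next - '0');
                if (index < args.size()) {
                    out += args[index];
                    ++i;  // skip the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

With the values {"3", "1"}, the English pattern "Train %0 is departing city %1." and a reordered pattern such as "From city %1, train %0 is departing." both put the train and city numbers in the right places, so the translator is free to pick the word order.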

2) Languages vary greatly in length. What was 30 characters in English sometimes came out to as much as 110 characters in German. This meant it often would not fit the screens we were putting it on. This is probably less of a concern for PC/Mac games, but if you are doing any work where the text must fit in a defined box, you will want to consider this. To solve this issue, we stripped as many adjectives as possible from our text for the other languages. This shortened the sentences and preserved the meaning, though it lost a bit of the flavor. I later designed an application that would contain the font and the box size and allow the translators to make their own modifications to get the text to fit into the box. I'm not sure if they ever implemented it. If you have this problem, you might also consider having scrolling areas of text.
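
The core check such a translator tool would need could look roughly like this. It is only a sketch under strong assumptions: a fixed-width font where every byte is one glyph of glyphWidth pixels, greedy word wrapping on whitespace, and a hypothetical function name fitsInBox. A real font needs per-glyph widths and proper handling of multi-byte characters.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if `text`, wrapped greedily on whitespace, fits in a box
// that is `boxWidth` pixels wide and `maxLines` lines tall.
// Assumes a fixed-width font: each character is `glyphWidth` pixels.
bool fitsInBox(const std::string& text, int glyphWidth, int boxWidth,
               int maxLines) {
    std::istringstream words(text);
    std::string word;
    int lines = 0;
    int used = 0;  // pixels used on the current line
    while (words >> word) {
        const int wordWidth = static_cast<int>(word.size()) * glyphWidth;
        if (wordWidth > boxWidth) {
            return false;  // a single word is wider than the box
        }
        if (lines == 0) {
            lines = 1;
            used = wordWidth;
        } else if (used + glyphWidth + wordWidth <= boxWidth) {
            used += glyphWidth + wordWidth;  // one space plus the word
        } else {
            ++lines;  // wrap to a new line
            used = wordWidth;
        }
        if (lines > maxLines) {
            return false;
        }
    }
    return true;
}
```

Running every translated string through a check like this as part of the build would flag overflowing text before it reaches a screen.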

3) As far as cross-platform goes, we wrote pretty much pure C++ for our localization system. We wrote custom encoded binary files to load, and a custom program to convert a CSV of language text into a .h containing the enum and the file-to-language map, plus a .lang file for each language. The most platform-specific things we used were the fonts and the printf() function, but you will have something suitable wherever you are developing, or you could write your own if needed.
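
As an illustration of the converter step, here is a small sketch. The CSV layout (a header row, then one row per string with the enum name in column 0 and one column per language) and the function names are assumptions. It skips quoted fields, the file-to-language map, and the binary .lang encoding, none of which are specified above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas. Assumption: no quoted fields or
// embedded commas, which a real translation sheet would need.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
}

// Builds the text of a header with one enumerator per CSV row.
// Row 0 is the column header; column 0 of each row is the enum name.
std::string generateEnumHeader(const std::vector<std::string>& csvLines) {
    std::ostringstream out;
    out << "enum StringId {\n";
    for (std::size_t row = 1; row < csvLines.size(); ++row) {
        out << "    " << splitCsvLine(csvLines[row])[0] << ",\n";
    }
    out << "    STR_COUNT\n};\n";
    return out.str();
}

// Collects one language column in the same row order as the enum.
// Missing cells become empty strings so the indexes stay aligned.
std::vector<std::string> extractLanguage(
        const std::vector<std::string>& csvLines, std::size_t column) {
    std::vector<std::string> strings;
    for (std::size_t row = 1; row < csvLines.size(); ++row) {
        const std::vector<std::string> fields = splitCsvLine(csvLines[row]);
        strings.push_back(column < fields.size() ? fields[column]
                                                 : std::string());
    }
    return strings;
}
```

Because the header and every language file come from the same rows, the enum values and the string order cannot drift apart, which is what makes the plain integer indexing safe.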
