What are the tradeoffs between boost::locale and std::locale?
I am in the process of internationalizing a large legacy codebase in C++, and I am faced with a difficult decision: should I use boost::locale or the standard C++ locales?
I am committed to using UTF-8. We have to do a reasonably broad range of text processing; although it is not the core of what our code does, it is important. We can expect to do most of what one might need to do: time, date, number, and money formatting, collation, regexp, substring isolation, interaction with boost::filesystem, DB access, etc.
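From the introduction to boost::locale, I gather that:
1. Setting the global locale has side effects (the csv example). It affects printf and boost::lexical_cast, and some third-party libraries may break.
2. Number formatting is broken for some locales.
3. Locale names are not standardized.
4. Many vendors provide only the C and POSIX locales, so GCC supports localization only under Linux.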
I have trouble evaluating the impact of point 1. I guess point 2 is pretty severe if it affects us; points 3 and 4 won't be a big deal for us.
Is there a consensus in the community that boost::locale is the better alternative? Is there any motion in the standard committee to address the issues with std::locale? Can anyone help me make a more informed decision?
Perhaps most importantly, is it simple to migrate from one to the other? How well do the two play with one another? Is it legitimate to set the global locale with a Boost locale, and then use std facilities?
Recommended answer
In the end, the Boost documentation does a good job of answering my question, but you have to do some reading, and it helps to understand std::locale better than I did at the time of posting.
Plays well with the standard
A std::locale is a collection of facets. The standard defines a set of facets which each locale must provide, but other than that it seems most is left to the implementation. This includes locale behavior and the names of the locales.
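To make the "collection of facets" idea concrete, here is a minimal sketch using only the standard library. The names german_punct, german_like_locale and format_with are hypothetical helpers invented for illustration; they are not part of the standard or of Boost. The sketch builds a new std::locale by copying the classic locale and replacing a single facet (std::numpunct), which is essentially the same mechanism that lets a library supply its own facets.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: German-style digit punctuation.
struct german_punct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; } // groups of 3 digits
};

// Builds a locale that is the classic "C" locale with one facet replaced.
// The locale takes ownership of the facet and deletes it when no longer used.
std::locale german_like_locale() {
    return std::locale(std::locale::classic(), new german_punct);
}

// Formats a value with two decimals using the facets of the given locale.
std::string format_with(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);
    out << std::fixed;
    out.precision(2);
    out << value;
    return out.str();
}
```

With these definitions, format_with(std::locale::classic(), 1234.5) yields 1234.50, while format_with(german_like_locale(), 1234.5) yields 1.234,50. Both arguments are plain std::locale objects; only the facets they carry differ.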
What boost::locale does is provide a bunch of facets, collected into locales, that behave the same way regardless of platform (at least if you are using the default ICU backend).
So boost::locale provides a standardized set of std::locale objects that behave consistently across platforms, provide full Unicode support for a wide range of cultural norms, and have consistent naming. Switching between a non-Boost std::locale (i.e. an implementation-provided locale) and a boost::locale is trivial since they are the same type: both are collections of facets (std::locale::facet objects), although the implementations differ. Chances are the Boost locales do a better job of doing what you want.
Complete Unicode support, for all encodings, on all platforms
Further, boost::locale provides a way of accessing complete Unicode support through ICU, which allows you to gain the benefits of ICU without ICU's poor (not C++-ish) interface.
This is advantageous, since any standard support of Unicode is very likely to come through the locale framework, and any Unicode-aware program is likely going to need to be locale-aware as well (for collation, for example).
Saner behavior regarding numbers
Finally, boost::locale addresses what could legitimately be called a significant flaw in the usual implementations of std::locale: any stream-formatted number will be affected by the locale, regardless of whether this is desirable. See the Boost documentation for a detailed discussion.
So if you are using a file stream (such as std::ofstream) to read or write a file, and you have set the global locale to your platform's German locale, you'll have commas as the decimal separator in your floats. If you're reading/writing a csv file, that might be a problem. If you use a boost::locale as your global locale, this will only happen if you explicitly tell it to use locale conventions for your numeric input/output. Note that many libraries use locale info in the background, including boost::lexical_cast. So does std::to_string, for that matter. So consider the following example:
#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    auto demo = [](const std::string& label)
    {
        std::cout.imbue(std::locale()); // imbue cout with the current global locale
        float f = 1234.567890f;
        std::cout << label << "\n";
        std::cout << " streamed: " << f << "\n";
        std::cout << " to_string: " << std::to_string(f) << "\n";
    };

    std::locale::global(std::locale("C")); // the default
    demo("c locale");

    std::locale::global(std::locale("de_DE")); // platform's German locale; the name is platform-specific
    demo("std de locale");

    boost::locale::generator gen;
    std::locale::global(gen("de_DE.UTF-8")); // German locale generated by Boost
    demo("boost de locale");
}
This gives the following output:
c locale
streamed: 1234.57
to_string: 1234.567871
std de locale
streamed: 1.234,57
to_string: 1234,567871
boost de locale
streamed: 1234.57
to_string: 1234,567871
In code that implements both human communication (output to a GUI or terminal) and inter-machine communication (csv files, XML, etc.), this is likely undesirable behavior. When using a Boost locale, you explicitly specify when you want locale formatting, for example:
std::cout << boost::locale::as::currency << 123.45 << "\n";
std::cout << boost::locale::as::number << 12345.666 << "\n";
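For comparison, a similar separation can be approximated without Boost by explicitly imbuing machine-facing streams with the classic locale, so that they ignore whatever the global locale happens to be. The following is only a sketch under that assumption; csv_row is a hypothetical helper, not something from the Boost documentation. Note that it protects only the stream it creates, not calls such as std::to_string or printf.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats one CSV row using the classic "C" locale,
// regardless of which global locale is currently installed.
std::string csv_row(const std::vector<double>& values) {
    std::ostringstream out;
    out.imbue(std::locale::classic()); // '.' as decimal point, no digit grouping
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) {
            out << ',';
        }
        out << values[i];
    }
    return out.str();
}
```

The design difference is that here every machine-facing stream has to opt out of localization, whereas with a Boost locale the human-facing output opts in through the as:: manipulators.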
Conclusion
It would seem that boost::locale should be preferred over the system-provided locales.