STL random distributions and portability

2022-01-07 00:00:00 random cross-platform c++ c++11 stl

Why aren't the results of the standard distributions required to be consistent across implementations? The output of the pseudo-random number generators, on the other hand, is required to be identical.

For example, the following will almost certainly print something different on each standard library implementation.

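#include <iostream>
#include <random>

int main() {
    std::mt19937 random {100};        // engine: output sequence is fixed by the standard
    std::normal_distribution<> dist;  // distribution: algorithm is implementation-defined
    std::cout << dist(random);        // so this value varies between standard libraries
}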
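
For contrast, the engine side really is pinned down: the C++ standard requires that the 10000th consecutive invocation of a default-constructed std::mt19937 produce the value 4123659995. A minimal check of that guarantee might look like the sketch below; the helper name tenThousandthValue is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns the 10000th output of a
// default-constructed std::mt19937 (default seed 5489).
std::uint_fast32_t tenThousandthValue() {
    std::mt19937 engine;    // default-constructed, so the seed is the standard default
    engine.discard(9999);   // skip the first 9999 outputs
    return engine();        // this call is the 10000th invocation
}
```

Calling tenThousandthValue() from main and printing the result should give the same number with any conforming standard library, which is exactly what the distributions do not promise.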

Say I want to do procedural generation and need identical starting seeds to produce identical results across platforms and compilers. I can't do that with the STL; I have to "regress" to using Boost. Why isn't this a defect?

Recommended answer

This is not a defect; it is by design. The rationale can be found in "A Proposal to Add an Extensible Random Number Facility to the Standard Library" (N1398), which says (emphasis mine):

On the other hand, the specifications for the distributions only define the statistical result, not the precise algorithm to use. This is different from engines, because for distribution algorithms, rigorous proofs of their correctness are available, usually under the precondition that the input random numbers are (truely) uniformly distributed. For example, there are at least a handful of algorithms known to produce normally distributed random numbers from uniformly distributed ones. Which one of these is most efficient depends on at least the relative execution speeds for various transcendental functions, cache and branch prediction behaviour of the CPU, and desired memory use. This proposal therefore leaves the choice of the algorithm to the implementation. It follows that output sequences for the distributions will not be identical across implementations. It is expected that implementations will carefully choose the algorithms for distributions up front, since it is certainly surprising to customers if some distribution produces different numbers from one implementation version to the next.
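
To make the proposal's point concrete, here is a rough sketch of two well-known ways to turn uniform numbers into normally distributed ones: the Box-Muller transform and the Marsaglia polar method. This is not what any particular standard library does; the function names (toUnitInterval, normalBoxMuller, normalPolar, sampleMean) and the hand-rolled uniform conversion are assumptions of this example. Both are statistically valid, yet they produce different numbers from the same engine state.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Map one 32-bit engine output to a double strictly inside (0, 1).
// Done by hand here instead of via std::generate_canonical (a choice of this sketch).
double toUnitInterval(std::mt19937& engine) {
    return (static_cast<double>(engine()) + 0.5) / 4294967296.0;  // divide by 2^32
}

// Algorithm A: Box-Muller transform (uses log, sqrt and cos).
double normalBoxMuller(std::mt19937& engine) {
    const double pi = 3.14159265358979323846;
    const double u1 = toUnitInterval(engine);
    const double u2 = toUnitInterval(engine);
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * pi * u2);
}

// Algorithm B: Marsaglia polar method (rejection loop, no trigonometry).
double normalPolar(std::mt19937& engine) {
    double x, y, s;
    do {
        x = 2.0 * toUnitInterval(engine) - 1.0;
        y = 2.0 * toUnitInterval(engine) - 1.0;
        s = x * x + y * y;
    } while (s >= 1.0 || s == 0.0);  // keep only points inside the unit circle
    return x * std::sqrt(-2.0 * std::log(s) / s);
}

// Sample mean of n draws from the given algorithm, starting from the given seed.
double sampleMean(double (*draw)(std::mt19937&), std::uint32_t seed, int n) {
    std::mt19937 engine(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += draw(engine);
    return sum / n;
}
```

Seeding two engines with 100 and drawing once from each function gives two different values, while sampleMean over many draws lands near 0 for both. Fixing one such algorithm in your own code (or using a single implementation such as Boost everywhere) pins the choice, though the last bits may still differ between platforms if std::log, std::cos or std::sqrt are implemented differently.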

This point is reiterated in the implementation-defined section, which says:

The algorithms how to produce the various distributions are specified as implementation-defined, because there is a vast variety of algorithms known for each distribution. Each has a different trade-off in terms of speed, adaptation to recent computer architectures, and memory use. The implementation is required to document its choice so that the user can judge whether it is acceptable quality-wise.
