C++ uniform_int_distribution always returns min() on first invocation
In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<> does not return a random value, but rather the distribution's min value. That is, given the code:
default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );
...x will in fact be smaller for any values of any_seed(), smaller, or larger.
To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.
I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?
Recommended answer
Explanation for the observed behavior
This is how uniform_int_distribution maps the random bits to numbers if the range of possible outcomes is smaller than the range of numbers the rng produces:
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
__ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;
where __urange is larger - smaller and __urngrange is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1)
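To make the mapping easier to experiment with, here is a minimal sketch that mirrors the quoted excerpt by hand for std::minstd_rand0. The names downscale and first_value are hypothetical helpers introduced for illustration, not part of libstdc++, and the sketch only covers the case where the requested range is smaller than the range of the rng.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hand-written mirror of the downscaling loop quoted above (illustration
// only, not the library's code). Maps draws from std::minstd_rand0 to an
// offset in [0, urange]. Assumes urange is smaller than the rng's range.
std::uint32_t downscale(std::minstd_rand0& rng, std::uint32_t urange) {
    const std::uint32_t urngmin = std::minstd_rand0::min();              // 1
    const std::uint32_t urngrange = std::minstd_rand0::max() - urngmin;  // 2147483645
    const std::uint32_t uerange = urange + 1;
    const std::uint32_t scaling = urngrange / uerange;
    const std::uint32_t past = uerange * scaling;
    std::uint32_t ret;
    do {
        // Reject draws at or beyond `past` so every bucket has equal size.
        ret = static_cast<std::uint32_t>(rng()) - urngmin;
    } while (ret >= past);
    return ret / scaling;
}

// First value in [smaller, larger] produced by a freshly seeded engine.
int first_value(std::uint32_t seed, int smaller, int larger) {
    std::minstd_rand0 rng(seed);
    const auto urange = static_cast<std::uint32_t>(larger - smaller);
    return smaller + static_cast<int>(downscale(rng, urange));
}
```

Calling first_value(seed, 0, 10) for a handful of small seeds shows them all landing in the lowest bucket, because the first raw output of the engine is still far below scaling.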
In our case, the rng default_random_engine is a minstd_rand0, which yields __scaling == 195225785 for the range [0,10] you tested with. Thus, if rng() < 195225785, the distribution will return 0.
The first number a minstd_rand0 returns is
(16807 * seed) % 2147483647
(where seed == 0 gets adjusted to 1, btw). We can thus see that the first value produced by a minstd_rand0 seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 ); you used. (Modulo off-by-one errors on my part. ;) )
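The arithmetic can be checked with a short sketch. It uses the minstd_rand0 constants fixed by the C++ standard (multiplier 16807, modulus 2147483647, outputs in [1, 2147483646]); the helper names scaling_for, first_output and largest_seed_mapping_to_min are hypothetical and exist only for this illustration. It assumes C++14 and that seed 1 itself falls into the lowest bucket.

```cpp
#include <cassert>
#include <cstdint>

// Constants of std::minstd_rand0 as specified by the C++ standard.
constexpr std::uint64_t kMultiplier = 16807;
constexpr std::uint64_t kModulus = 2147483647;
constexpr std::uint64_t kRngMin = 1;
constexpr std::uint64_t kRngMax = kModulus - 1;

// Width of one output bucket when the rng range is split into `outcomes` buckets.
constexpr std::uint64_t scaling_for(std::uint64_t outcomes) {
    return (kRngMax - kRngMin) / outcomes;
}

// First raw output of the engine for a given seed (seed 0 is adjusted to 1).
constexpr std::uint64_t first_output(std::uint64_t seed) {
    const std::uint64_t state = (seed % kModulus == 0) ? 1 : seed % kModulus;
    return (kMultiplier * state) % kModulus;
}

// Largest s such that every seed in 1..s maps its first output to bucket 0.
std::uint64_t largest_seed_mapping_to_min(std::uint64_t outcomes) {
    const std::uint64_t scaling = scaling_for(outcomes);
    std::uint64_t seed = 1;
    while ((first_output(seed + 1) - kRngMin) / scaling == 0) {
        ++seed;
    }
    return seed;
}
```

For the 11 outcomes of the range [0,10], scaling_for(11) gives 195225785, and under these assumptions the cutoff comes out at seed 11615 inclusive, which is within the off-by-one margin the answer allows for.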
You mentioned the problem going away for bigger seeds: as soon as the seeds get big enough to actually make the mod operation do something, the division can no longer map a whole range of seeds to the same output, so the results will look better.
No. You introduced significant bias in what is supposed to be a random 32-bit seed by always choosing it small. That bias showing up in the results is not surprising or evil. For random seeds, even your minstd_rand0 will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)
Case 1: You want random numbers of high statistical quality.
For that, you use a better rng like mt19937 and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)
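One common way to attempt this is sketched below. It is not the linked attempt, just a minimal illustration. The function name make_fully_seeded_mt19937 is hypothetical, and the sketch assumes that std::random_device delivers usable entropy on your platform, which the standard does not guarantee. Note also that std::seed_seq mixes its input, so this is not a strict one-to-one mapping from the 624 words to the engine state.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>

// Sketch: fill one 32-bit word per word of mt19937 state from
// std::random_device, then seed the engine through std::seed_seq.
// Assumption: std::random_device is a real entropy source here.
std::mt19937 make_fully_seeded_mt19937() {
    std::random_device source;
    std::array<std::uint32_t, std::mt19937::state_size> seed_data;  // 624 words
    std::generate(seed_data.begin(), seed_data.end(), std::ref(source));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

A typical use would be `auto rng = make_fully_seeded_mt19937();` followed by drawing from a std::uniform_int_distribution as in the question.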
Case 2: You really want to use those small seeds only.
We can still get decent results out of this. The problem is that pseudo random number generators commonly depend "somewhat continuously" on their seed. To work around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:
std::mt19937 rng(smallSeed);
rng.discard(700000);
It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0; for an example, see this train-wreck. Even if seeded properly, the statistical properties of an mt19937 are superior by far.
Concerns about the large state space or slow generation that you sometimes hear about usually do not matter outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0.
You still need to do the discard trick though, even if your results look good to the naked eye without it. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.
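If you want to check the cost on your own machine, a rough sketch like the following wraps the seeding and the discard in one helper and measures the discard with std::chrono::steady_clock. The names WarmedUp and make_warmed_up_rng are hypothetical, the default of 700000 discards is simply the value used above, and a single measurement like this is only a ballpark figure.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Result of the sketch: the warmed-up engine plus the time the discard took.
struct WarmedUp {
    std::mt19937 rng;
    std::chrono::microseconds elapsed;
};

// Seeds mt19937 with a small seed, discards the first `discards` outputs,
// and reports how long the discard took (single, unrepeated measurement).
WarmedUp make_warmed_up_rng(std::uint32_t small_seed,
                            unsigned long long discards = 700000) {
    std::mt19937 rng(small_seed);
    const auto start = std::chrono::steady_clock::now();
    rng.discard(discards);
    const auto stop = std::chrono::steady_clock::now();
    return {rng, std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}
```

Printing `make_warmed_up_rng(42).elapsed.count()` gives the discard time in microseconds for your system and compiler settings.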
Note that I am not able to give you a sharp estimate for the number of discards we need. I took that value from this answer, which links this paper for a rationale. I don't have the time to work through that right now.