在一个范围内生成无偏随机整数的最佳算法是什么?
在这个 StackOverflow 问题中:
In this StackOverflow question:
从范围生成随机整数
接受的答案建议使用以下公式在给定的 min
和 max
之间生成一个随机整数,其中 min
和 max
被包含在范围内:
the accepted answer suggests the following formula for generating a random integer in between given min
and max
, with min
and max
being included into the range:
output = min + (rand() % (int)(max - min + 1))
但它也说
这仍然略微偏向于较低的数字......它也是可以扩展它以消除偏差.
This is still slightly biased towards lower numbers ... It's also possible to extend it so that it removes the bias.
但它没有解释为什么它偏向于较低的数字或如何消除这种偏向.所以,问题是:这是在(有符号)范围内生成随机整数的最佳方法,而不依赖于任何花哨的东西,只是 rand()
函数,如果它是最优,如何消除偏差?
But it doesn't explain why it's biased towards lower numbers or how to remove the bias. So, the question is: is this the most optimal approach to generation of a random integer within a (signed) range while not relying on anything fancy, just rand()
function, and in case if it is optimal, how to remove the bias?
我刚刚针对浮点外推测试了@Joey 建议的 while
循环算法:
I've just tested the while
-loop algorithm suggested by @Joey against floating-point extrapolation:
static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);
查看有多少均匀的球"落"入并分布在多个桶"中,一个测试用于浮点外推,另一个用于 while
循环算法.但结果证明结果会因球"(和桶")的数量而异,因此我无法轻易选出获胜者.可以在 此 Ideone 页面 中找到工作代码.例如,对于 10 个桶和 100 个球,浮点外推的桶之间理想概率的最大偏差小于 while
循环算法(分别为 0.04 和 0.05)但有 1000 个球,while
-loop 算法的最大偏差较小(0.024 和 0.011),并且在 10000 个球的情况下,浮点外推再次做得更好(0.0034 和 0.0053),依此类推.的一致性.考虑到没有一种算法始终产生比其他算法更好的均匀分布的可能性,让我倾向于浮点外推,因为它似乎比 while
循环算法执行得更快.那么选择浮点外推算法好还是我的测试/结论不完全正确?
to see how much uniformly "balls" are "falling" into and are being distributed among a number of "buckets", one test for the floating-point extrapolation and another for the while
-loop algorithm. But results turned out to be varying depending on the number of "balls" (and "buckets") so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls the maximum deviation from the ideal probability among buckets is less for the floating-point extrapolation than for the while
-loop algorithm (0.04 and 0.05 respectively) but with 1000 balls, the maximum deviation of the while
-loop algorithm is lesser (0.024 and 0.011), and with 10000 balls, the floating-point extrapolation is again doing better (0.0034 and 0.0053), and so on without much of consistency. Thinking of the possibility that none of the algorithms consistently produces uniform distribution better than that of the other algorithm, makes me lean towards the floating-point extrapolation since it appears to perform faster than the while
-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm or my testings/conclusions are not completely correct?
推荐答案
当随机数生成器的输出个数 (RAND_MAX+1) 不能被所需的范围 (max-min+1) 整除时出现问题.由于从随机数到输出会有一致的映射,因此某些输出将映射到比其他输出更多的随机数.这与映射是如何完成的无关――您可以使用模数、除法、转换为浮点数,无论您能想出什么伏都教,基本问题仍然存在.
The problem occurs when the number of outputs from the random number generator (RAND_MAX+1) is not evenly divisible by the desired range (max-min+1). Since there will be a consistent mapping from a random number to an output, some outputs will be mapped to more random numbers than others. This is regardless of how the mapping is done - you can use modulo, division, conversion to floating point, whatever voodoo you can come up with, the basic problem remains.
问题的严重性非常小,要求不高的应用程序通常可以忽略它.范围越小,RAND_MAX越大,效果越不明显.
The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect will be.
我采用了您的示例程序并对其进行了一些调整.首先我创建了一个特殊版本的rand
,范围只有0-255,以更好地展示效果.我对 rangeRandomAlg2
做了一些调整.最后我将球"的数量改为 1000000 以提高一致性.您可以在此处查看结果:http://ideone.com/4P4HY
I took your example program and tweaked it a bit. First I created a special version of rand
that only has a range of 0-255, to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2
. Finally I changed the number of "balls" to 1000000 to improve the consistency. You can see the results here: http://ideone.com/4P4HY
请注意,浮点版本产生两个紧密分组的概率,接近 0.101 或 0.097,介于两者之间.这就是行动中的偏见.
Notice that the floating-point version produces two tightly grouped probabilities, near either 0.101 or 0.097, nothing in between. This is the bias in action.
我认为称其为Java 的算法"有点误导 - 我确信它比 Java 古老得多.
I think calling this "Java's algorithm" is a bit misleading - I'm sure it's much older than Java.
int rangeRandomAlg2 (int min, int max)
{
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do
{
x = rand();
} while (x >= RAND_MAX - remainder);
return min + x % n;
}
相关文章