在 [0..n-1] 范围内生成 m 个不同的随机数

2021-12-21 00:00:00 algorithm performance random c++

我有两种方法可以在 [0..n-1] 范围内生成 m 个不同的随机数

I have two methods of generating m distinct random numbers in the range [0..n-1]

方法一:

//C++-ish pseudocode int result[m]; for(i = 0; i < m; ++i) { int r; do { r = rand()%n; }while(r is found in result array at indices from 0 to i) result[i] = r; }

方法二:

//C++-ish pseudocode int arr[n]; for(int i = 0; i < n; ++i) arr[i] = i; random_shuffle(arr, arr+n); result = first m elements in arr;

当 n 远大于 m 时，第一种方法更有效，否则第二种方法更有效.但是更大"不是一个严格的概念，是吗?:)

The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)

问题: 我应该使用 n 和 m 的什么公式来确定方法 1 还是方法 2 更有效?(就运行时间的数学期望而言)

Question: What formula of n and m should I use to determine whether method1 or method2 will be more efficient? (in terms of mathematical expectation of the running time)

推荐答案

纯数学:
让我们计算两种情况下 rand() 函数调用的数量并比较结果:

Pure mathematics:
Let's calculate the quantity of rand() function calls in both cases and compare the results:

案例 1:当您已经选择了 k 个数字时，让我们看看在步骤 i = k 上调用的数学期望.通过一次 rand() 调用获得一个数字的概率等于 p = (n-k)/n.我们需要知道此类调用数量的数学期望，这导致获得我们还没有的数字.

Case 1: let's see the mathematical expectation of calls on step i = k, when you already have k numbers chosen. The probability to get a number with one rand() call is equal to p = (n-k)/n. We need to know the mathematical expectation of such calls quantity which leads to obtaining a number we don't have yet.

使用1 调用获得它的概率是p.使用 2 调用 - q * p，其中 q = 1 - p.在一般情况下，在 n 次调用之后准确获得它的概率是 (q^(n-1))*p.因此，数学期望为
Sum[ n * q^(n-1) * p ], n = 1 -->INF.这个总和等于 1/p(由 wolfram alpha 证明).

The probability to get it using 1 call is p. Using 2 calls - q * p, where q = 1 - p. In general case, the probability to get it exactly after n calls is (q^(n-1))*p. Thus, the mathematical expectation is
Sum[ n * q^(n-1) * p ], n = 1 --> INF. This sum is equal to 1/p (proved by wolfram alpha).

因此，在步骤 i = k 上，您将执行 1/p = n/(nk) 调用 rand()功能.

So, on the step i = k you will perform 1/p = n/(n-k) calls of the rand() function.

现在让我们总结一下:

Sum[ n/(n - k) ], k = 0 -->m - 1 = n * T - 方法 1 中的 rand 调用次数.
这里 T = Sum[ 1/(n - k) ], k = 0 -->m - 1

Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T - the number of rand calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1

情况 2:

这里 rand() 在 random_shuffle 内被调用 n - 1 次(在大多数实现中).

Here rand() is called inside random_shuffle n - 1 times (in most implementations).

现在，要选择方法，我们必须比较这两个值: n * T ?n - 1.
因此，要选择合适的方法，请按上述方法计算 T.如果 T <(n - 1)/n 最好使用第一种方法.否则使用第二种方法.

Now, to choose the method, we have to compare these two values: n * T ? n - 1.
So, to choose the appropriate method, calculate T as described above. If T < (n - 1)/n it's better to use the first method. Use the second method otherwise.

相关文章