Does std::mt19937 need a warm-up?

I've read that many pseudo-random number generators need to produce many samples before they are "warmed up". Is that the case when using std::random_device to seed std::mt19937, or can we expect the engine to be ready right after construction? The code in question:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());

Recommended answer

Mersenne Twister is a shift-register based pRNG (pseudo-random number generator) and is therefore sensitive to bad seeds: an initial state with long runs of 0s or 1s leads to relatively predictable output until the internal state has been mixed up enough.

However, the constructor that takes a single value applies a mixing function to that seed, designed to minimize the likelihood of producing such 'bad' states. There's a second way to initialize mt19937, where you set the internal state directly via an object conforming to the SeedSequence concept. It's with this second method of initialization that you may need to be concerned about choosing a 'good' state or doing a warm-up.
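
If you do want an explicit warm-up after single-value seeding, one common approach is to discard some initial outputs with the engine's discard() member. The following is only a minimal sketch: the helper name make_warmed_engine and the default discard count are illustrative assumptions, not something the standard or the answer prescribes.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: seed from one 32-bit value, then throw away the
// first `warmup` outputs. The default of 10000 is an arbitrary,
// illustrative choice.
std::mt19937 make_warmed_engine(std::uint32_t seed,
                                unsigned long long warmup = 10000) {
    std::mt19937 gen(seed);  // single-value constructor expands the seed over the whole state
    gen.discard(warmup);     // advances the engine as if operator() had been called `warmup` times
    return gen;
}
```

Calling make_warmed_engine(rd()) with a std::random_device rd would then replace the direct construction shown in the question.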

The standard library includes one type conforming to the SeedSequence concept, std::seed_seq. A seed_seq takes an arbitrary number of input seed values and performs certain operations on them to produce a sequence of different values suitable for directly setting the internal state of a pRNG.
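
To see this expansion in isolation, you can call seed_seq::generate yourself, which is the same member an engine calls when it is seeded from the sequence. A small sketch, where the function name expand_seed and the output length of eight words are arbitrary choices for illustration:

```cpp
#include <array>
#include <cstdint>
#include <random>

// Ask a seed_seq built from a few small inputs for eight 32-bit words,
// the same way an engine would when seeded from it.
std::array<std::uint32_t, 8> expand_seed() {
    std::seed_seq seq{1, 2, 3, 4, 5};
    std::array<std::uint32_t, 8> out{};
    seq.generate(out.begin(), out.end());  // fills `out` with mixed 32-bit values
    return out;
}
```

The result is deterministic for a given input, so two calls return the same eight words; the mixing spreads the input over the output but adds no entropy of its own.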

Here's an example of loading up a seed sequence with enough random data to fill the entire std::mt19937 state:

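#include <algorithm>
#include <array>
#include <functional>
#include <random>

// One random_device value per word of engine state (state_size is 624 for mt19937).
std::array<std::random_device::result_type, std::mt19937::state_size> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));

std::mt19937 eng(seq);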

This ensures that the entire state is randomized. Each engine specifies how much data it reads from the seed sequence, so check the documentation for whichever engine you use.
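
For the Mersenne Twister engines specifically, that amount can be computed from the engine's own constants: the engine requests state_size words of state, each built from ceil(word_size / 32) 32-bit values. A sketch under that assumption (the helper name seed_words is made up here, and the formula does not carry over to other engine families):

```cpp
#include <cstddef>
#include <random>

// Number of 32-bit words a Mersenne Twister engine requests from a seed
// sequence: state_size words, each needing ceil(word_size / 32) values.
template <class Engine>
constexpr std::size_t seed_words() {
    return Engine::state_size * ((Engine::word_size + 31) / 32);
}

static_assert(seed_words<std::mt19937>() == 624, "624 x 32-bit words");
static_assert(seed_words<std::mt19937_64>() == 624, "312 x 64-bit words = 624 x 32-bit");
```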

Although here I load the seed_seq entirely from std::random_device, seed_seq is specified such that even a few numbers that aren't particularly random should work well. For example:

std::seed_seq seq{1, 2, 3, 4, 5};
std::mt19937 eng(seq);

In the comments below, Cubbi points out that seed_seq works by performing a warm-up sequence for you.

Here's what should be your 'default' for seeding:

std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 rng(seed);
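
Wrapped into functions, the default seeding above might be used as follows. This is only a sketch: the names make_seeded_engine and roll_die are hypothetical, and the six-sided die is just an example distribution.

```cpp
#include <random>

// Hypothetical helper wrapping the seeding pattern above.
std::mt19937 make_seeded_engine() {
    std::random_device r;
    std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
    return std::mt19937(seed);
}

// Example use: roll a six-sided die.
int roll_die(std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, 6);  // inclusive range [1, 6]
    return dist(rng);
}
```

Create the engine once with make_seeded_engine() and pass it by reference to every function that needs random numbers, rather than reseeding on each call.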
