如何简洁、便携、彻底地播种 mt19937 PRNG?

2021-12-21 00:00:00 random c++ c++11

我似乎看到很多答案中有人建议使用来生成随机数，通常还有这样的代码:

I seem to see many answers in which someone suggests using <random> to generate random numbers, usually along with code like this:

std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<> dis(0, 5); dis(gen);

通常这会取代某种邪恶的可憎"，例如:

Usually this replaces some kind of "unholy abomination" such as:

srand(time(NULL)); rand()%6;

我们可能会批评旧方法，认为time(NULL) 提供低熵，time(NULL) 是可预测的，最终结果是非均匀的.

We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.

但所有这些都适用于新方式:它只是具有更闪亮的饰面.

But all of that is true of the new way: it just has a shinier veneer.

rd() 返回单个 unsigned int.这至少有 16 位，可能有 32 位.这不足以为 MT 的 19937 位状态播种.

rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.

使用 std::mt19937 gen(rd());gen()(用 32 位播种并查看第一个输出)不能提供良好的输出分布.7 和 13 永远不能是第一个输出.两颗种子产生 0.十二颗种子产生 1226181350.(链接)

Using std::mt19937 gen(rd());gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)

std::random_device 可以，有时是，作为一个带有固定种子的简单 PRNG 来实现.因此，它可能会在每次运行时产生相同的序列.(链接) 这比time(NULL)还要糟糕.

std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse than time(NULL).

更糟糕的是，复制和粘贴上述代码片段非常容易，尽管它们存在问题.对此的一些解决方案需要获取较大库，可能并不适合所有人.

Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to the this require acquiring largish libraries which may not be suitable to everyone.

有鉴于此，我的问题是如何在 C++ 中简洁、可移植和彻底地植入 mt19937 PRNG?

鉴于上述问题，一个很好的答案:

Given the issues above, a good answer:

必须完全播种 mt19937/mt19937_64.
不能仅仅依赖 std::random_device 或 time(NULL) 作为熵的来源.
不应依赖 Boost 或其他库.
应该适合少量的行，以便复制粘贴到答案中看起来不错.

Must fully seed the mt19937/mt19937_64.

Cannot rely solely on std::random_device or time(NULL) as a source of entropy.

Should not rely on Boost or other libaries.

Should fit in a small number of lines such that it would look nice copy-pasted into an answer.

想法

我目前的想法是 std::random_device 的输出可以与 time(NULL) 混合(可能通过 XOR)，值来自地址空间随机化，以及一个硬编码常量(可以在分发过程中设置)以获得最佳效果- 对熵的努力.

My current thought is that outputs from std::random_device can be mashed up (perhaps via XOR) with time(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.

std::random_device::entropy() 没有很好地说明std::random_device可能会或不会做什么.

std::random_device::entropy() does not give a good indication of what std::random_device might or might not do.

推荐答案

我认为 std::random_device 的最大缺陷是，如果没有可用的 CSPRNG，它允许确定性回退.仅此一项就是不使用 std::random_device 播种 PRNG 的一个很好的理由，因为产生的字节可能是确定性的.不幸的是，它没有提供 API 来确定何时发生这种情况，或者请求失败而不是低质量的随机数.

I would argue the greatest flaw with std::random_device is the that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.

也就是说，没有完全便携的解决方案:但是，有一种体面的、最小的方法.您可以使用 CSPRNG 周围的最小包装器(在下面定义为 sysrandom)来为 PRNG 设定种子.

That is, there is no completely portable solution: however, there is a decent, minimal approach. You can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.

您可以依赖 CryptGenRandom，一个 CSPRNG.例如，您可以使用以下代码:

You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:

bool acquire_context(HCRYPTPROV *ctx) { if (!CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, 0)) { return CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, CRYPT_NEWKEYSET); } return true; } size_t sysrandom(void* dst, size_t dstlen) { HCRYPTPROV ctx; if (!acquire_context(&ctx)) { throw std::runtime_error("Unable to initialize Win32 crypt library."); } BYTE* buffer = reinterpret_cast<BYTE*>(dst); if(!CryptGenRandom(ctx, dstlen, buffer)) { throw std::runtime_error("Unable to generate random bytes."); } if (!CryptReleaseContext(ctx, 0)) { throw std::runtime_error("Unable to release Win32 crypt library."); } return dstlen; }

类Unix
<小时>
在许多类 Unix 系统上，您应该使用 /dev/urandom可能(尽管不保证在符合 POSIX 的系统上存在).

Unix-Like

On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).

size_t sysrandom(void* dst, size_t dstlen) { char* buffer = reinterpret_cast<char*>(dst); std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in); stream.read(buffer, dstlen); return dstlen; }

其他
<小时>
如果没有可用的 CSPRNG，您可以选择依赖 std::random_device.但是，如果可能的话，我会避免这种情况，因为各种编译器(最著名的是 MinGW)将其作为 PRNG(实际上，每次生成相同的序列以提醒人们它不是正确随机的).

Other

If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it with as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).

现在我们的片段开销最小，我们可以生成所需的随机熵位来为我们的 PRNG 做种子.该示例使用(显然不够)32 位作为 PRNG 的种子，您应该增加此值(这取决于您的 CSPRNG).

Now that we have our pieces with minimal overhead, we can generate the desired bits of random entropy to seed our PRNG. The example uses (an obviously insufficient) 32-bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).

std::uint_least32_t seed; sysrandom(&seed, sizeof(seed)); std::mt19937 gen(seed);

对比提升
<小时>
快速浏览源代码.Boost 在 Windows 上使用 MS_DEF_PROV，这是 PROV_RSA_FULL 的提供程序类型.唯一缺少的是验证加密上下文，这可以通过 CRYPT_VERIFYCONTEXT 完成.在 *Nix 上，Boost 使用 /dev/urandom.IE，此解决方案可移植、经过充分测试且易于使用.

Comparison To Boost

We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the provider type for PROV_RSA_FULL. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. IE, this solution is portable, well-tested, and easy-to-use.

如果你愿意为了安全而牺牲简洁性，getrandom 是 Linux 3.17 及更高版本以及最近的 Solaris 上的绝佳选择.getrandom 的行为与 /dev/urandom 相同，除了它在启动后内核尚未初始化其 CSPRNG 时会阻塞.以下代码段检测 Linux getrandom 是否可用，如果不可用，则回退到 /dev/urandom.

If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.

#if defined(__linux__) || defined(linux) || defined(__linux) # // Check the kernel version. `getrandom` is only Linux 3.17 and above. # include <linux/version.h> # if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0) # define HAVE_GETRANDOM # endif #endif // also requires glibc 2.25 for the libc wrapper #if defined(HAVE_GETRANDOM) # include <sys/syscall.h> # include <linux/random.h> size_t sysrandom(void* dst, size_t dstlen) { int bytes = syscall(SYS_getrandom, dst, dstlen, 0); if (bytes != dstlen) { throw std::runtime_error("Unable to read N bytes from CSPRNG."); } return dstlen; } #elif defined(_WIN32) // Windows sysrandom here. #else // POSIX sysrandom here. #endif

OpenBSD
<小时>
最后一个警告:现代 OpenBSD 没有 /dev/urandom.您应该改用 getentropy.

#if defined(__OpenBSD__) # define HAVE_GETENTROPY #endif #if defined(HAVE_GETENTROPY) # include <unistd.h> size_t sysrandom(void* dst, size_t dstlen) { int bytes = getentropy(dst, dstlen); if (bytes != dstlen) { throw std::runtime_error("Unable to read N bytes from CSPRNG."); } return dstlen; } #endif

其他想法
<小时>
如果您需要加密安全的随机字节，您可能应该将 fstream 替换为 POSIX 的无缓冲打开/读取/关闭.这是因为 basic_filebuf 和 FILE 都包含一个内部缓冲区，它将通过标准分配器分配(因此不会从内存中擦除).

Other Thoughts

If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).

这可以很容易地通过将 sysrandom 更改为:

This could easily be done by changing sysrandom to:

size_t sysrandom(void* dst, size_t dstlen) { int fd = open("/dev/urandom", O_RDONLY); if (fd == -1) { throw std::runtime_error("Unable to open /dev/urandom."); } if (read(fd, dst, dstlen) != dstlen) { close(fd); throw std::runtime_error("Unable to read N bytes from CSPRNG."); } close(fd); return dstlen; }

谢谢
<小时>
特别感谢 Ben Voigt 指出 FILE 使用缓冲读取，因此不应使用.

Thanks

Special thanks to Ben Voigt for pointing out FILE uses buffered reads, and therefore should not be used.

我还要感谢 Peter Cordes 提到 getrandom，以及 OpenBSD 缺少 /dev/urandom.

I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.

相关文章