Is 1.0 a valid output from std::generate_canonical?

2021-12-21 00:00:00 random c++ c++11

I always thought random numbers would lie between zero and one, without 1, i.e. they are numbers from the half-open interval [0,1). The documentation of std::generate_canonical on cppreference.com confirms this.

However, when I run the following program:

#include <iostream>
#include <limits>
#include <random>

int main()
{
    std::mt19937 rng;

    std::seed_seq sequence{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    rng.seed(sequence);
    rng.discard(12 * 629143 + 6);

    float random = std::generate_canonical<float,
                   std::numeric_limits<float>::digits>(rng);

    if (random == 1.0f)
    {
        std::cout << "Bug!\n";
    }

    return 0;
}

It gives me the following output:

Bug!

i.e. it generates a perfect 1, which causes problems in my MC integration. Is that valid behavior, or is there an error on my side? This gives the same output with G++ 4.7.3

g++ -std=c++11 test.c && ./a.out

and clang 3.3

clang++ -stdlib=libc++ -std=c++11 test.c && ./a.out

If this is correct behavior, how can I avoid 1?

Edit 1: G++ from git seems to suffer from the same problem. I am on

commit baf369d7a57fb4d0d5897b02549c3517bb8800fd
Date:   Mon Sep 1 08:26:51 2014 +0000

and compiling with ~/temp/prefix/bin/c++ -std=c++11 -Wl,-rpath,/home/cschwan/temp/prefix/lib64 test.c && ./a.out gives the same output, ldd yields

linux-vdso.so.1 (0x00007fff39d0d000)
libstdc++.so.6 => /home/cschwan/temp/prefix/lib64/libstdc++.so.6 (0x00007f123d785000)
libm.so.6 => /lib64/libm.so.6 (0x000000317ea00000)
libgcc_s.so.1 => /home/cschwan/temp/prefix/lib64/libgcc_s.so.1 (0x00007f123d54e000)
libc.so.6 => /lib64/libc.so.6 (0x000000317e600000)
/lib64/ld-linux-x86-64.so.2 (0x000000317e200000)

Edit 2: I reported the behavior here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63176

Edit 3: The clang team seems to be aware of the problem: http://llvm.org/bugs/show_bug.cgi?id=18767

Recommended answer

The problem is in the mapping from the codomain of std::mt19937 (std::uint_fast32_t) to float. When the conversion loses precision, the algorithm described by the standard gives incorrect results (inconsistent with its own description of the output) unless the current IEEE 754 rounding mode is round-to-negative-infinity; note that the default is round-to-nearest.

The 7549723rd output of mt19937 with your seed is 4294967257 (0xffffffd9u), which when rounded to 32-bit float gives 0x1p+32. That is the same value obtained by rounding the maximum output of mt19937, 4294967295 (0xffffffffu), to 32-bit float, so dividing by the range 2^32 yields exactly 1.0.

The standard could ensure correct behavior if it were to specify that, when converting from the output of the URNG to the RealType of generate_canonical, rounding is to be performed towards negative infinity; this would give a correct result in this case. As a quality-of-implementation (QOI) matter, it would be good for libstdc++ to make this change.

With this change, 1.0 will no longer be generated; instead the boundary values 0x1.fffffep-N for 0 < N <= 8 will be generated more often (approximately 2^(8 - N - 32) per N, depending on the actual distribution of MT19937).

I would recommend not using float with std::generate_canonical directly; instead, generate the number in double and then round towards negative infinity:

    double rd = std::generate_canonical<double,
        std::numeric_limits<float>::digits>(rng);
    float rf = rd;
    if (rf > rd) {
      rf = std::nextafter(rf, -std::numeric_limits<float>::infinity());
    }

This problem can also occur with std::uniform_real_distribution<float>; the solution is the same: instantiate the distribution on double and round the result towards negative infinity in float.
