Why is a volatile local variable optimised differently from a volatile parameter, and why does the optimiser generate a no-op loop from the latter?

2022-01-23 00:00:00 pass-by-value optimization g++ c++ volatile

This was inspired by this question/answer and ensuing discussion in the comments: Is the definition of "volatile" this volatile, or is GCC having some standard compliancy problems?. Based on others' and my interpretation of what should be happening, as discussed in comments, I've submitted it to GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71793 Other relevant responses are still welcome.

Also, that thread has since given rise to this question: Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?

I know volatile isn't what most people think it is and is an implementation-defined nest of vipers. And I certainly don't want to use the below constructs in any real code. That said, I'm totally baffled by what's going on in these examples, so I'd really appreciate any elucidation.

My guess is this is due to either highly nuanced interpretation of the Standard or (more likely?) just corner-cases for the optimiser used. Either way, while more academic than practical, I hope this is deemed valuable to analyse, especially given how typically misunderstood volatile is. Some more data points - or perhaps more likely, points against it - must be good.

Given this code:

#include <cstddef>

void f(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = 42;
    // N.B. Yeah, const is weird, but it doesn't change anything

    while (n--) {
        *y++ = x;
    }
}

void g(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

void h(void *const p, std::size_t n, volatile unsigned char const &x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

int main(int, char **)
{
    int y[1000];
    f(&y, sizeof y);
    volatile unsigned char const x{99};
    g(&y, sizeof y, x);
    h(&y, sizeof y, x);
}

Output

g++ from gcc (Debian 4.9.2-10) 4.9.2 (Debian stable a.k.a. Jessie) with the command line g++ -std=c++14 -O3 -S test.cpp produces the below ASM for main(). Version Debian 5.4.0-6 (current unstable) produces equivalent code, but I just happened to run the older one first, so here it is:

main:
.LFB3:
    .cfi_startproc

# f()
    movb    $42, -1(%rsp)
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L21:
    subq    $1, %rax
    movzbl  -1(%rsp), %edx
    jne .L21

# x = 99
    movb    $99, -2(%rsp)
    movzbl  -2(%rsp), %eax

# g()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L22:
    subq    $1, %rax
    jne .L22

# h()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L23:
    subq    $1, %rax
    movzbl  -2(%rsp), %edx
    jne .L23

# return 0;
    xorl    %eax, %eax
    ret
    .cfi_endproc

Analysis

All 3 functions are inlined, and both that allocate volatile local variables do so on the stack for fairly obvious reasons. But that's about the only thing they share...

  • f() makes sure to read x on every iteration, presumably because it is volatile, but just dumps the result into edx, presumably because the destination y isn't declared volatile and is never read, meaning the stores to it can be nixed under the as-if rule. OK, makes sense.

  • Well, I mean... kinda. Like, not really, because volatile is really for hardware registers, and clearly a local value can't be one of those - and can't otherwise be modified in a volatile way unless its address is passed out... which it's not. Look, there's just not a lot of sense to be had out of volatile local values. But C++ lets us declare them and tries to do something with them. And so, confused as always, we stumble onwards.
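For comparison, here is a minimal sketch of the same loop with the stores made observable as well. The function name fill_observable is hypothetical, not from the question. The only change from f() is that the destination is accessed through a volatile unsigned char *, so every store, like every read of x, is a volatile access. The Standard counts volatile accesses as observable behaviour, so the as-if rule should no longer allow the stores to be dropped.

```cpp
#include <cstddef>

// Hypothetical variant of f(): both the source and the destination are
// volatile, so every read of x and every write through y is a volatile access.
void fill_observable(volatile unsigned char *y, std::size_t n)
{
    volatile unsigned char const x = 42;

    while (n--) {
        *y++ = x;  // volatile load from x, then volatile store through y
    }
}
```

To keep f()'s void * interface, the pointer could be converted with static_cast<volatile unsigned char *>(p), since a static_cast from void * is allowed to add cv-qualifiers.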

g(): What. By moving the volatile source into a pass-by-value parameter, which is still just another local variable, GCC somehow decides it's not volatile, or is less volatile, and so doesn't need to read it on every iteration... but it still runs the loop, even though its body now does nothing.

h(): By taking the passed volatile as pass-by-reference, the same effective behaviour as f() is restored, so the loop does volatile reads.

  • This case alone actually makes practical sense to me, for reasons outlined above against f(). To elaborate: Imagine x refers to a hardware register, of which every read has side-effects. You wouldn't want to skip any of those.
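To make the hardware-register scenario concrete, here is a sketch of how h()'s pass-by-reference shape might be used. On real hardware the reference would be bound to something like *reinterpret_cast<volatile unsigned char const *>(ADDR) for a device-specific address ADDR. Here fake_status_reg, a hypothetical ordinary global, stands in for the register so the example can run. The name sample_register is likewise hypothetical.

```cpp
#include <cstddef>

// Hypothetical stand-in for a memory-mapped status register. On real hardware
// the reference passed below would instead be bound to a device address, e.g.
// *reinterpret_cast<volatile unsigned char const *>(ADDR) for some ADDR.
volatile unsigned char fake_status_reg = 0x5A;

// Same shape as h(): the source is taken by volatile reference, so each
// iteration performs a volatile read of whatever object the reference names.
void sample_register(unsigned char *dst, std::size_t n,
                     volatile unsigned char const &reg)
{
    while (n--) {
        *dst++ = reg;
    }
}
```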

Adding #define volatile /**/ leads to main() being a no-op, as you'd expect. So, when present, volatile does do something even on a local variable... I just have no idea what it does in the case of g(). What on Earth is going on there?

    • Why does a local value declared in-body produce different results from a by-value parameter, with the latter letting reads be optimised away? Both are declared volatile. Neither has its address passed out, and neither has a static address, ruling out any inline-ASM POKEry, so they can never be modified outside the function. The compiler can see that each is constant, need never be re-read, and volatile just ain't true -
      • so (A) is either of them allowed to be elided under such constraints (i.e. treated as if it weren't declared volatile)? -
      • and (B) why does only one get elided? Are some volatile local variables more volatile than others?

      Is this a weird corner case due to order of optimising analyses or such? As the code is a daft thought-experiment, I wouldn't chastise GCC for this, but it'd be good to know for sure. (Or is g() the manual timing loop people have dreamt of all these years?) If we conclude there's no Standard bearing on any of this, I'll move it to their Bugzilla just for their information.

      And of course, the more important question from a practical perspective, though I don't want that to overshadow the potential for compiler geekery: which, if any, of these are well-defined/correct according to the Standard?

Recommended answer

For f: GCC eliminates the non-volatile stores (but not the loads, which can have side-effects if the source location is a memory-mapped hardware register). There is really nothing surprising here.

For g: Because of the x86_64 ABI, the parameter x of g is allocated in a register (i.e. rdx) and does not have a location in memory. Reading a general-purpose register does not have any observable side effects, so the dead read gets eliminated.
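Given that explanation, one workaround sketch for g() is to copy the register-allocated parameter into a volatile local inside the body, so that it gets a memory location just as f()'s x did. The name g_spilled is hypothetical. Whether the loads are then kept is an assumption based on the f() output shown above for these GCC versions, not something the Standard spells out. Taking x by volatile reference, as h() does, is the other option.

```cpp
#include <cstddef>

// Hypothetical variant of g(). The parameter itself is not volatile; its value
// is copied into a volatile local, which (as with f()'s x in the listing above)
// GCC is expected to keep in a stack slot and re-read on every iteration.
void g_spilled(void *const p, std::size_t n, unsigned char const x_arg)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = x_arg;  // one volatile object in memory

    while (n--) {
        *y++ = x;  // volatile read of x; the store through y is still plain
    }
}
```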
