C++0x 内存模型和推测加载/存储

所以我正在阅读作为即将到来的 C++0x 标准一部分的内存模型.但是,我对允许编译器执行的操作的一些限制感到有些困惑,特别是关于推测加载和存储的限制.

So I was reading about the memory model that is part of the upcoming C++0x standard. However, I'm a bit confused about some of the restrictions for what the compiler is allowed to do, specifically about speculative loads and stores.

首先,一些相关的东西:

To start with, some of the relevant stuff:

Hans Boehm 关于线程和 C++0x 中的内存模型的页面

Boehm,线程不能作为库实现"

Boehm 和 Adve,C++ 并发内存模型的基础"

萨特,棱镜:原则-基于 Microsoft 本地代码平台的顺序内存模型", N2197

Boehm, "并发内存模型编译器后果", N2338

现在,基本思想本质上是无数据竞争程序的顺序一致性",这似乎是在易于编程与允许编译器和硬件机会进行优化之间的不错折衷.如果不同线程对同一内存位置的两次访问没有排序,则定义为发生数据竞争,其中至少有一个存储到该内存位置,并且其中至少一个不是同步操作.这意味着对共享数据的所有读/写访问都必须通过某种同步机制,例如互斥锁或对原子变量的操作(好吧,可以在原子变量上操作宽松的内存顺序仅供专家使用em>,但默认提供顺序一致性).

Now, the basic idea is essentially "Sequential Consistency for Data-Race-Free Programs", which seems to be a decent compromise between ease of programming and allowing the compiler and hardware opportunities to optimize. A data race is defined to occur if two accesses to the same memory location by different threads are not ordered, at least one of them stores to the memory location, and at least one of them is not a synchronization action. It implies that all read/write access to shared data must be via some synchronization mechanism, such as mutexes or operations on atomic variables (well, it is possible to operate on the atomic variables with relaxed memory ordering for experts only, but the default provides for sequential consistency).

有鉴于此,我对普通共享变量上关于虚假或推测性加载/存储的限制感到困惑.例如,在 N2338 中我们有这样的例子

In light of this, I'm confused about the restrictions about spurious or speculative loads/stores on ordinary shared variables. For instance, in N2338 we have the example

switch (y) {
    case 0: x = 17; w = 1; break;
    case 1: x = 17; w = 3; break;
    case 2: w = 9; break;
    case 3: x = 17; w = 1; break;
    case 4: x = 17; w = 3; break;
    case 5: x = 17; w = 9; break;
    default: x = 17; w = 42; break;
}

不允许编译器转化为

tmp = x; x = 17;
switch (y) {
    case 0: w = 1; break;
    case 1: w = 3; break;
    case 2: x = tmp; w = 9; break;
    case 3: w = 1; break;
    case 4: w = 3; break;
    case 5: w = 9; break;
    default: w = 42; break;
}

因为如果 y == 2 存在对 x 的虚假写入,如果另一个线程同时更新 x,这可能是一个问题.但是,为什么这是一个问题?这是一场数据竞赛,无论如何都是被禁止的;在这种情况下,编译器只会通过写入 x 两次使情况变得更糟,但即使是一次写入也足以进行数据竞争,不是吗?IE.一个合适的 C++0x 程序需要同步对 x 的访问,在这种情况下,将不再有数据竞争,虚假存储也不会成为问题?

since if y == 2 there is a spurious write to x which could be a problem if another thread were concurrently updating x. But, why is this a problem? This a data race, which is prohibited anyway; in this case, the compiler just makes it worse by writing to x twice, but even a single write would be enough for a data race, no? I.e. a proper C++0x program would need to synchronize access to x, in which case there would no longer be data race, and the spurious store wouldn't be a problem either?

我对 N2197 中的示例 3.1.3 和其他一些示例也同样感到困惑,但也许对上述问题的解释也能解释这一点.

I'm similarly confused about Example 3.1.3 in N2197 and some of the other examples as well, but maybe an explanation for the above issue would explain that too.

答案:

推测性存储存在问题的原因在于,在上面的 switch 语句示例中,程序员可能选择有条件地获取保护 x 的锁,仅当 y != 2 时.因此推测性存储可能会引入数据竞争原始代码中不存在,因此禁止转换.同样的论点也适用于 N2197 中的示例 3.1.3.

The reason why speculative stores are a problem is that in the switch statement example above, the programmer might have elected to conditionally acquire the lock protecting x only if y != 2. Hence the speculative store might introduce a data race that was not there in the original code, and the transformation is thus forbidden. The same argument applies to Example 3.1.3 in N2197 as well.

推荐答案

我不熟悉你提到的所有东西,但请注意,在 y==2 的情况下,在代码的第一位,x 是根本没有写入(或读取,就此而言).在第二位代码中,它被写入了两次.这不仅仅是写一次和写两次(至少,在现有的线程模型中,如 pthreads).此外,存储一个根本不会存储的值比仅存储一次与存储两次有更大的区别.由于这两个原因,您不希望编译器只是用 tmp = x; 替换无操作;x = 17;x = tmp;.

I'm not familiar with all the stuff you refer to, but notice that in the y==2 case, in the first bit of code, x is not written to at all (or read, for that matter). In the second bit of code, it is written twice. This is more of a difference than just writing once vs. writing twice (at least, it is in existing threading models such as pthreads). Also, storing a value which would not otherwise be stored at all is more of a difference than just storing once vs. storing twice. For both these reasons, you don't want compilers just replacing a no-op with tmp = x; x = 17; x = tmp;.

假设线程 A 想要假设没有其他线程修改 x.允许它期望如果 y 为 2,并且它向 x 写入一个值,然后读回它,它会取回它写入的值是合理的.但是,如果线程 B 正在并发执行您的第二位代码,则线程 A 可以写入 x 并稍后读取它,并取回原始值,因为线程 B 在写入之前"保存并在之后"恢复.或者它可以返回 17,因为线程 B 在写入之后"存储了 17,并且在线程 A 读取之后"再次存储了 tmp.线程 A 可以做任何它喜欢的同步,但这无济于事,因为线程 B 没有同步.它不同步的原因(在 y==2 的情况下)是它没有使用 x.因此,特定代码位是否使用 x"的概念对于线程模型很重要,这意味着不能允许编译器在不应该"时更改代码以使用 x.

Suppose thread A wants to assume that no other thread modifies x. It's reasonable to want it to be allowed to expect that if y is 2, and it writes a value to x, then reads it back, it will get back the value it has written. But if thread B is concurrently executing your second bit of code, then thread A could write to x and later read it, and get back the original value, because thread B saved "before" the write and restored "after" it. Or it could get back 17, because thread B stored 17 "after" the write, and stored tmp back again "after" thread A reads. Thread A can do whatever synchronisation it likes, and it won't help, because thread B isn't synchronised. The reason it's not synchronised (in the y==2 case) is that it's not using x. So the concept of whether a particular bit of code "uses x" is important to the threading model, which means compilers can't be allowed to change code to use x when it "shouldn't".

简而言之,如果您提议的转换被允许,引入虚假写入,那么永远不可能分析一点代码并得出它不会修改 x(或任何其他内存位置)的结论.因此,有许多方便的习惯用法是不可能的,例如在没有同步的情况下在线程之间共享不可变数据.

In short, if the transformation you propose were allowed, introducing a spurious write, then it would never be possible to analyse a bit of code and conclude that it does not modify x (or any other memory location). There are a number of convenient idioms which would therefore be impossible, such as sharing immutable data between threads without synchronisation.

所以,虽然我不熟悉 C++0x 对数据竞争"的定义,但我假设它包括一些允许程序员假设对象未被写入的条件,并且这种转换将违反那些条件.我推测如果y==2,那么你的原始代码,连同并发代码:x = 42;x = 1;z = x 在另一个线程中,没有定义为数据竞争.或者至少如果是数据竞争,它不会允许 z 以 17 或 42 结束.

So, although I'm not familiar with C++0x's definition of "data race", I assume that it includes some conditions where programmers are allowed to assume that an object is not written to, and that this transformation would violate those conditions. I speculate that if y==2, then your original code, together with concurrent code: x = 42; x = 1; z = x in another thread, is not defined to be a data race. Or at least if it is a data race, it's not one which permits z to end up with value either 17, or 42.

考虑到在这个程序中,y 中的值 2 可能用于指示还有其他线程正在运行:不要修改 x,因为我们这里没有同步,因此会引入数据竞争".可能根本没有同步的原因是在所有其他 y 情况下,没有其他线程可以访问 x.在我看来,C++0x 想要支持这样的代码是合理的:

Consider that in this program, the value 2 in y might be used to indicate, "there are other threads running: don't modify x, because we aren't synchronised here, so that would introduce a data race". Perhaps the reason there's no synchronisation at all, is that in all other cases of y, there are no other threads running with access to x. It seems reasonable to me that C++0x would want to support code like this:

if (single_threaded) {
    x = 17;
} else {
    sendMessageThatSafelySetsXTo(17);
}

显然,您不希望将其转换为:

Clearly then, you don't want that transformed to:

tmp = x;
x = 17;
if (!single_threaded) {
    x = tmp;
    sendMessageThatSafelySetsXTo(17);
}

这与您的示例中的转换基本相同,但只有 2 种情况,而不是足以使它看起来像一个很好的代码大小优化.

Which is basically the same transformation as in your example, but with only 2 cases, instead of there being enough to make it look like a good code-size optimisation.

相关文章