Can I force cache coherency on a multicore x86 CPU?
The other week, I wrote a little thread class and a one-way message pipe to allow communication between threads (two pipes per thread, obviously, for bidirectional communication). Everything worked fine on my Athlon 64 X2, but I was wondering if I'd run into any problems if both threads were looking at the same variable and the local cached value for this variable on each core was out of sync.
I know the volatile keyword will force a variable to refresh from memory, but is there a way on multicore x86 processors to force the caches of all cores to synchronize? Is this something I need to worry about, or will volatile and proper use of lightweight locking mechanisms (I was using _InterlockedExchange to set my volatile pipe variables) handle all cases where I want to write "lock free" code for multicore x86 CPUs?
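For context, the pattern being asked about looks roughly like this. This is only a minimal sketch of the described setup; the flag name and functions are hypothetical, not the asker's actual code:

```cpp
#include <intrin.h>   // MSVC intrinsic: _InterlockedExchange

// Hypothetical one-way pipe flag shared between a producer and a consumer thread.
volatile long g_pipeReady = 0;

void producer_publish()
{
    // ... write the message payload into the pipe buffer here ...

    // Atomically set the flag. The open question is whether this is enough
    // for the write to become visible to a consumer running on another core.
    _InterlockedExchange(&g_pipeReady, 1);
}

void consumer_poll()
{
    while (g_pipeReady == 0)
    {
        // spin until the producer publishes
    }
    // ... read the message payload from the pipe buffer here ...
}
```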
I'm already aware of and have used Critical Sections, Mutexes, Events, and so on. I'm mostly wondering if there are x86 intrinsics that I'm not aware of which force or can be used to enforce cache coherency.
Recommended Answer
volatile only forces your code to re-read the value; it cannot control where the value is read from. If the value was recently read by your code then it will probably be in cache, in which case volatile will force it to be re-read from cache, NOT from memory.
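As a minimal illustration of that point (the flag name is hypothetical): volatile only stops the compiler from keeping the value in a register; the load it emits on every iteration is still served by whatever level of the cache hierarchy holds the line.

```cpp
volatile int g_flag = 0;   // hypothetical shared flag

void spin_until_set()
{
    // Because g_flag is volatile, the compiler must emit a fresh load on
    // every iteration instead of hoisting it into a register. Each of those
    // loads is still satisfied from the core's cache (kept coherent by the
    // hardware), not forced all the way out to DRAM.
    while (g_flag == 0)
    {
        // spin
    }
}
```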
There are not a lot of cache coherency instructions in x86. There are prefetch instructions like prefetchnta, but that doesn't affect the memory-ordering semantics. It used to be implemented by bringing the value to L1 cache without polluting L2, but things are more complicated for modern Intel designs with a large shared inclusive L3 cache.
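If you do want to issue that hint from C or C++, the usual route is the _mm_prefetch intrinsic. A sketch only: it is purely a performance hint and, as noted above, has no effect on coherency or memory ordering.

```cpp
#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_NTA

void prefetch_message(const char* msg)
{
    // Ask the hardware to pull the line in with a non-temporal policy
    // (prefetchnta). This may reduce cache pollution for data read once,
    // but it gives no coherency or ordering guarantees whatsoever.
    _mm_prefetch(msg, _MM_HINT_NTA);
}
```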
x86 CPUs use a variation on the MESI protocol (MESIF for Intel, MOESI for AMD) to keep their caches coherent with each other (including the private L1 caches of different cores). A core that wants to write a cache line has to force other cores to invalidate their copy of it before it can change its own copy from Shared to Modified state.
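One practical consequence of that write-invalidate behaviour (not part of the original answer, just an illustrative sketch): if the producer and consumer each write their own counter, keeping those counters on separate cache lines means the two cores are not constantly invalidating each other's copy of the same line.

```cpp
#include <cstdint>

// 64 bytes is the cache-line size on current x86 CPUs.
struct PipeCounters
{
    alignas(64) std::uint64_t producer_writes;  // written only by the producer core
    alignas(64) std::uint64_t consumer_reads;   // written only by the consumer core
};
```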
You don't need any fence instructions (like MFENCE) to produce data in one thread and consume it in another on x86, because x86 loads/stores have acquire/release semantics built in. You do need MFENCE (full barrier) to get sequential consistency. (A previous version of this answer suggested that clflush was needed, which is incorrect.)
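A sketch of what that means in C++11 terms (names are hypothetical): a release store paired with an acquire load is enough to publish data from one thread to another, and on x86 both compile to ordinary mov instructions; only a seq_cst store costs a full barrier (mfence or a locked instruction).

```cpp
#include <atomic>

int g_payload = 0;                    // ordinary, non-atomic data
std::atomic<bool> g_ready{false};     // publication flag

void producer()
{
    g_payload = 42;                                   // write the data
    g_ready.store(true, std::memory_order_release);   // plain mov on x86
}

void consumer()
{
    while (!g_ready.load(std::memory_order_acquire))  // plain mov on x86
    {
        // spin
    }
    // The acquire load that saw 'true' guarantees g_payload is 42 here.
    int value = g_payload;
    (void)value;
}
```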
You do need to prevent compile-time reordering, because C++'s memory model is weakly-ordered. volatile is an old, bad way to do this; C++11 std::atomic is a much better way to write lock-free code.
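Applied to the question's pipe flag, a hedged sketch of the C++11 equivalent would drop both volatile and _InterlockedExchange in favour of std::atomic (the names are hypothetical):

```cpp
#include <atomic>

std::atomic<long> g_pipeReady{0};   // replaces 'volatile long' + _InterlockedExchange

void publish()
{
    // exchange() is an atomic read-modify-write, like _InterlockedExchange,
    // and its default memory_order_seq_cst also prevents compile-time reordering.
    g_pipeReady.exchange(1);
}

bool poll()
{
    return g_pipeReady.load(std::memory_order_acquire) != 0;
}
```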