C++ memory barriers for atomics
I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers?
- Windows
MemoryBarrier();
- Fence
_mm_mfence();
- Inline assembler
asm volatile("" ::: "memory");
- Intrinsic
_ReadWriteBarrier();
If there isn't a simple explanation, some links to good articles or books would probably help me get it straight. Until now I was fine with just using objects written by others that wrap these calls, but I'd like a better understanding than my current thinking, which is basically that there is more than one way to implement memory barriers under the covers.
Recommended answer
Both MemoryBarrier (MSVC) and _mm_mfence (supported by several compilers) provide a hardware memory fence, which prevents the processor from moving reads and writes across the fence.
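As a sketch of what "moving reads and writes across the fence" means in practice, here is the classic store-load (Dekker-style) litmus test. It uses the portable C++11 std::atomic_thread_fence(std::memory_order_seq_cst) in place of MemoryBarrier()/_mm_mfence() (a substitution made for portability; on x86/x64 compilers emit mfence or a locked instruction for it), and the function name count_both_zero is hypothetical. With a full fence between each thread's store and its following load, the outcome where both threads read 0 is forbidden, so the count should stay at 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical litmus test: returns how many of `iterations` runs ended with
// both threads reading 0, which would require a store to be reordered after
// the load that follows it.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full hardware fence
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full hardware fence
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();  // join() makes r1/r2 safely visible to this thread
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Creating threads per iteration is slow compared with the race window, so this demonstrates the guarantee rather than stress-testing it.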
The main difference is that MemoryBarrier has platform-specific implementations for x86, x64 and IA64, whereas _mm_mfence specifically uses the SSE2 mfence instruction, so it's not always available.
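A minimal sketch of how a wrapper might choose between the two, assuming MSVC provides MemoryBarrier() via <windows.h> and that GCC/Clang define __SSE2__ when _mm_mfence() is usable. The helper names hardware_fence and hardware_fence_kind are hypothetical, and the final std::atomic_thread_fence branch is a swapped-in portable fallback, not something either API provides.

```cpp
#include <atomic>

#if defined(_MSC_VER)
#include <windows.h>    // MemoryBarrier()
#elif defined(__SSE2__)
#include <emmintrin.h>  // _mm_mfence()
#endif

// Hypothetical helper: issue a full hardware fence with whatever the
// toolchain offers.
inline void hardware_fence() {
#if defined(_MSC_VER)
    MemoryBarrier();  // platform-specific implementation chosen by the Windows headers
#elif defined(__SSE2__)
    _mm_mfence();     // emits mfence, which requires SSE2
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback (assumption)
#endif
}

// Hypothetical: reports which branch was compiled in, for illustration.
inline const char* hardware_fence_kind() {
#if defined(_MSC_VER)
    return "MemoryBarrier";
#elif defined(__SSE2__)
    return "_mm_mfence";
#else
    return "std::atomic_thread_fence";
#endif
}
```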
On x86 and x64, MemoryBarrier is implemented with xchg and lock or respectively, and I have seen some claims that this is faster than mfence. However, my own benchmarks show the opposite, so the result apparently depends heavily on the processor model.
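To compare the two x64 idioms on your own machine, a rough micro-benchmark might look like the sketch below. It assumes GCC or Clang on x86-64 (inline asm in AT&T syntax) and falls back to std::atomic_thread_fence elsewhere so it still compiles; fence_mfence, fence_lock_or and ns_per_call are hypothetical names, and the 32-bit xchg variant is not shown. Timing fences in an otherwise empty loop says little about their cost next to real loads and stores, so treat any numbers only as a starting point.

```cpp
#include <chrono>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
// The two x86-64 full-fence idioms as GCC/Clang inline asm.
inline void fence_mfence()  { asm volatile("mfence" ::: "memory"); }
inline void fence_lock_or() { asm volatile("lock orl $0, (%%rsp)" ::: "memory", "cc"); }
#else
#include <atomic>
// Fallback on other targets so the sketch still compiles (not the real idioms).
inline void fence_mfence()  { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void fence_lock_or() { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Hypothetical micro-benchmark: average nanoseconds per call of `fence`.
template <typename Fence>
double ns_per_call(Fence fence, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fence();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}
```

For example, compare ns_per_call(fence_mfence, 1000000) with ns_per_call(fence_lock_or, 1000000); which one is faster will vary by processor, as the answer notes.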
Another difference is that mfence can also be used for ordering non-temporal stores/loads (movntq, etc.).
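A sketch of the non-temporal case, assuming an SSE2 target: _mm_stream_si32 (the movnti instruction, a close relative of movntq) writes around the cache and is weakly ordered, so even a release store of the flag does not keep it behind the streamed data, and the fence goes between the data and the flag. fill_and_publish is a hypothetical name, and the non-SSE2 branch simply uses ordinary stores.

```cpp
#include <atomic>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32, _mm_mfence
#endif

// Hypothetical: fill `dst` with `value`, then publish `ready` so that a
// consumer that sees ready == true also sees the filled buffer.
void fill_and_publish(int* dst, std::size_t n, int value, std::atomic<bool>& ready) {
#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; ++i)
        _mm_stream_si32(dst + i, value);  // non-temporal store: weakly ordered
    _mm_mfence();                         // keep the streaming stores before the flag store
#else
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;                   // ordinary stores: the release store below suffices
#endif
    ready.store(true, std::memory_order_release);
}
```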
GCC also has __sync_synchronize, which generates a hardware fence.
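A minimal producer/consumer sketch with __sync_synchronize, assuming GCC or Clang (both define __GNUC__); other compilers fall back to std::atomic_thread_fence. The FULL_FENCE macro and the handoff function are hypothetical. GCC documents its __atomic builtins as the replacement for the legacy __sync ones, so newer code would more likely write __atomic_thread_fence(__ATOMIC_SEQ_CST) or the C++11 equivalent.

```cpp
#include <atomic>
#include <thread>

#if defined(__GNUC__)
#define FULL_FENCE() __sync_synchronize()  // legacy GCC builtin: full hardware fence
#else
#define FULL_FENCE() std::atomic_thread_fence(std::memory_order_seq_cst)
#endif

// Hypothetical handoff: the producer fences between writing the payload and
// raising the flag; the consumer fences between seeing the flag and reading
// the payload. Both variables are relaxed atomics so there is no data race.
int handoff(int value) {
    std::atomic<int> payload{0};
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload.store(value, std::memory_order_relaxed);
        FULL_FENCE();
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) { /* spin */ }
        FULL_FENCE();
        seen = payload.load(std::memory_order_relaxed);
    });
    producer.join();
    consumer.join();
    return seen;
}
```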
asm volatile ("" : : : "memory") in GCC and _ReadWriteBarrier in MSVC only provide a compiler-level memory fence, which prevents the compiler from reordering memory accesses. The processor is still free to reorder them at run time.
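To see the difference, here is the same store-load litmus test with only a compiler fence between each store and load. It assumes _ReadWriteBarrier from <intrin.h> on MSVC, the GCC/Clang asm volatile form elsewhere, and std::atomic_signal_fence (the standard C++11 compiler-only fence, swapped in here) as a last resort; COMPILER_FENCE and count_both_zero_compiler_fence are hypothetical names. Because x86/x64 may let a load complete before an earlier store to a different address becomes visible, a nonzero count is possible, though how often it appears (if at all in a short run) depends on the machine.

```cpp
#include <atomic>
#include <thread>

#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_FENCE() _ReadWriteBarrier()
#elif defined(__GNUC__)
#define COMPILER_FENCE() asm volatile("" ::: "memory")
#else
#define COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#endif

// Hypothetical: counts runs where both threads read 0. The compiler keeps
// each store before its load, but the CPU may still reorder them.
int count_both_zero_compiler_fence(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            COMPILER_FENCE();  // no hardware fence is emitted
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            COMPILER_FENCE();  // no hardware fence is emitted
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```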
Compiler fences are generally used in combination with operations that have some kind of implicit hardware fence. For example, on x86/x64 all ordinary stores have release semantics and all ordinary loads have acquire semantics, so a compiler fence is all you need when implementing load-acquire and store-release.
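A sketch of that idiom, assuming GCC/Clang or MSVC: on x86/x64 the acquire load and release store are ordinary accesses plus a compiler fence, while on weaker architectures such as ARM the sketch swaps in std::atomic_thread_fence, because the hardware does not provide those guarantees on its own. load_acquire, store_release and publish_and_read are hypothetical names. Only the std::atomic_thread_fence branch is guaranteed by the C++ memory model; in modern C++ you would write flag.load(std::memory_order_acquire) and flag.store(1, std::memory_order_release), which on x86/x64 also compile to plain mov instructions.

```cpp
#include <atomic>
#include <thread>

#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_FENCE() _ReadWriteBarrier()
#elif defined(__GNUC__)
#define COMPILER_FENCE() asm volatile("" ::: "memory")
#else
#define COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#endif

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

template <typename T>
T load_acquire(const std::atomic<T>& a) {
    T v = a.load(std::memory_order_relaxed);  // a plain mov on x86
#if SKETCH_X86
    COMPILER_FENCE();  // x86 loads already have acquire semantics in hardware
#else
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
    return v;
}

template <typename T>
void store_release(std::atomic<T>& a, T v) {
#if SKETCH_X86
    COMPILER_FENCE();  // x86 stores already have release semantics in hardware
#else
    std::atomic_thread_fence(std::memory_order_release);
#endif
    a.store(v, std::memory_order_relaxed);
}

// Hypothetical usage: publish a plain int through an atomic flag.
int publish_and_read(int value) {
    int data = 0;
    std::atomic<int> flag{0};
    int seen = -1;
    std::thread producer([&] { data = value; store_release(flag, 1); });
    std::thread consumer([&] {
        while (load_acquire(flag) == 0) { /* spin */ }
        seen = data;
    });
    producer.join();
    consumer.join();
    return seen;
}
```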