C++ memory barriers for atomics

2021-12-22 00:00:00 windows visual-c++ c++ memory-barriers

I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers?

Windows MemoryBarrier();
Fence _mm_mfence();
Inline assembly asm volatile ("" : : : "memory");
Intrinsic _ReadWriteBarrier();

If there isn't a simple explanation, some links to good articles or books would probably help me get it straight. Until now I was fine with just using objects written by others that wrap these calls, but I'd like a better understanding than my current one, which is basically that there is more than one way to implement memory barriers under the covers.

Recommended answer

Both MemoryBarrier (MSVC) and _mm_mfence (supported by several compilers) provide a hardware memory fence, which prevents the processor from moving reads and writes across the fence.

The main difference is that MemoryBarrier has platform-specific implementations for x86, x64 and IA64, whereas _mm_mfence specifically uses the mfence SSE2 instruction, so it's not always available.

On x86 and x64, MemoryBarrier is implemented with an xchg and a lock or respectively, and I have seen some claims that this is faster than mfence. However, my own benchmarks show the opposite, so apparently it depends very much on the processor model.

Another difference is that mfence can also be used for ordering non-temporal stores/loads (movntq etc.).
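As a rough illustration of that point, the sketch below issues a non-temporal store and then an mfence. The helper name stream_store_then_fence, the HAVE_SSE2_STREAM macro and the g_slot variable are made up for this example, and the fallback branch for targets without SSE2 (a plain store followed by a standard C++11 fence) is an assumption added only so the snippet builds elsewhere.

```cpp
#include <atomic>
#include <cassert>

// HAVE_SSE2_STREAM is a hypothetical macro for this sketch only.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>   // _mm_stream_si32, _mm_mfence
  #define HAVE_SSE2_STREAM 1
#else
  #define HAVE_SSE2_STREAM 0
#endif

static int g_slot = 0;     // hypothetical destination for the demo

// Write `value` to *dst with a non-temporal store, then order it with mfence.
inline void stream_store_then_fence(int* dst, int value) {
    assert(dst != nullptr);
#if HAVE_SSE2_STREAM
    _mm_stream_si32(dst, value);  // non-temporal 32-bit store (movnti)
    _mm_mfence();                 // orders the non-temporal store with later accesses
#else
    // Assumption: no SSE2 here, so fall back to a plain store and a standard fence.
    *dst = value;
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

A call such as stream_store_then_fence(&g_slot, 9) leaves 9 in g_slot. The fence matters to other threads that read the location after observing some later store.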

GCC also has __sync_synchronize, which generates a hardware fence.
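The three hardware fences mentioned so far could be hidden behind one wrapper, roughly as sketched here. The names full_hardware_fence, store_then_load and demo are hypothetical, and the preprocessor checks used to pick a branch are an assumption about a reasonable selection order, not something the compilers prescribe.

```cpp
#include <cassert>

#if defined(_MSC_VER)
  #include <windows.h>     // MemoryBarrier()
#elif defined(__SSE2__)
  #include <emmintrin.h>   // _mm_mfence()
#endif

// Hypothetical wrapper: emit one full hardware fence with whatever is available.
inline void full_hardware_fence() {
#if defined(_MSC_VER)
    MemoryBarrier();        // platform-specific implementation chosen by the Windows headers
#elif defined(__SSE2__)
    _mm_mfence();           // the SSE2 mfence instruction
#else
    __sync_synchronize();   // GCC/Clang builtin; assumed to be the remaining case
#endif
}

// Store a value, fence, then read it back.
inline int store_then_load(int value) {
    volatile int slot = 0;
    slot = value;
    full_hardware_fence();  // the load below cannot be moved above this point
    return slot;
}

// Usage check for the sketch.
inline void demo() {
    int r = store_then_load(42);
    assert(r == 42);
}
```

In a single thread the fence has no visible effect. It only matters when another thread observes the accesses on either side of it.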

asm volatile ("" : : : "memory") in GCC and _ReadWriteBarrier in MSVC only provide a compiler-level memory fence, preventing the compiler from reordering memory accesses. That means the processor is still free to do reordering.
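A minimal sketch of such a compiler-only fence, wrapped so the same call works with both compilers. The names compiler_barrier, write_in_order, g_data and g_ready are hypothetical and exist only for illustration.

```cpp
#include <cassert>

#if defined(_MSC_VER)
  #include <intrin.h>      // _ReadWriteBarrier()
#endif

// Hypothetical wrapper: stops the compiler from moving memory accesses across
// this point. It emits no instruction, so the CPU may still reorder.
inline void compiler_barrier() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();
#else
    asm volatile("" ::: "memory");   // GCC/Clang form
#endif
}

static int g_data = 0;     // hypothetical payload
static int g_ready = 0;    // hypothetical flag

// The compiler must emit the store to g_data before the store to g_ready.
inline void write_in_order(int v) {
    g_data = v;
    compiler_barrier();
    g_ready = 1;
}

// Usage check for the sketch.
inline void demo() {
    write_in_order(7);
    assert(g_data == 7 && g_ready == 1);
}
```

Standard C++11 offers std::atomic_signal_fence for a similar compiler-only effect, which may be preferable where it is available.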

Compiler fences are generally used in combination with operations that have some kind of implicit hardware fence. For example, on x86/x64 all stores have a release fence and all loads have an acquire fence, so you only need a compiler fence when implementing load-acquire and store-release.
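As a sketch of that idea, load-acquire and store-release on x86/x64 might look like the following. It relies entirely on the x86/x64 ordering described above and is not sufficient on other architectures. All names (store_release_x86, load_acquire_x86, publish, consume_or, g_payload, g_flag) are hypothetical.

```cpp
#include <cassert>

#if defined(_MSC_VER)
  #include <intrin.h>      // _ReadWriteBarrier()
#endif

// Hypothetical compiler-only fence, as in the answer.
inline void compiler_barrier() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();
#else
    asm volatile("" ::: "memory");
#endif
}

// Store-release, x86/x64 only: the hardware already keeps earlier accesses
// before the store, so only the compiler has to be restrained.
inline void store_release_x86(volatile int* p, int v) {
    assert(p != nullptr);
    compiler_barrier();
    *p = v;
}

// Load-acquire, x86/x64 only: later accesses must stay after the load.
inline int load_acquire_x86(const volatile int* p) {
    assert(p != nullptr);
    int v = *p;
    compiler_barrier();
    return v;
}

static int g_payload = 0;          // data being published
static volatile int g_flag = 0;    // set to 1 once g_payload is valid

// Producer side: write the payload, then release the flag.
inline void publish(int v) {
    g_payload = v;
    store_release_x86(&g_flag, 1);
}

// Consumer side: return the payload if published, otherwise `fallback`.
inline int consume_or(int fallback) {
    if (load_acquire_x86(&g_flag) == 0) return fallback;
    return g_payload;
}
```

In portable code, std::atomic with memory_order_release and memory_order_acquire expresses the same intent and lets the compiler pick the right instructions for each target.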
