Concurrency: Atomic and volatile in C++11 memory model

A global variable is shared between 2 concurrently running threads on 2 different cores. The threads write to and read from the variable. For an atomic variable, can one thread read a stale value? Each core might have a value of the shared variable in its cache, and when one thread writes to its copy in a cache, the other thread on a different core might read a stale value from its own cache. Or does the compiler enforce strong memory ordering so that the latest value is read from the other cache? The C++11 standard library has std::atomic support. How is this different from the volatile keyword? How will volatile and atomic types behave differently in the above scenario?

Recommended answer

Firstly, volatile does not imply atomic access. It is designed for things like memory-mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic, and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.
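
To make the intended use of volatile concrete, here is a minimal sketch of the signal-handling case. The names g_signal_seen, on_signal and signal_flag_demo are hypothetical, chosen only for illustration. The flag is shared between a signal handler and ordinary code on the same thread, which is a different problem from sharing data between threads.

```cpp
#include <cassert>
#include <csignal>

// Flag written by a signal handler and read by normal code on the same thread.
// volatile std::sig_atomic_t is the type the standard provides for this use;
// it gives no inter-thread ordering or visibility guarantees.
volatile std::sig_atomic_t g_signal_seen = 0;  // hypothetical name

extern "C" void on_signal(int) { g_signal_seen = 1; }

// Installs the handler, raises SIGINT synchronously, restores the previous
// handler, and reports whether the flag was set.
bool signal_flag_demo() {
    g_signal_seen = 0;
    auto previous = std::signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    std::raise(SIGINT);              // the handler runs before raise() returns
    std::signal(SIGINT, previous);   // put the earlier handler back
    return g_signal_seen == 1;
}
```

Calling signal_flag_demo() from main should return true. Nothing here makes the flag safe to share across threads; for that, std::atomic is the tool.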

If you have a global variable which is shared between threads, such as:

std::atomic<int> ai;

then the visibility and ordering constraints depend on the memory ordering parameter you use for operations, and the synchronization effects of locks, threads and accesses to other atomic variables.
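
As an illustration of how the memory ordering parameter affects visibility, the following sketch publishes a plain int through an atomic flag. It pairs a std::memory_order_release store with a std::memory_order_acquire load, which is one common pattern and an assumption of this example rather than something the answer prescribes. The names payload, ready and publish_and_consume are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary (non-atomic) data
std::atomic<bool> ready{false};   // flag used to publish the data

// Runs one producer and one consumer; returns the payload the consumer saw.
int publish_and_consume() {
    payload = 0;
    ready.store(false);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                   // write the data first
        ready.store(true, std::memory_order_release);   // then publish it
    });
    std::thread consumer([&seen] {
        // Once the acquire load observes true, the write to payload is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

If both operations on ready used std::memory_order_relaxed instead, reading payload in the consumer would be a data race, because nothing would order the two accesses to payload.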

In the absence of any additional synchronization, if one thread writes a value to ai then there is nothing that guarantees that another thread will see the value in any given time period. The standard specifies that it should be visible "in a reasonable period of time", but any given access may return a stale value.

The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

If you have 2 shared variables x and y, initially zero, and one thread writes 1 to x while another writes 2 to y, then a third thread that reads both may see any of (0,0), (1,0), (0,2) or (1,2), since there is no ordering constraint between the operations, and thus the operations may appear in any order in the global order.
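
The three-thread scenario can be sketched as follows, using the default std::memory_order_seq_cst for every operation. The function names observe_independent_writers and is_allowed are hypothetical. Which of the four pairs appears on a given run depends on scheduling, so the sketch only checks that the result is one of the permitted pairs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One thread writes x, another writes y, a third reads both.
// Returns the (x, y) pair that the reader observed.
std::pair<int, int> observe_independent_writers() {
    std::atomic<int> x{0}, y{0};
    int rx = -1, ry = -1;

    std::thread write_x([&] { x.store(1); });
    std::thread write_y([&] { y.store(2); });
    std::thread reader([&] {
        rx = x.load();
        ry = y.load();
    });

    write_x.join();
    write_y.join();
    reader.join();
    return {rx, ry};
}

// True for (0,0), (1,0), (0,2) and (1,2), the four outcomes listed above.
bool is_allowed(std::pair<int, int> p) {
    return (p.first == 0 || p.first == 1) && (p.second == 0 || p.second == 2);
}
```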

If both writes are from the same thread, which does x=1 before y=2, and the reading thread reads y before x, then (0,2) is no longer a valid option, since the read of y==2 implies that the earlier write to x is visible. The other 3 pairings (0,0), (1,0) and (1,2) are still possible, depending on how the 2 reads interleave with the 2 writes.
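
A sketch of the single-writer case, again with the default std::memory_order_seq_cst: the writer stores x then y, the reader loads y then x, and the loop looks for the forbidden pair (0,2). The function name saw_forbidden_pair and the iteration count are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Repeats the experiment and reports whether the reader ever observed
// x == 0 together with y == 2, which the ordering rules exclude.
bool saw_forbidden_pair(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int rx = -1, ry = -1;

        std::thread writer([&] {
            x.store(1);   // sequenced before the store to y
            y.store(2);
        });
        std::thread reader([&] {
            ry = y.load();   // read y first
            rx = x.load();   // then x
        });

        writer.join();
        reader.join();

        if (rx == 0 && ry == 2) {
            return true;
        }
    }
    return false;
}
```

The function is expected to return false for any iteration count. A test like this cannot prove the guarantee; it can only fail to find a counterexample.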

If you use other memory orderings such as std::memory_order_relaxed or std::memory_order_acquire then the constraints are relaxed even further, and the single global ordering no longer applies. Threads don't even necessarily have to agree on the ordering of two stores to separate variables if there is no additional synchronization.
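
The point about threads disagreeing on the order of two stores is usually illustrated with the "independent reads of independent writes" litmus test. The sketch below is one hedged way to write it with release stores and acquire loads; the names IriwCounts and run_iriw are hypothetical. The disagreeing outcome is permitted by the standard under these orderings, but many platforms (x86 among them) are not expected to produce it, so the count may well stay at zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct IriwCounts {
    int runs;
    int disagreements;
};

// Two writers store to separate variables; two readers load them in opposite
// orders. A "disagreement" is a run where reader A saw x set but not y,
// while reader B saw y set but not x.
IriwCounts run_iriw(int iterations) {
    IriwCounts counts{0, 0};
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int a1 = 0, a2 = 0, b1 = 0, b2 = 0;

        std::thread wx([&] { x.store(1, std::memory_order_release); });
        std::thread wy([&] { y.store(1, std::memory_order_release); });
        std::thread ra([&] {
            a1 = x.load(std::memory_order_acquire);
            a2 = y.load(std::memory_order_acquire);
        });
        std::thread rb([&] {
            b1 = y.load(std::memory_order_acquire);
            b2 = x.load(std::memory_order_acquire);
        });

        wx.join(); wy.join(); ra.join(); rb.join();

        if (a1 == 1 && a2 == 0 && b1 == 1 && b2 == 0) {
            ++counts.disagreements;
        }
        ++counts.runs;
    }
    assert(counts.disagreements <= counts.runs);
    return counts;
}
```

With every operation left at the default std::memory_order_seq_cst, the disagreeing outcome would be ruled out, because all four threads would have to respect the single total order.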

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though. In particular, it is important to note that the use of an RMW operation does not force changes from other threads to become visible any quicker; it just means that if the changes are not seen by the RMW then all threads must agree that they are later in the modification order of that atomic variable than the RMW operation. Stores from different threads can still be delayed by arbitrary amounts of time, depending on when the CPU actually issues the store to memory (rather than just its own store buffer), physically how far apart the CPUs executing the threads are (in the case of a multi-processor system), and the details of the cache coherency protocol.
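
The "no duplicates or gaps" property of ai.fetch_add(1) can be checked with a short sketch. The function name collect_tickets and the thread and iteration counts are assumptions made for the example. Each thread records the values it received, and the combined, sorted list should be exactly 0, 1, 2, and so on.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Every thread calls fetch_add(1) per_thread times and records the results.
// Returns all returned values, sorted in ascending order.
std::vector<int> collect_tickets(int num_threads, int per_thread) {
    std::atomic<int> ai{0};
    std::vector<std::vector<int>> local(num_threads);  // one list per thread
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                // fetch_add returns the value held before the increment
                local[t].push_back(ai.fetch_add(1));
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }

    std::vector<int> all;
    for (const auto& values : local) {
        all.insert(all.end(), values.begin(), values.end());
    }
    std::sort(all.begin(), all.end());
    assert(ai.load() == num_threads * per_thread);
    return all;
}
```

Which thread receives which value varies from run to run; only the combined sequence is fixed.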

Working with atomic operations is a complex topic. I suggest you read a lot of background material, and examine published code before writing production code with atomics. In most cases it is easier to write code that uses locks, and not noticeably less efficient.
