对 STL 容器的安全并行只读访问

2022-01-24 00:00:00 multithreading containers c++ c++11 stl

我想从并行运行线程访问基于 STL 的容器只读.不使用任何用户实现的锁定.以下代码的基础是 C++11，并正确实现了该标准.

I want access a STL based container read-only from parallel running threads. Without using any user implemented locking. The base of the following code is C++11 with a proper implementation of the standard.

http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html
http://www.sgi.com/tech/stl/thread_safety.html
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html
http://www.open-std.org/jtc1/sc22/wg21/ (当前草案或N3337，本质上是 C++11，有小错误和错别字更正)

http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html
http://www.sgi.com/tech/stl/thread_safety.html
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html
http://www.open-std.org/jtc1/sc22/wg21/ (current draft or N3337, which is essentially C++11 with minor errors and typos corrected)

23.2.2 容器数据竞争 [container.requirements.dataraces]

23.2.2 Container data races [container.requirements.dataraces]

为了避免数据竞争 (17.6.5.9)，实现应将以下函数视为 const:begin、end、rbegin、撕裂，前，后，数据，查找，下界，上界，相等范围，在并且，除了在关联或无序关联容器中，运算符[].

For purposes of avoiding data races (17.6.5.9), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].

尽管有 (17.6.5.9)，但仍需要实现当包含对象的内容在同一序列中的不同元素，除了向量<bool>，是同时修改.

Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the con- tained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.

[注意:对于一个向量;x 尺寸更大大于一，x[1] = 5 和 *x.begin() = 10 可以同时执行没有数据竞争，但执行了 x[0] = 5 和 *x.begin() = 10同时可能导致数据竞争.作为一般的例外规则，对于向量<bool>y, y[0] = true 可能与 y[1] 竞争=真.――尾注]

[ Note: For a vector<int> x with a size greater than one, x[1] = 5 and *x.begin() = 10 can be executed concurrently without a data race, but x[0] = 5 and *x.begin() = 10 executed concurrently may result in a data race. As an exception to the general rule, for a vector<bool> y, y[0] = true may race with y[1] = true. ― end note ]

和

17.6.5.9 避免数据竞争 [res.on.data.races] 1 本节规定了实现应满足的要求以防止数据竞争比赛(1.10).每个标准库函数都应满足每个要求，除非另有规定.实施可能会阻止除了下面指定的情况之外的数据竞争.

17.6.5.9 Data race avoidance [res.on.data.races] 1 This section specifies requirements that implementations shall meet to prevent data races (1.10). Every standard library function shall meet each requirement unless otherwise specified. Implementations may prevent data races in cases other than those specified below.

2 C++ 标准库函数不得直接或间接访问对象(1.10) 可由当前线程以外的线程访问，除非对象通过函数的直接或间接访问参数，包括这个.

2 A C++ standard library function shall not directly or indirectly access objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s argu- ments, including this.

3 C++ 标准库函数应不直接或间接修改线程可访问的对象(1.10)除了当前线程之外，除非直接访问对象或间接通过函数的非常量参数，包括这个.

3 A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.

4 [ 注意:例如，这意味着实现不能将静态对象用于内部目的而无需同步因为即使在不支持的程序中，它也可能导致数据竞争在线程之间显式共享对象.――尾注]

4 [ Note: This means, for example, that implementations can’t use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads. ― end note ]

5 C++ 标准库函数不得间接访问对象可通过其参数或通过其容器的元素访问参数，除非通过调用其规范所需的函数在那些容器元素上.

5 A C++ standard library function shall not access objects indirectly accessible via its arguments or via elements of its container arguments except by invoking functions required by its specification on those container elements.

6 获得的迭代器上的操作调用标准库容器或字符串成员函数可能
访问底层容器，但不得修改它.[注:在特别是使迭代器无效的容器操作冲突对与该容器关联的迭代器进行操作.- 结尾注意]

6 Operations on iterators obtained by calling a standard library container or string member function may
access the underlying container, but shall not modify it. [ Note: In particular, container operations that invalidate iterators conflict with operations on iterators associated with that container. ― end note ]

7 实现可以在它们之间共享它们自己的内部对象如果对象对用户不可见且受保护，则线程反对数据竞争.

7 Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.

8 除非另有说明，C++ 标准库功能应仅在当前执行所有操作如果这些操作具有可见的效果(1.10)，则线程用户.

8 Unless otherwise specified, C++ standard library functions shall perform all operations solely within the current thread if those operations have effects that are visible (1.10) to users.

9 [注意:这允许实现并行化操作如果没有明显的副作用.――尾注]

9 [ Note: This allows implementations to parallelize operations if there are no visible side effects. ― end note ]

结论
容器不是线程安全的！但是从多个并行线程对容器调用 const 函数是安全的.因此可以在没有锁定的情况下从并行线程执行只读操作.我说的对吗?

Conclusion
Containers are not thread safe! But it is safe to call const functions on containers from multiple parallel threads. So it is possible to do read-only operations from parallel threads without locking. Am I right?

我假设他们不存在任何错误的实现，并且 C++11 标准的每个实现都是正确的.

I pretend that their doesn't exist any faulty implementation and every implementation of the C++11 standard is correct.

示例:

// concurrent thread access to a stl container // g++ -std=gnu++11 -o p_read p_read.cpp -pthread -Wall -pedantic && ./p_read #include <iostream> #include <iomanip> #include <string> #include <unistd.h> #include <thread> #include <mutex> #include <map> #include <cstdlib> #include <ctime> using namespace std; // new in C++11 using str_map = map<string, string>; // thread is new in C++11 // to_string() is new in C++11 mutex m; const unsigned int MAP_SIZE = 10000; void fill_map(str_map& store) { int key_nr; string mapped_value; string key; while (store.size() < MAP_SIZE) { // 0 - 9999 key_nr = rand() % MAP_SIZE; // convert number to string mapped_value = to_string(key_nr); key = "key_" + mapped_value; pair<string, string> value(key, mapped_value); store.insert(value); } } void print_map(const str_map& store) { str_map::const_iterator it = store.begin(); while (it != store.end()) { pair<string, string> value = *it; cout << left << setw(10) << value.first << right << setw(5) << value.second << " "; it++; } } void search_map(const str_map& store, int thread_nr) { m.lock(); cout << "thread(" << thread_nr << ") launched "; m.unlock(); // use a straight search or poke around random bool straight = false; if ((thread_nr % 2) == 0) { straight = true; } int key_nr; string mapped_value; string key; str_map::const_iterator it; string first; string second; for (unsigned int i = 0; i < MAP_SIZE; i++) { if (straight) { key_nr = i; } else { // 0 - 9999, rand is not thread-safe, nrand48 is an alternative m.lock(); key_nr = rand() % MAP_SIZE; m.unlock(); } // convert number to string mapped_value = to_string(key_nr); key = "key_" + mapped_value; it = store.find(key); // check result if (it != store.end()) { // pair first = it->first; second = it->second; // m.lock(); // cout << "thread(" << thread_nr << ") " << key << ": " // << right << setw(10) << first << setw(5) << second << " "; // m.unlock(); // check mismatch if (key != first || mapped_value != second) { m.lock(); cerr << key << ": " << first << second << " " << "Mismatch in thread(" << thread_nr << ")! "; exit(1); // never reached m.unlock(); } } else { m.lock(); cerr << "Warning: key(" << key << ") not found in thread(" << thread_nr << ") "; exit(1); // never reached m.unlock(); } } } int main() { clock_t start, end; start = clock(); str_map store; srand(0); fill_map(store); cout << "fill_map finished "; // print_map(store); // cout << "print_map finished "; // copy for check str_map copy_store = store; // launch threads thread t[10]; for (int i = 0; i < 10; i++) { t[i] = thread(search_map, store, i); } // wait for finish for (int i = 0; i < 10; i++) { t[i].join(); } cout << "search_map threads finished "; if (store == copy_store) { cout << "equal "; } else { cout << "not equal "; } end = clock(); cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << " "; cout << "CPU-TIME START " << start << " "; cout << "CPU-TIME END " << end << " "; cout << "CPU-TIME END - START " << end - start << " "; cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << " "; return 0; }

这段代码可以用 GCC 4.7 编译并在我的机器上运行良好.

This code can be compiled with GCC 4.7 and runs fine on my machine.

$ 回声 $?
$ 0

$ echo $?
$ 0

推荐答案

来自 C++11 规范第 1.10/4 和 1.10/21 节的数据竞争，需要至少两个具有非原子访问权限的线程同一组内存位置，两个线程在访问该组内存位置方面不同步，并且至少一个线程写入或修改该组内存位置中的一个元素.所以在你的情况下，如果线程只是读取，你很好......根据定义，因为没有线程写入同一组内存位置，即使没有明确的同步机制，也没有数据竞争线程.

A data-race, from the C++11 specification in sections 1.10/4 and 1.10/21, requires at least two threads with non-atomic access to the same set of memory locations, the two threads are not synchronized with regards to accessing the set of memory locations, and at least one thread writes to or modifies an element in the set of memory locations. So in your case, if the threads are only reading, you are fine ... by definition since none of the threads write to the same set of memory locations, there are no data-races even though there is no explicit synchronization mechanism between the threads.

相关文章