Reducing on an array in OpenMP

2021-12-06 00:00:00 multithreading parallel-processing openmp c++ reduction

I am trying to parallelize the following program, but I don't know how to reduce on an array. I understand that this is not possible directly, but is there an alternative? Thanks. (I added a reduction on m, which is wrong, but I would like advice on how to do this properly.)

    #include <iostream>
    #include <stdio.h>
    #include <time.h>
    #include <omp.h>
    using namespace std;

    int main ()
    {
        int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
        int S [10];

        time_t start_time = time(NULL);
        #pragma omp parallel for private(m) reduction(+:m)
        for (int n=0 ; n<10 ; ++n ){
            for (int m=0; m<=n; ++m){
                S[n] += A[m];
            }
        }
        time_t end_time = time(NULL);
        cout << end_time-start_time;

        return 0;
    }

Recommended answer

Yes, it is possible to do an array reduction with OpenMP. Fortran even has a construct for this. In C/C++ you have to do it yourself (the reduction clause only gained support for array sections in OpenMP 4.5). Here are two ways to do it.

The first method makes a private copy of S for each thread, fills the copies in parallel, and then merges them into S in a critical section (see the code below). The second method allocates a single array of size 10*nthreads, fills it in parallel, and then merges it into S without using a critical section. The second method is considerably more complicated and, if you are not careful, can suffer from cache issues, especially on multi-socket systems. For more details see "Fill histograms (array reduction) in parallel with OpenMP without using a critical section".

First method

    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10] = {0};

    #pragma omp parallel
    {
        // Each thread accumulates into its own private copy.
        int S_private[10] = {0};

        #pragma omp for
        for (int n=0 ; n<10 ; ++n ) {
            for (int m=0; m<=n; ++m){
                S_private[n] += A[m];
            }
        }

        // Merge the private copies into S, one thread at a time.
        #pragma omp critical
        {
            for(int n=0; n<10; ++n) {
                S[n] += S_private[n];
            }
        }
    }
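To try the first method on inputs other than the fixed 10-element array, it can be wrapped in a function and checked against a serial loop. The following is only a sketch: the names `prefix_sums_critical` and `prefix_sums_serial` are hypothetical and do not come from the answer, and the use of `std::vector` instead of raw arrays is an assumption made for convenience. It avoids OpenMP runtime calls, so it should also compile (and run serially) when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the original answer.
// Fills S[n] = A[0] + ... + A[n] using one private accumulator per thread.
void prefix_sums_critical(const std::vector<int>& A, std::vector<int>& S) {
    assert(S.size() == A.size());  // caller must size the output
    const int N = static_cast<int>(A.size());
    for (int n = 0; n < N; ++n) S[n] = 0;

    #pragma omp parallel
    {
        // Declared inside the parallel region, so each thread gets its own copy.
        std::vector<int> S_private(N, 0);

        #pragma omp for
        for (int n = 0; n < N; ++n) {
            for (int m = 0; m <= n; ++m) {
                S_private[n] += A[m];
            }
        }

        // Threads take turns adding their partial results to the shared S.
        #pragma omp critical
        {
            for (int n = 0; n < N; ++n) {
                S[n] += S_private[n];
            }
        }
    }
}

// Serial reference (also hypothetical) used to check the parallel result.
std::vector<int> prefix_sums_serial(const std::vector<int>& A) {
    std::vector<int> S(A.size(), 0);
    int running = 0;
    for (std::size_t n = 0; n < A.size(); ++n) {
        running += A[n];
        S[n] = running;
    }
    return S;
}
```

Calling `prefix_sums_critical(A, S)` with the array from the question and comparing `S` to `prefix_sums_serial(A)` is a quick way to confirm that the merge step loses no updates.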

Second method

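    // Requires #include <omp.h> for omp_get_num_threads / omp_get_thread_num.
    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10] = {0};
    int *S_private;

    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();

        // One thread allocates and zeroes a 10-element slot per thread;
        // the implicit barrier after single makes it visible to all threads.
        #pragma omp single
        {
            S_private = new int[10*nthreads];
            for(int i=0; i<(10*nthreads); i++) S_private[i] = 0;
        }

        // Each thread writes only to its own slot.
        #pragma omp for
        for (int n=0 ; n<10 ; ++n ) {
            for (int m=0; m<=n; ++m){
                S_private[ithread*10+n] += A[m];
            }
        }

        // Merge in parallel over i: each S[i] is owned by a single thread,
        // so no critical section is needed.
        #pragma omp for
        for(int i=0; i<10; i++) {
            for(int t=0; t<nthreads; t++) {
                S[i] += S_private[10*t + i];
            }
        }
    }
    delete[] S_private;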
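With a compiler that implements OpenMP 4.5 or later, the manual merge can probably be skipped, because the reduction clause accepts array sections in C/C++. The sketch below assumes such a compiler. The function name `prefix_sums_array_reduction` and the `std::vector` wrapper are hypothetical, not part of the answer above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the original answer.
// Assumes OpenMP 4.5+ support for array sections in the reduction clause.
std::vector<int> prefix_sums_array_reduction(const std::vector<int>& A) {
    const int N = static_cast<int>(A.size());
    std::vector<int> result(N, 0);
    if (N == 0) return result;  // avoid a zero-length array section

    // An array section needs an array or pointer variable as its base.
    int* S = result.data();
    const int* a = A.data();
    assert(S != nullptr);

    // Each thread gets a zero-initialized private copy of S[0..N-1];
    // the copies are summed into S when the loop finishes.
    #pragma omp parallel for reduction(+ : S[:N])
    for (int n = 0; n < N; ++n) {
        for (int m = 0; m <= n; ++m) {
            S[n] += a[m];
        }
    }
    return result;
}
```

This does the same work as the first method, but leaves the private copies and the merge to the OpenMP runtime. On older compilers the two manual methods remain the fallback.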
