使用 Thrust 按键组合两个列表
给定两个键值列表,我试图通过匹配键并在键匹配时将函数应用于两个值来组合两侧.就我而言,我想将值相乘.一个更清楚的小例子:
Given two key-value lists, I am trying to combine the two sides by matching the keys and applying a function to the two values when the keys match. In my case I want to multiply the values. A small example to make it more clear:
Left keys: { 1, 2, 4, 5, 6 }
Left values: { 3, 4, 1, 2, 1 }
Right keys: { 1, 3, 4, 5, 6, 7 };
Right values: { 2, 1, 1, 4, 1, 2 };
Expected output keys: { 1, 4, 5, 6 }
Expected output values: { 6, 1, 8, 1 }
我已经能够使用 C++ 使用下一个代码在 CPU 上实现这一点:
I have been able to implement this on the CPU using C++ using the next code:
int main() {
int leftKeys[5] = { 1, 2, 4, 5, 6 };
int leftValues[5] = { 3, 4, 1, 2, 1 };
int rightKeys[6] = { 1, 3, 4, 5, 6, 7 };
int rightValues[6] = { 2, 1, 1, 4, 1, 2 };
int leftIndex = 0, rightIndex = 0;
std::vector<std::tuple<int, int>> result;
while (leftIndex < 5 && rightIndex < 6) {
if (leftKeys[leftIndex] < rightKeys[rightIndex]) {
leftIndex++;
}
if (leftKeys[leftIndex] > rightKeys[rightIndex]) {
rightIndex++;
}
result.push_back(std::make_tuple(leftKeys[leftIndex], leftValues[leftIndex] * rightValues[rightIndex]));
leftIndex++;
rightIndex++;
}
// Print results
for (int i = 0; i < result.size(); i++) {
std::cout << "Key: " << std::get<0>(result[i]) << "; Value: " << std::get<1>(result[i]) << "
";
}
}
但是,我在 Thrust 的 device_vector
中有输入键和值,我还需要 GPU 上的结果.因此,如果我不需要将所有输入复制到主机并将所有输出复制回设备,它会更有效.
However, I have the input keys and values in Thrust's device_vector
s and I need the results on the GPU as well. Therefore it would be more efficient if I did not need to copy all inputs to the host and all outputs back to the device.
问题是我找不到可用于使用一组键组合两个列表的推力函数(并将函数应用于两个值).是否存在这样的功能,或者是否有一种简单的方法可以自己实现?我应该在主机上执行此操作吗?
The problem is that I cannot find a Thrust function that can be used to combine two lists using a set of keys (and apply a function to both values). Does such a function exist or is there an easy way to implement it myself of should I just do this on the host?
更新:
可以对输入做出以下假设:
The following assumptions can be made about the input:
- 键总是排序的.
- 单个列表中不存在重复键(列表之间当然存在重复键,否则结果将为空).
更新 2:
在@Robert 的回答中实施第二种方法时,我陷入了转型.到目前为止,我的代码如下:
While implementing the second approach in @Robert's answer I get stuck at the transformation. My code so far is below:
struct multiply_transformation : public thrust::binary_function<std::tuple<int, int>, std::tuple<int, int>, std::tuple<int, int>>
{
__host__ __device__
thrust::tuple<int, int> operator()(thrust::tuple<int, int> d_left, thrust::tuple<int, int> d_right)
{
if (thrust::get<0>(d_left) == thrust::get<0>(d_right)) {
return thrust::make_tuple(thrust::get<0>(d_left), thrust::get<1>(d_left) * thrust::get<1>(d_right));
}
return thrust::make_tuple(-1, -1);
}
};
thrust::device_vector<int> d_mergedKeys(h_leftCount + h_rightCount);
thrust::device_vector<int> d_mergedValues(h_leftCount + h_rightCount);
thrust::merge_by_key(d_leftCountKeys.begin(), d_leftCountKeys.begin() + h_leftCount,
d_rightCountKeys.begin(), d_rightCountKeys.begin() + h_rightCount,
d_leftCounts.begin(), d_rightCounts.begin(), d_mergedKeys.begin(), d_mergedValues.begin());
typedef thrust::tuple<int, int> IntTuple;
thrust::zip_iterator<IntTuple> d_zippedCounts(thrust::make_tuple(d_mergedKeys.begin(), d_mergedValues.begin()));
thrust::zip_iterator<IntTuple> d_zippedCountsOffset(d_zippedCounts + 1);
multiply_transformation transformOperator;
thrust::device_vector<IntTuple> d_transformedResult(h_leftCount + h_rightCount);
thrust::transform(d_zippedCounts, d_zippedCounts + h_leftCount + h_rightCount - 1, d_zippedCountsOffset, d_transformedResult.begin(), transformOperator);
但是,我得到的错误是没有重载函数 thrust::transform
与参数列表匹配.在上面的代码中,h_leftCount
和 h_rightCount
是左右输入的大小.d_leftCountKeys
、d_rightCountKeys
、d_leftCounts
和d_rightCounts
是thrust::device_vector
.
However, I get the error that no overloaded function thrust::transform
matches the argument list. In the above code h_leftCount
and h_rightCount
are the sizes of the left and right inputs. d_leftCountKeys
, d_rightCountKeys
, d_leftCounts
, and d_rightCounts
are thrust::device_vector<int>
.
推荐答案
好吧,我不确定这是最好的方法(@ms 通常会提出比我更好的方法),但一种可能的方法是(方法1):
Well, I'm not sure this is the best method (@m.s. usually comes up with better approaches than I), but one possible approach would be (method 1):
- set_intersection_by_key(左,右)
- set_intersection_by_key(右,左)
- 获取第 1 步和第 2 步的输出值,并执行 transform 在它们上将值结果相乘(或您想对第 1 步和第 2 步的相应值结果执行的任何其他数学运算).
- set_intersection_by_key(Left,Right)
- set_intersection_by_key(Right,Left)
- Take the values outputs from step 1 and step 2, and perform a transform on them to multiply the values results together (or whatever other mathematical operation you'd like to perform on the corresponding values results from step 1 and step 2).
我不知道你对 thrust 的技能水平如何但如果需要,我可以提供上述的一个简单的工作示例.
I don't know what your skill level is with thrust but I can provide a trivial worked example of the above if desired.
另一种可能的方法(方法2):
Another possible approach (method 2):
- merge_by_key 将两个列表合并在一起
- 使用来自步骤 1 的结果列表的两个版本执行转换:第一个由 [first, last-1) 组成,第二个由 [first+1, last) 组成.这将需要一个特殊的函子,它采用压缩版本的键和值,并比较呈现的两个键.如果匹配,则在两个呈现值上输出所需的数学运算;如果不匹配,则输出某种标记或已知的非法值.(如果无法构造这样的非法值,我们可以根据需要压缩第三个标记数组.)
- 对第 2 步的输出执行 remove_if,将结果压缩为所需的结果,删除所有非法的值条目,或者删除标记数组指示的所有值条目.
- merge_by_key the two lists together
- Perform a transform using two versions of the resultant list from step 1: The first consisting of [first, last-1) and the second consisting of [first+1, last). This will require a special functor that takes a zipped version of the keys and values, and compares the two keys presented. If it is a match, output the desired mathematical operation on the two presented values; if it is not a match, output some kind of marker or known illegal value. (If such an illegal value is impossible to construct, we can zip a 3rd marker array in if needed.)
- Do a remove_if on the output of step 2, to compact the result down to the desired result, removing all value entries that are illegal, or else removing all value entries that are indicated by the marker array.
我的感觉是第二种方法可能更快,但我没有仔细考虑过.无论如何,最好对测试用例进行基准测试,而不是根据(我的)直觉来工作.
My sense is the second method might be faster, but I haven't carefully thought through it. In any event, it's better to benchmark test cases than to work off of (my) intuition.
根据下面的评论,这里使用您的示例数据集描述了从方法 2 的第二步开始发生的情况:
Based on a comment below, here is a description of what is happening starting with the 2nd step of method 2, using your example dataset:
第 1 步(merge_by_key 操作)的输出如下所示:
The output of step 1 (the merge_by_key operation) would look like something like this:
keys: { 1, 1, 2, 3, 4, 4, 5, 5, 6, 6, 7 };
values: { 3, 2, 4, 1, 1, 1, 2, 4, 1, 1, 2 };
让我们构造两个版本,第一个是该项目",第二个是右边的下一个项目":
Let's construct two versions, the first being "the item" and the second being "the next item to the right":
keys1: { 1, 1, 2, 3, 4, 4, 5, 5, 6, 6 };
values1: { 3, 2, 4, 1, 1, 1, 2, 4, 1, 1 };
keys2: { 1, 2, 3, 4, 4, 5, 5, 6, 6, 7 };
values2: { 2, 4, 1, 1, 1, 2, 4, 1, 1, 2 };
实际的构造"是微不足道的.keys1 只是 [keys.begin(), keys.end()-1),keys2 只是 [keys.begin()+1, keys.end()).对于 values1 和 values2 也是如此.
The actual "construction" is trivial. keys1 is just [keys.begin(), keys.end()-1), and keys2 is just [keys.begin()+1, keys.end()). And likewise for values1 and values2.
我们会将keys1 和values1 压缩在一起,并将keys2 和values2 压缩在一起.然后我们将这两个压缩实体传递给具有特殊函子的转换操作,该函子将执行以下操作:
We'll zip keys1 and values1 together and we'll zip keys2 and values2 together. Then we'll pass these two zipped entities to a transform operation that has a special functor that will do the following:
如果 keys1 == keys2,对 values1 和 values2 数量进行所需的数学运算,并在标记数组中放置一个 1.如果不是,则在标记数组中放置一个 0.此操作的输出将是:
If keys1 == keys2, do the desired math operation on the values1 and values2 quantities, and place a one in the marker array. If not, place a 0 in a marker array. The output of this operation would be:
keys: { 1, 2, 3, 4, 4, 5, 5, 6, 6, 7 };
values: { 6, 4, 1, 1, 1, 8, 4, 1, 1, 2 };
marker: { 1, 0, 0, 1, 0, 1, 0, 1, 0, 0 };
现在将上面的 3 个向量压缩在一起,并将其传递给 remove_if.remove_if 函子将指示删除标记 == 0 的任何项目,离开:
Now zip the 3 vectors above together, and pass that to remove_if. The remove_if functor would indicate removal of any items for which marker == 0, leaving:
keys: { 1, 4, 5, 6 };
values: { 6, 1, 8, 1 };
marker: { 1, 1, 1, 1 };
这是一个完整的例子,展示了这两种方法:
Here is a fully worked example demonstrating both methods:
$ cat t1007.cu
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/set_operations.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <thrust/merge.h>
#include <thrust/remove.h>
#include <assert.h>
struct mark_mpy_func
{
template <typename T1, typename T2>
__host__ __device__
int operator()(T1 &z1, T2 &z2){
int res = 0;
if (thrust::get<0>(z1) == thrust::get<0>(z2)){
res = thrust::get<1>(z1) * thrust::get<1>(z2);
thrust::get<2>(z1) = 1;}
return res;
}
};
struct mtest_func
{
__host__ __device__
bool operator()(int t){
if (t == 0) return true;
return false;
}
};
int main(){
int Lkeys[] = { 1, 2, 4, 5, 6 };
int Lvals[] = { 3, 4, 1, 2, 1 };
int Rkeys[] = { 1, 3, 4, 5, 6, 7 };
int Rvals[] = { 2, 1, 1, 4, 1, 2 };
size_t Lsize = sizeof(Lkeys)/sizeof(int);
size_t Rsize = sizeof(Rkeys)/sizeof(int);
thrust::device_vector<int> Lkeysv(Lkeys, Lkeys+Lsize);
thrust::device_vector<int> Lvalsv(Lvals, Lvals+Lsize);
thrust::device_vector<int> Rkeysv(Rkeys, Rkeys+Rsize);
thrust::device_vector<int> Rvalsv(Rvals, Rvals+Rsize);
// method 1
thrust::device_vector<int> Lkeysvo(Lsize);
thrust::device_vector<int> Lvalsvo(Lsize);
thrust::device_vector<int> Rkeysvo(Rsize);
thrust::device_vector<int> Rvalsvo(Rsize);
size_t Lsizeo = thrust::set_intersection_by_key(Lkeysv.begin(), Lkeysv.end(), Rkeysv.begin(), Rkeysv.end(), Lvalsv.begin(), Lkeysvo.begin(), Lvalsvo.begin()).first - Lkeysvo.begin();
size_t Rsizeo = thrust::set_intersection_by_key(Rkeysv.begin(), Rkeysv.end(), Lkeysv.begin(), Lkeysv.end(), Rvalsv.begin(), Rkeysvo.begin(), Rvalsvo.begin()).first - Rkeysvo.begin();
assert(Lsizeo == Rsizeo);
thrust::device_vector<int> res1(Lsizeo);
thrust::transform(Lvalsvo.begin(), Lvalsvo.begin()+Lsizeo, Rvalsvo.begin(), res1.begin(), thrust::multiplies<int>());
std::cout << "Method 1 result:" << std::endl << "keys: ";
thrust::copy_n(Lkeysvo.begin(), Lsizeo, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl << "vals: ";
thrust::copy_n(res1.begin(), Lsizeo, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
// method 2
thrust::device_vector<int> Mkeysv(Lsize + Rsize);
thrust::device_vector<int> Mvalsv(Lsize + Rsize);
thrust::merge_by_key(Lkeysv.begin(), Lkeysv.end(), Rkeysv.begin(), Rkeysv.end(), Lvalsv.begin(), Rvalsv.begin(), Mkeysv.begin(), Mvalsv.begin());
thrust::device_vector<int> marker(Lsize + Rsize - 1);
thrust::device_vector<int> res2(Lsize + Rsize - 1);
thrust::transform(thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin(), Mvalsv.begin(), marker.begin())), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.end()-1, Mvalsv.end()-1, marker.end())), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin()+1, Mvalsv.begin()+1)), res2.begin(), mark_mpy_func());
size_t rsize2 = thrust::remove_if(thrust::make_zip_iterator(thrust::make_tuple( Mkeysv.begin(), res2.begin())), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.end()-1, res2.end())), marker.begin(), mtest_func()) - thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin(), res2.begin()));
std::cout << "Method 2 result:" << std::endl << "keys: ";
thrust::copy_n(Mkeysv.begin(), rsize2, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl << "vals: ";
thrust::copy_n(res2.begin(), rsize2, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
return 0;
}
$ nvcc -o t1007 t1007.cu
$ ./t1007
Method 1 result:
keys: 1,4,5,6,
vals: 6,1,8,1,
Method 2 result:
keys: 1,4,5,6,
vals: 6,1,8,1,
$
如果可以接受在输出数据中使用标记值(例如,-1)来通知 remove_if 操作,则可以省略单独的标记数组.以下是方法 2 的修改版本:
If it is acceptable to use a marker value (say, -1) in the output data to inform the remove_if operation, then the separate marker array can be dispensed with. Here's a modified version of method 2 that works this way:
$ cat t1007.cu
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/merge.h>
#include <thrust/remove.h>
#define MARK_VAL -1
struct mark_mpy_func
{
template <typename T1, typename T2>
__host__ __device__
int operator()(T1 &z1, T2 &z2){
int res = MARK_VAL;
if (thrust::get<0>(z1) == thrust::get<0>(z2)){
res = thrust::get<1>(z1) * thrust::get<1>(z2);}
return res;
}
};
struct mtest_func
{
template <typename T>
__host__ __device__
bool operator()(T t){
if (thrust::get<1>(t) == MARK_VAL) return true;
return false;
}
};
int main(){
int Lkeys[] = { 1, 2, 4, 5, 6 };
int Lvals[] = { 3, 4, 1, 2, 1 };
int Rkeys[] = { 1, 3, 4, 5, 6, 7 };
int Rvals[] = { 2, 1, 1, 4, 1, 2 };
size_t Lsize = sizeof(Lkeys)/sizeof(int);
size_t Rsize = sizeof(Rkeys)/sizeof(int);
thrust::device_vector<int> Lkeysv(Lkeys, Lkeys+Lsize);
thrust::device_vector<int> Lvalsv(Lvals, Lvals+Lsize);
thrust::device_vector<int> Rkeysv(Rkeys, Rkeys+Rsize);
thrust::device_vector<int> Rvalsv(Rvals, Rvals+Rsize);
// method 2
thrust::device_vector<int> Mkeysv(Lsize + Rsize);
thrust::device_vector<int> Mvalsv(Lsize + Rsize);
thrust::merge_by_key(Lkeysv.begin(), Lkeysv.end(), Rkeysv.begin(), Rkeysv.end(), Lvalsv.begin(), Rvalsv.begin(), Mkeysv.begin(), Mvalsv.begin());
thrust::device_vector<int> res2(Lsize + Rsize - 1);
thrust::transform(thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin(), Mvalsv.begin())), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.end()-1, Mvalsv.end()-1)), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin()+1, Mvalsv.begin()+1)), res2.begin(), mark_mpy_func());
size_t rsize2 = thrust::remove_if(thrust::make_zip_iterator(thrust::make_tuple( Mkeysv.begin(), res2.begin())), thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.end()-1, res2.end())), mtest_func()) - thrust::make_zip_iterator(thrust::make_tuple(Mkeysv.begin(), res2.begin()));
std::cout << "Method 2 result:" << std::endl << "keys: ";
thrust::copy_n(Mkeysv.begin(), rsize2, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl << "vals: ";
thrust::copy_n(res2.begin(), rsize2, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
return 0;
}
$ nvcc -o t1007 t1007.cu
$ ./t1007
Method 2 result:
keys: 1,4,5,6,
vals: 6,1,8,1,
$
相关文章