tr1::unordered_set union and intersection

2022-01-17 00:00:00 set c++ tr1

How do I compute the intersection and union of sets of type tr1::unordered_set in C++? I can't find much reference material about it.

Any reference and code will be highly appreciated. Thank you very much.

Update: I assumed tr1::unordered_set would provide functions for intersection, union, and difference, since those are the basic set operations. Of course I can write a function myself, but I wonder whether tr1 has built-in functions for this. Thank you very much.

Recommended answer

I see that set_intersection() et al. from the <algorithm> header won't work directly, as they explicitly require their inputs to be sorted; I guess you ruled them out already.

It occurs to me that the "naive" approach of iterating through hash A and looking up every element in hash B should actually give you near-optimal performance, since successive lookups in hash B will be going to the same hash bucket (assuming that both hashes are using the same hash function). That should give you decent memory locality, even though these buckets are almost certainly implemented as linked lists.
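
As a rough sketch of that lookup-based approach: the helper below (hash_intersection is a made-up name, not something TR1 provides) walks the smaller set and probes the larger one with the container's own find(). It is written against std::unordered_set so it compiles as plain C++11; I'm assuming std::tr1::unordered_set can be dropped in unchanged, since only find(), insert(), size(), begin() and end() are used.

```cpp
#include <unordered_set>

// Hypothetical helper, not part of TR1 or the standard library.
// Iterates over the smaller set and looks each element up in the larger
// one, so on average it does min(|a|, |b|) hash lookups.
// Assumes Set is default-constructible with its default hash and equality.
template <typename Set>
Set hash_intersection(const Set& a, const Set& b) {
    const Set& smaller = (a.size() <= b.size()) ? a : b;
    const Set& larger  = (a.size() <= b.size()) ? b : a;

    Set result;
    for (typename Set::const_iterator it = smaller.begin();
         it != smaller.end(); ++it) {
        // Member find() uses the hash table; average cost is constant.
        if (larger.find(*it) != larger.end()) {
            result.insert(*it);
        }
    }
    return result;
}
```

Called as hash_intersection(x, y), it returns a new set holding the common elements and leaves x and y untouched.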

Here's some code for unordered_set_intersection(); you can tweak it to make versions for set union and set difference:

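#include <algorithm>  // std::find

// Copies every element of [b1, e1) that also occurs in [b2, e2) to out.
// std::find is a linear scan, so this does up to n * m comparisons, but it
// works for any container types whose elements can be compared with ==.
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_intersection(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
    while (!(b1 == e1)) {
        if (!(std::find(b2, e2, *b1) == e2)) {
            *out = *b1;
            ++out;
        }

        ++b1;
    }

    return out;
}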

Assuming you have two unordered_sets, x and y, you can put their intersection in z using:

// std::inserter comes from <iterator>; z is the destination container.
unordered_set_intersection(
    x.begin(), x.end(),
    y.begin(), y.end(),
    std::inserter(z, z.begin())
);

Unlike bdonlan's answer, this will actually work for any key types, and any combination of container types (although using set_intersection() will of course be faster if the source containers are sorted).
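
For the union and difference tweaks mentioned earlier, something along these lines should do. The names unordered_set_difference and unordered_set_union are just my choice to match the intersection function, not anything TR1 ships, and the sketch keeps the same linear std::find search, so the same cost caveat applies.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>

// Difference: copies the elements of [b1, e1) that do NOT occur in [b2, e2).
// (Hypothetical name; same generic, linear-search style as the intersection.)
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_difference(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
    for (; !(b1 == e1); ++b1) {
        if (std::find(b2, e2, *b1) == e2) {
            *out = *b1;
            ++out;
        }
    }
    return out;
}

// Union: copies all of [b1, e1), then the elements of [b2, e2) that are
// not already in [b1, e1), so nothing is emitted twice.
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_union(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
    out = std::copy(b1, e1, out);
    return unordered_set_difference(b2, e2, b1, e1, out);
}
```

Both take the same arguments as the intersection call, e.g. with std::inserter(z, z.begin()) as the output. If the destination is itself an unordered_set, the union can be had more cheaply by inserting both ranges into it, since duplicate keys are ignored on insert.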

NOTE: If bucket occupancies are high, it's probably faster to copy each hash into a vector, sort them and set_intersection() them there, since searching within a bucket containing n elements is O(n).
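
A minimal sketch of that copy-and-sort route, assuming the key type has operator< (the hash containers themselves don't require it). sorted_intersection is a hypothetical name, and std::unordered_set stands in for the tr1 version here; only begin() and end() are used, so swapping the container type should be straightforward.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies both sets into vectors, sorts them, and lets
// std::set_intersection do the merge. Requires operator< on T.
template <typename T>
std::vector<T> sorted_intersection(const std::unordered_set<T>& a,
                                   const std::unordered_set<T>& b) {
    std::vector<T> va(a.begin(), a.end());
    std::vector<T> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());
    std::sort(vb.begin(), vb.end());

    std::vector<T> result;
    std::set_intersection(va.begin(), va.end(),
                          vb.begin(), vb.end(),
                          std::back_inserter(result));
    return result;  // common elements, in ascending order
}
```

The result comes back as a sorted vector; whether this beats the lookup approach depends on the sizes and bucket occupancy, so it's worth measuring on real data.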
