How to remove duplicates from an unsorted std::vector while keeping the original ordering using algorithms?

2022-01-10 00:00:00 duplicates c++ stdvector stl-algorithm stdset

I have an array of integers that I need to remove duplicates from while maintaining the order of the first occurrence of each integer. I can see doing it like this, but I imagine there is a better way that makes better use of STL algorithms. The insertion is out of my control, so I cannot check for duplicates before inserting.

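#include <set>
#include <vector>

// Removes later duplicates in place, keeping the first occurrence of each value.
// Returns the number of elements removed.
int unsortedRemoveDuplicates(std::vector<int> &numbers)
{
    std::set<int> uniqueNumbers;
    std::vector<int>::iterator allItr = numbers.begin();
    std::vector<int>::iterator unique = allItr;
    std::vector<int>::iterator endItr = numbers.end();

    for (; allItr != endItr; ++allItr)
    {
        // insert().second is true only the first time a value is seen
        const bool isUnique = uniqueNumbers.insert(*allItr).second;
        if (isUnique)
        {
            *unique = *allItr;
            ++unique;
        }
    }

    const int duplicates = static_cast<int>(endItr - unique);
    numbers.erase(unique, endItr);
    return duplicates;
}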

How can this be done using STL algorithms?

Recommended answer

The naive way is to use std::set, as everyone tells you. It's overkill and has poor cache locality, which makes it slow.
The smart* way is to use std::vector appropriately (make sure to see the footnote at the bottom):

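#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Compare iterators by the values they point to.
struct target_less
{
    template<class It>
    bool operator()(It const &a, It const &b) const { return *a < *b; }
};

struct target_equal
{
    template<class It>
    bool operator()(It const &a, It const &b) const { return *a == *b; }
};

// Moves the first occurrence of each value to the front of [begin, end),
// preserving their relative order, and returns the new logical end.
// Requires random-access iterators (the second sort compares iterators with <).
template<class It>
It uniquify(It begin, It const end)
{
    std::vector<It> v;
    v.reserve(static_cast<std::size_t>(std::distance(begin, end)));
    for (It i = begin; i != end; ++i)
    {
        v.push_back(i);
    }

    // Group equal values together. stable_sort (rather than sort) keeps equal
    // values in positional order, so std::unique retains the first occurrence.
    std::stable_sort(v.begin(), v.end(), target_less());
    v.erase(std::unique(v.begin(), v.end(), target_equal()), v.end());

    // Put the surviving iterators back into positional order.
    std::sort(v.begin(), v.end());

    // Compact the survivors to the front of the range.
    std::size_t j = 0;
    for (It i = begin; i != end && j != v.size(); ++i)
    {
        if (i == v[j])
        {
            using std::iter_swap;
            iter_swap(i, begin);
            ++j;
            ++begin;
        }
    }
    return begin;
}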

Then you can use it like this:

int main()
{
    std::vector<int> v;
    v.push_back(6);
    v.push_back(5);
    v.push_back(5);
    v.push_back(8);
    v.push_back(5);
    v.push_back(8);

    // uniquify returns the new logical end; erase drops the leftover tail
    v.erase(uniquify(v.begin(), v.end()), v.end());
}
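To make the individual steps easier to follow, here is a minimal sketch of the same sort / unique / re-sort idea written with indices instead of iterators. It is not the answer's code: the function name unique_in_first_seen_order is hypothetical, it assumes C++11 (lambdas, range-based for), and it returns a new vector rather than compacting in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): returns the distinct values of
// `input` in order of first occurrence, without using std::set.
std::vector<int> unique_in_first_seen_order(const std::vector<int>& input)
{
    // Step 1: one index per element: 0, 1, 2, ...
    std::vector<std::size_t> idx(input.size());
    std::iota(idx.begin(), idx.end(), std::size_t(0));

    // Step 2: order the indices by the value they refer to. stable_sort keeps
    // equal values in index order, so the earliest occurrence leads each group.
    std::stable_sort(idx.begin(), idx.end(),
                     [&input](std::size_t a, std::size_t b) { return input[a] < input[b]; });

    // Step 3: keep only the first index of each run of equal values.
    idx.erase(std::unique(idx.begin(), idx.end(),
                          [&input](std::size_t a, std::size_t b) { return input[a] == input[b]; }),
              idx.end());

    // Step 4: restore positional order among the surviving indices.
    std::sort(idx.begin(), idx.end());

    // Step 5: gather the surviving values.
    std::vector<int> result;
    result.reserve(idx.size());
    for (std::size_t i : idx)
    {
        result.push_back(input[i]);
    }
    return result;
}
```

For the input 6 5 5 8 5 8, the indices after step 2 are 1 2 4 0 3 5, after step 3 they are 1 0 3, after step 4 they are 0 1 3, and the gathered result is 6 5 8.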

*Note: That's the smart way in typical cases, where the number of duplicates isn't too high. For a more thorough performance analysis, see this related answer to a related question.
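For the other case, where duplicates make up most of the input, a single hash-based pass may be worth measuring against the sort-based version. The sketch below is an assumption for comparison purposes, not something taken from the linked answer: the name remove_duplicates_hashed is hypothetical, it requires C++11 for std::unordered_set, and which variant is faster depends on the data, so it should be benchmarked rather than assumed.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper (illustration only): removes later duplicates in place,
// keeping the first occurrence of each value. Returns the number removed.
std::size_t remove_duplicates_hashed(std::vector<int>& numbers)
{
    std::unordered_set<int> seen;
    seen.reserve(numbers.size());  // upper bound on the number of distinct values

    std::size_t write = 0;
    for (std::size_t read = 0; read != numbers.size(); ++read)
    {
        // insert().second is true only the first time a value is seen
        if (seen.insert(numbers[read]).second)
        {
            numbers[write] = numbers[read];
            ++write;
        }
    }

    const std::size_t removed = numbers.size() - write;
    numbers.resize(write);
    return removed;
}
```

The structure mirrors the std::set loop from the question; only the container changes, trading ordered tree lookups for hashed lookups.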
