确定无序向量 T 是否存在拥有所有独特的元素

2022-01-07 00:00:00 algorithm unique c++ stl

分析我的 cpu 绑定代码表明我花了很长时间检查容器是否包含完全唯一的元素.假设我有一些未排序元素的大容器(定义了 <=),我有两个关于如何做到这一点的想法:

Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done:

第一个使用集合:

template <class T>
bool is_unique(vector<T> X) {
  set<T> Y(X.begin(), X.end());
  return X.size() == Y.size();
}

第二次遍历元素:

template <class T>
bool is_unique2(vector<T> X) {
  typename vector<T>::iterator i,j;
  for(i=X.begin();i!=X.end();++i) {
    for(j=i+1;j!=X.end();++j) {
      if(*i == *j) return 0;
    }
  }
  return 1;
}

我已经尽我所能对它们进行了测试,根据我从阅读有关 STL 的文档中收集到的信息,答案是(像往常一样),这取决于.我认为在第一种情况下,如果所有元素都是唯一的,它会很快,但是如果存在很大的退化,操作似乎需要 O(N^2) 时间.对于嵌套迭代器方法,情况似乎正好相反,如果 X[0]==X[1],它会快速点亮,但如果所有元素都需要(可以理解)O(N^2) 时间是独一无二的.

I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.

有没有更好的方法可以做到这一点,也许是为此目的而构建的 STL 算法?如果没有,是否有任何建议可以提高效率?

Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?

推荐答案

你的第一个例子应该是 O(N log N) 因为 set 每次插入需要 log N 时间.我认为不可能有更快的 O.

Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.

第二个例子显然是O(N^2).系数和内存使用率低,因此在某些情况下可能会更快(甚至最快).

The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.

这取决于 T 是什么,但为了通用性能,我建议对指向对象的指针向量进行排序.

It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.

template< class T >
bool dereference_less( T const *l, T const *r )
 { return *l < *r; } 

template <class T>
bool is_unique(vector<T> const &x) {
    vector< T const * > vp;
    vp.reserve( x.size() );
    for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
    sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
    return adjacent_find( vp.begin(), vp.end(),
           not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
        == vp.end(); // if no adjacent pair (vp_n,vp_n+1) has *vp_n < *vp_n+1
}

或在 STL 风格中,

or in STL style,

template <class I>
bool is_unique(I first, I last) {
    typedef typename iterator_traits<I>::value_type T;
    …

<小时>

当然,如果你可以对原始向量重新排序,


And if you can reorder the original vector, of course,

template <class T>
bool is_unique(vector<T> &x) {
    sort( x.begin(), x.end() ); // O(N log N)
    return adjacent_find( x.begin(), x.end() ) == x.end();
}

相关文章