Duplicate elements in std::vector

2021-12-30 · Tags: count, vector, c++

I have a std::vector and I want to check all the elements in it. If any element appears more than once, I want to signal an error.

This is how I did it:

std::vector<std::string> test;
test.push_back("YES");
test.push_back("YES");

for(int i = 0; i < test.size(); i++)
{
    // Does not compile: test[i] is a std::string and cannot be compared with the integer 1.
    if(test[i] > 1)
    {
        DCS_LOG_DEBUG("ERROR WITH COUNT")
    }
}

This does not work. I know how to count with std::count(), but I want the count for each element rather than a single total. Any ideas?

Recommended answer

The simplest way is to std::sort the vector and then use std::adjacent_find.

However, if you don't want to sort the vector, you can do something like this in C++11:

#include <cstddef>       // For std::size_t.
#include <functional>    // For std::hash<std::string>.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main()
{
    // Test data.
    std::vector<std::string> v;
    v.push_back("a");
    v.push_back("b");
    v.push_back("c");
    v.push_back("a");
    v.push_back("c");
    v.push_back("d");
    v.push_back("a");

    // Hash function for the hashtable: hashes the pointed-to string.
    auto h = [](const std::string* s) {
        return std::hash<std::string>()(*s);
    };

    // Equality comparer for the hashtable: compares the pointed-to strings.
    auto eq = [](const std::string* s1, const std::string* s2) {
        return s1->compare(*s2) == 0;
    };

    // The hashtable:
    //     Key: Pointer to element of 'v'.
    //     Value: Occurrence count.
    std::unordered_map<const std::string*, std::size_t, decltype(h), decltype(eq)> m(v.size(), h, eq);

    // Count occurrences.
    for (auto v_i = v.cbegin(); v_i != v.cend(); ++v_i)
        ++m[&(*v_i)];

    // Print strings that occur more than once:
    for (auto m_i = m.begin(); m_i != m.end(); ++m_i)
        if (m_i->second > 1)
            std::cout << *m_i->first << ": " << m_i->second << std::endl;

    return 0;
}

Prints (the line order may vary, because std::unordered_map iteration order is unspecified):

a: 3
c: 2

I haven't benchmarked it, but it should perform well, for the following reasons:

Assuming the vector elements do not produce pathologically lopsided hashes, this is an expected O(n) algorithm, as opposed to O(n*log(n)) for sorting.

The hash table stores pointers to the strings rather than the strings themselves, so no strings are copied.

We "pre-allocate" the hash table's buckets (by passing v.size() when constructing m), so rehashing during insertion is minimized.
