What is the difference between set and unordered_set in C++?
I came across this good question, which is similar but not the same, since it is about Java, whose hash-table implementations differ by virtue of having synchronized accessors/mutators: What are the differences between a HashMap and a Hashtable in Java?
So what is the difference between the C++ implementations of set and unordered_set? This question can of course be extended to map vs. unordered_map, and so on for other C++ containers.
Here is my initial assessment:
set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on its find/insert operations mean that in practice it is always implemented as a tree, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Because the tree is height-balanced, find() has predictable time complexity.
Pros: Compact (compared to the other data structures considered here)
Cons: Access time complexity is O(log n)
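As a minimal sketch of what the tree-based layout buys you, the helper below (values_in_range is a hypothetical name, not part of the standard library) uses std::set::lower_bound, which is logarithmic in the size of the set, and then walks the elements in sorted order:

```cpp
#include <set>
#include <vector>

// Hypothetical helper: collects all stored values in the half-open range [lo, hi).
// It relies on std::set keeping its elements sorted.
std::vector<int> values_in_range(const std::set<int>& s, int lo, int hi) {
    std::vector<int> out;
    // lower_bound is O(log n) and returns an iterator to the first element >= lo.
    for (auto it = s.lower_bound(lo); it != s.end() && *it < hi; ++it) {
        out.push_back(*it);
    }
    return out;
}
```

For a set holding 10, 20, 30, 40 and 50, calling values_in_range(s, 15, 45) yields 20, 30, 40. An unordered_set has no equivalent of lower_bound, so the same query would need a full scan.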
unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on its find/insert operations mean that in practice it is always implemented as a hash table.
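Pros:
- Faster (promises average-case O(1) for search)
- Its basic primitives are easier to make thread-safe than those of a tree-based data structure
Cons:
- Lookup is not guaranteed to be O(1). The theoretical worst case is O(n).
- Not as compact as a tree (for practical purposes the load factor is never 1).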
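The O(n) worst case can be provoked on purpose. The sketch below uses a deliberately degenerate hash functor (ConstantHash is a hypothetical name invented for this illustration) so that every key lands in the same bucket, and then reports the size of the largest bucket:

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical, deliberately bad hash: every key gets the same hash value.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

// Inserts 0..n-1 and returns the number of elements in the fullest bucket.
std::size_t largest_bucket_after_inserting(int n) {
    std::unordered_set<int, ConstantHash> s;
    for (int i = 0; i < n; ++i) {
        s.insert(i);
    }
    std::size_t largest = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) > largest) {
            largest = s.bucket_size(b);
        }
    }
    return largest;
}
```

Because keys with equal hash values share a bucket, all n elements end up in one chain, and a lookup has to walk that chain linearly. With a reasonable hash function the buckets stay short and the average case applies.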
Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, a large share of insertions (up to roughly every second one) lands in an occupied bucket. The number of operations required to access an element grows with the load factor, so the more we reduce the number of operations, the sparser the hash table has to be. When the stored elements are comparable in size to a pointer, this overhead is quite significant.
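The load factor does not have to be guessed, since std::unordered_set exposes it directly. The following sketch (TableStats and stats_after_inserting are hypothetical names used only here) collects the relevant numbers after inserting n integers:

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical record of the table's occupancy figures.
struct TableStats {
    std::size_t elements;   // size()
    std::size_t buckets;    // bucket_count()
    float load_factor;      // size() / bucket_count()
    float max_load_factor;  // threshold that triggers a rehash on insertion
};

// Inserts 0..n-1 into a default-constructed set and reports its occupancy.
TableStats stats_after_inserting(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i) {
        s.insert(i);
    }
    return {s.size(), s.bucket_count(), s.load_factor(), s.max_load_factor()};
}
```

A default-constructed std::unordered_set has a max_load_factor() of 1.0, so how sparse the table actually is depends on the implementation's growth policy. If fewer collisions matter more than memory, max_load_factor(0.5f) followed by rehash or reserve trades space for shorter chains.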
Have I missed any difference between these maps/sets that I should know about for performance analysis?
Recommended answer
I think you've generally answered your own question. However, this:
Not as compact as a tree (for practical purposes the load factor is never 1)
is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T uses space equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool). This becomes 3 * pointer_size + sizeof(T) + sizeof(bool) if the tree also stores a parent pointer in each node.
Compare this to a hash map: each hash map wastes some array space because the load factor is < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each inserted element takes only sizeof(T) + pointer_size.
Note that this analysis ignores any overhead that may come from extra space used for alignment.
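To see what alignment does to those formulas, here is a sketch with two hypothetical node layouts. TreeNode and ChainNode are illustrative only; the node types inside real standard libraries differ in detail and may carry extra fields such as a cached hash value.

```cpp
#include <cstddef>

// Hypothetical red-black tree node: three links, a color flag, and the value.
template <typename T>
struct TreeNode {
    TreeNode* left;
    TreeNode* right;
    TreeNode* parent;  // assumed present; some designs omit it
    bool is_red;
    T value;
};

// Hypothetical node of a singly linked bucket chain: one link and the value.
template <typename T>
struct ChainNode {
    ChainNode* next;
    T value;
};

// sizeof includes whatever padding the compiler adds for alignment.
template <typename T>
constexpr std::size_t tree_node_bytes() { return sizeof(TreeNode<T>); }

template <typename T>
constexpr std::size_t chain_node_bytes() { return sizeof(ChainNode<T>); }
```

On a typical 64-bit platform tree_node_bytes<int>() comes to 32 and chain_node_bytes<int>() to 16, so padding makes the gap between the two node types somewhat larger than the raw formulas suggest. Allocator bookkeeping per node is not counted in either figure.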
For any element type T that is small (so, any basic type), the size of the pointers and other overhead dominates. At a load factor > 0.5 (for example), std::unordered_set may indeed use less memory than the equivalent std::set.
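A back-of-the-envelope version of this comparison, as a sketch only: the two functions below are hypothetical helpers that apply the formulas above, assume a tree node with three links, assume one pointer-sized slot per bucket in the hash table's array, and ignore alignment and allocator overhead.

```cpp
#include <cstddef>

// Rough bytes per element in a red-black tree: three links, a color flag, the value.
constexpr double tree_bytes_per_element(std::size_t value_size,
                                        std::size_t pointer_size) {
    return 3.0 * pointer_size + sizeof(bool) + value_size;
}

// Rough bytes per element in a chained hash table: the value, one chain link,
// and this element's share of the bucket array (one pointer per bucket,
// with 1 / load_factor buckets per element).
constexpr double hash_bytes_per_element(std::size_t value_size,
                                        std::size_t pointer_size,
                                        double load_factor) {
    return value_size + pointer_size + pointer_size / load_factor;
}
```

With 4-byte values and 8-byte pointers, the tree estimate is 3 * 8 + 1 + 4 = 29 bytes, while the hash table estimate at a load factor of 0.5 is 4 + 8 + 16 = 28 bytes. Higher load factors shrink the hash table's share of the bucket array further, and lower ones push it above the tree.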
The other big missing point is that iterating through a std::set is guaranteed to produce the elements ordered from smallest to largest according to the given comparison function, while iterating through a std::unordered_set returns the values in a "random" (unspecified) order.
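A small sketch of that difference, using two hypothetical helper functions that copy the same input through each container and return the elements in iteration order:

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Elements come back sorted by the set's comparison (std::less<int> by default).
std::vector<int> in_set_order(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}

// Same elements, but in whatever order the hash table happens to store them.
std::vector<int> in_unordered_set_order(const std::vector<int>& input) {
    std::unordered_set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}
```

For the input 42, 7, 19, 3, 7, in_set_order returns 3, 7, 19, 42. in_unordered_set_order returns the same four values, but the order depends on the hash function, the bucket count, and the library implementation, so code should not rely on it.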