C++ STL unordered_map 如何解决冲突?

2022-01-07 00:00:00 c++ stl unordered-map

How does C++ STL unordered_map resolve collisions?

查看http://www.cplusplus.com/reference/unordered_map/unordered_map/，它说唯一键容器中的任何两个元素都不能具有相同的键."

Looking at the http://www.cplusplus.com/reference/unordered_map/unordered_map/, it says "Unique keys No two elements in the container can have equivalent keys."

这应该意味着容器确实正在解决冲突.但是，该页面并没有告诉我它是如何做的.我知道一些解决冲突的方法，比如使用链表和/或探测.我想知道的是c++ STL unordered_map 是如何解析的.

That should mean that the container is indeed resolving collisions. However, that page does not tell me how it is doing it. I know some ways to resolve collisions like using linked lists and/or probing. What I want to know is how the c++ STL unordered_map is resolving it.

推荐答案

该标准对此的定义比大多数人似乎意识到的要多一些.

The standard defines a little more about this than most people seem to realize.

具体来说，标准要求(第 23.2.5/9 节):

Specifically, the standard requires (§23.2.5/9):

无序关联容器的元素被组织成桶.具有相同哈希码的键出现在同一个桶中.

The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket.

该接口包含一个以恒定时间运行的 bucket_count.(表103).它还包括一个 bucket_size，它必须在时间上与存储桶的大小成线性关系.

The interface includes a bucket_count that runs in constant time. (table 103). It also includes a bucket_size that has to run in time linear on the size of the bucket.

这基本上是在描述一个使用碰撞链的实现.当您确实使用碰撞链时，满足所有要求介于简单和琐碎之间.bucket_count() 是数组中元素的数量.bucket_size() 是碰撞链中元素的数量.分别在常数和线性时间内获取它们既简单又直接.

That's basically describing an implementation that uses collision chaining. When you do use collision chaining, meeting all the requirements is somewhere between easy and trivial. bucket_count() is the number of elements in your array. bucket_size() is the number of elements in the collision chain. Getting them in constant and linear time respectively is simple and straightforward.

相比之下，如果您使用诸如线性探测或双哈希之类的方法，这些要求几乎不可能满足.具体来说，所有散列到特定值的项目都需要落入同一个桶中，并且您需要能够在恒定时间内对这些桶进行计数.

By contrast, if you use something like linear probing or double hashing, those requirements become all but impossible to meet. Specifically, all the items that hashed to a specific value need to land in the same bucket, and you need to be able to count those buckets in constant time.

但是，如果您使用诸如线性探测或双重散列之类的方法，找到散列为相同值的所有项目意味着您需要对值进行散列，然后遍历表中非空项目的链"以找出其中有多少散列到相同的值.不过，这与散列为相同值的项目数量不是线性的――它与散列为相同或冲突值的项目数量呈线性关系.

But, if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.

有了足够多的额外工作和相当数量的将某些要求的含义延伸到几乎崩溃的程度，使用碰撞链以外的其他东西创建哈希表可能几乎不可能，并且至少仍然可以满足要求――但我不确定这是否可行，而且肯定会涉及很多额外的工作.

With enough extra work and a fair amount of stretching the meaning of some of the requirements almost to the breaking point, it might be barely possible to create a hash table using something other than collision chaining, and still at least sort of meet the requirements--but I'm not really certain it's possible, and it would certain involve quite a lot of extra work.

总结:std::unordered_set(或unordered_map)的所有实际实现无疑都使用了碰撞链.虽然使用线性探测或双散列可能(只是勉强)满足要求，但这样的实现似乎损失了很多，几乎没有任何回报.

Summary: all practical implementations of std::unordered_set (or unordered_map) undoubtedly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.

相关文章