为什么 Python dict 可以有多个具有相同哈希的键?
问题描述
我试图了解 Python hash
函数的底层.我创建了一个自定义类,其中所有实例都返回相同的哈希值.
I am trying to understand the Python hash
function under the hood. I created a custom class where all instances return the same hash value.
class C:
def __hash__(self):
return 42
我只是假设上述类的一个实例在任何时候都可以在一个dict
中,但实际上一个dict
可以有多个具有相同hash的元素.
I just assumed that only one instance of the above class can be in a dict
at any time, but in fact a dict
can have multiple elements with the same hash.
c, d = C(), C()
x = {c: 'c', d: 'd'}
print(x)
# {<__main__.C object at 0x7f0824087b80>: 'c', <__main__.C object at 0x7f0823ae2d60>: 'd'}
# note that the dict has 2 elements
我做了更多的实验,发现如果我重写 __eq__
方法使得类的所有实例比较相等,那么 dict
只允许一个实例.
I experimented a little more and found that if I override the __eq__
method such that all the instances of the class compare equal, then the dict
only allows one instance.
class D:
def __hash__(self):
return 42
def __eq__(self, other):
return True
p, q = D(), D()
y = {p: 'p', q: 'q'}
print(y)
# {<__main__.D object at 0x7f0823a9af40>: 'q'}
# note that the dict only has 1 element
所以我很想知道 dict
如何拥有多个具有相同哈希的元素.
So I am curious to know how a dict
can have multiple elements with the same hash.
解决方案
有关 Python 哈希如何工作的详细描述,请参阅我对 Why is早退比别的慢?
For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?
基本上,它使用哈希来选择表中的一个槽.如果槽中有一个值并且哈希匹配,它会比较项目以查看它们是否相等.
Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.
如果哈希匹配但项目不相等,那么它会尝试另一个插槽.有一个公式可以选择这个(我在参考答案中描述过),它会逐渐引入哈希值的未使用部分;但是一旦它用完它们,它最终会通过哈希表中的所有槽.这保证最终我们要么找到匹配的项目,要么找到一个空槽.当搜索找到一个空槽时,它会插入值或放弃(取决于我们是添加还是获取值).
If the hash matches but the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees eventually we either find a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending whether we are adding or getting a value).
需要注意的重要一点是没有列表或桶:只有一个具有特定槽数的哈希表,每个哈希用于生成候选槽序列.
The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.
相关文章